DALL-E Broke My Budget: How I Set Up Free Unlimited FLUX Image Generation on Local Hardware
When ChatGPT image generation ate through all my credits, I built a free unlimited pipeline with ComfyUI and FLUX.1-dev running on a consumer GPU. Here's the full setup walkthrough.
The Problem: AI Image Credits Disappear Fast
Here's a fun story. I was generating images for my blog — headers, feature graphics, the works. DALL-E via ChatGPT is convenient, no question. Click a button, type a prompt, get a picture. Easy.
Then I checked my usage. Zero percent remaining. All those "just one more" generations at £0.032 each added up fast. When you're producing content regularly, those per-image costs become a real line item.
The worst part? You don't get a warning. You just hit a wall and suddenly you're either waiting for the monthly reset or paying more. Neither option works when you're on a publishing schedule.
The Solution: Free, Unlimited, Local
I already had FLUX.1-dev downloaded locally: the full ~32GB diffusers model had been sitting in my HuggingFace cache, eating disk space but working perfectly via Python scripts. The quality is genuinely better than what I was getting from DALL-E for tech-related images. But running it from a Python script means editing code every time you want to tweak something.
What I needed was a proper interface — something with a visual workflow where I could adjust prompts, change seeds, swap samplers, and iterate quickly without touching code.
Enter ComfyUI.
What is ComfyUI?
ComfyUI is a node-based interface for Stable Diffusion and FLUX models. Think of it like a visual programming environment for image generation. You connect nodes — one loads the model, one encodes your text prompt, one runs the diffusion steps, one decodes the result — and they form a pipeline you can see, tweak, and save as a reusable workflow.
Key advantages:
- Free and open source — no credits, no subscriptions, no usage limits
- Runs entirely locally — your images never leave your machine
- Full control over every parameter — seeds, samplers, schedulers, LoRA stacking, ControlNet, inpainting
- Reproducible — save a workflow, share it, run it again months later with the same result
- Extensible — massive community of custom nodes for every technique
The Hardware
This runs on my RTX 5070 Ti (16GB VRAM). FLUX.1-dev is a big model:
- ~22GB transformer (merged from 3 sharded safetensors files)
- ~9GB T5-XXL text encoder
- ~1GB CLIP-L text encoder
- ~300MB VAE
Yes, that's more than 16GB of model files. But ComfyUI loads components sequentially and uses smart memory management. The actual inference works within 16GB VRAM because:
- Text encoders are loaded, used, then can be offloaded
- The transformer runs in bfloat16 (halves memory vs float32)
- ComfyUI's sequential offloading keeps peak VRAM manageable
In practice, I see about 13.8GB VRAM usage during generation, leaving headroom on the 16GB card.
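As a rough sanity check on those numbers, here is a minimal sketch that adds up the component sizes listed above and compares them with the card's VRAM. The sizes are the approximate figures from this post rather than measured values, and the 12-billion parameter count is the published size of the FLUX.1-dev transformer.

```python
# Approximate on-disk sizes from the list above, in GB (not measured values).
COMPONENTS_GB = {
    "transformer": 22.0,
    "t5_xxl": 9.0,
    "clip_l": 1.0,
    "vae": 0.3,
}
VRAM_GB = 16.0


def total_model_gb(components: dict) -> float:
    """Sum the size of every model component."""
    return sum(components.values())


def weights_gib(n_params: float, bytes_per_param: int) -> float:
    """Size of a weight tensor collection in GiB for a given precision."""
    return n_params * bytes_per_param / 1024**3


total = total_model_gb(COMPONENTS_GB)
print(f"All components: {total:.1f} GB vs {VRAM_GB:.0f} GB VRAM")
print(f"12B params in bfloat16: {weights_gib(12e9, 2):.1f} GiB")
print(f"12B params in float32:  {weights_gib(12e9, 4):.1f} GiB")
```

The total comes out well above 16GB, which is why not everything can be resident on the GPU at once and why the offloading behaviour described above matters.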
Step-by-Step Setup
1. Clone ComfyUI
cd ~
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
2. Create a Virtual Environment
uv venv --python 3.12 .venv
source .venv/bin/activate
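uv pip install torch --index-url https://download.pytorch.org/whl/cu130
uv pip install -r requirements.txt
PyTorch with CUDA support is essential. RTX 50-series (Blackwell) cards need a wheel built against CUDA 12.8 or newer, so the older cu121 index will not work on this GPU; the cu130 index matches the PyTorch 2.11+cu130 build this setup was tested with. Adjust the index if your card or driver needs a different CUDA version.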
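Before going further, it is worth confirming that the installed wheel actually sees the GPU. The following is a small sketch, not part of ComfyUI; `cuda_report` is a hypothetical helper name, and it only uses standard `torch.cuda` calls.

```python
import importlib.util


def cuda_report() -> dict:
    """Return a small summary of the PyTorch/CUDA setup in the active environment."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only wheels
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(0)
        # Total memory of device 0, converted from bytes to GiB.
        props = torch.cuda.get_device_properties(0)
        report["vram_gib"] = round(props.total_memory / 1024**3, 1)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back `False`, the usual suspects are a CPU-only wheel or a CUDA build that does not match the driver.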
3. Prepare the FLUX.1-dev Model Files
This is the tricky part. FLUX.1-dev from HuggingFace comes in sharded safetensors format — 3 files for the transformer, 2 for the T5 text encoder. ComfyUI expects single files.
If you already have the model in your HuggingFace cache (from using diffusers), you need to merge the shards. Here's a Python script that does the job:
from pathlib import Path

from safetensors import safe_open
from safetensors.torch import save_file


def merge_shards(shard_dir: Path, output_path: Path) -> None:
    """Merge multi-part safetensors into a single file."""
    # Sorting keeps the shard order deterministic (00001-of-00003, 00002-of-00003, ...).
    shards = sorted(shard_dir.glob("*-of-*.safetensors"))
    if not shards:
        raise FileNotFoundError(f"No sharded safetensors found in {shard_dir}")
    all_tensors = {}
    metadata = None
    for i, shard_path in enumerate(shards):
        print(f"  Reading shard {i+1}/{len(shards)}: {shard_path.name}")
        with safe_open(str(shard_path), framework="pt", device="cpu") as f:
            if metadata is None:
                metadata = f.metadata()  # reuse the first shard's header metadata
            for key in f.keys():
                all_tensors[key] = f.get_tensor(key)
    # Every tensor is held in system RAM until the merged file is written.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    save_file(all_tensors, str(output_path), metadata=metadata)
    size_gb = output_path.stat().st_size / (1024**3)
    print(f"Saved {output_path} ({size_gb:.1f} GB)")
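Since the merge reads tens of gigabytes, a quick pre-flight check can save a wasted run. The sketch below assumes the shard folder contains the usual `*.safetensors.index.json` file with a `weight_map` entry, as HuggingFace sharded checkpoints normally do; `check_shards` is a hypothetical helper name and uses only the standard library.

```python
import json
from pathlib import Path


def check_shards(shard_dir: Path) -> dict:
    """Compare the shard index against the files actually present."""
    index_files = sorted(shard_dir.glob("*.safetensors.index.json"))
    if not index_files:
        raise FileNotFoundError(f"No shard index found in {shard_dir}")
    index = json.loads(index_files[0].read_text())
    weight_map = index["weight_map"]  # tensor name -> shard file name
    expected = set(weight_map.values())
    missing = sorted(name for name in expected if not (shard_dir / name).exists())
    return {
        "tensors": len(weight_map),
        "shards_expected": len(expected),
        "missing": missing,
    }
```

An empty `missing` list means every shard named in the index is on disk, and `tensors` gives a number to compare against the key count of the merged file afterwards.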
Run this for the transformer shards and T5 text encoder shards, then place the merged files in ComfyUI's model directories:
| Component | Source | Destination |
|---|---|---|
| Transformer | transformer/ shards → merge | models/diffusion_models/flux1-dev.safetensors |
| T5-XXL | text_encoder_2/ shards → merge | models/text_encoders/t5xxl_fp16.safetensors |
| CLIP-L | text_encoder/model.safetensors → symlink | models/clip/clip_l.safetensors |
| VAE | vae/diffusion_pytorch_model.safetensors → symlink | models/vae/ae.safetensors |
Important: If you already have FLUX.1-dev in your HuggingFace cache (e.g. from using it with diffusers or the HuggingFace API), you already have these files. The shard merging step is the only "extra" work — and it's a one-time operation.
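The two symlink rows in the table can be scripted as well. This is a minimal sketch assuming the snapshot folder layout shown in the table; the helper names are hypothetical and the paths you pass in are your own.

```python
from pathlib import Path


def link_model_file(source: Path, destination: Path) -> Path:
    """Symlink a single-file model component into a ComfyUI model folder."""
    if not source.exists():
        raise FileNotFoundError(f"Missing source file: {source}")
    destination.parent.mkdir(parents=True, exist_ok=True)
    if destination.is_symlink():
        destination.unlink()  # refresh a link left by an earlier run
    elif destination.exists():
        raise FileExistsError(f"Refusing to overwrite real file: {destination}")
    destination.symlink_to(source.resolve())
    return destination


def link_components(snapshot: Path, comfy_models: Path) -> list:
    """Link CLIP-L and the VAE using the source/destination pairs from the table."""
    pairs = [
        (snapshot / "text_encoder" / "model.safetensors",
         comfy_models / "clip" / "clip_l.safetensors"),
        (snapshot / "vae" / "diffusion_pytorch_model.safetensors",
         comfy_models / "vae" / "ae.safetensors"),
    ]
    return [link_model_file(src, dst) for src, dst in pairs]
```

Call `link_components` with the path to your FLUX.1-dev snapshot folder and `Path.home() / "ComfyUI" / "models"`. Symlinking instead of copying avoids duplicating files that are already in the cache.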
4. Install ComfyUI Manager (Recommended)
cd ~/ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
cd ~/ComfyUI && source .venv/bin/activate
uv pip install -r custom_nodes/ComfyUI-Manager/requirements.txt
The Manager gives you a browser UI for installing custom nodes, downloading models, and managing workflows. It's not required but makes life much easier.
5. Start ComfyUI
cd ~/ComfyUI
source .venv/bin/activate
PYTORCH_ALLOC_CONF=expandable_segments:True python main.py --listen 0.0.0.0 --port 8188
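The PYTORCH_ALLOC_CONF=expandable_segments:True environment variable matters for large models: it reduces VRAM fragmentation, which helps avoid spurious out-of-memory errors. Note that --listen 0.0.0.0 makes the UI reachable from other machines on your network; omit it if you only need access from the same machine.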
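To confirm the server came up without opening a browser, you can poll it from a script. This sketch assumes ComfyUI's `/system_stats` endpoint is available on the port used above; `comfyui_stats` is a hypothetical helper name, and the exact fields in the response are not relied on here.

```python
import json
import urllib.error
import urllib.request


def comfyui_stats(base_url: str = "http://127.0.0.1:8188", timeout: float = 3.0):
    """Return the parsed /system_stats response, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/system_stats", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a reply that is not valid JSON.
        return None


if __name__ == "__main__":
    stats = comfyui_stats()
    print("ComfyUI is up" if stats else "ComfyUI is not reachable yet")
```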
6. The FLUX.1-dev Workflow
In the ComfyUI browser interface (http://localhost:8188), build this workflow:
- UNETLoader — flux1-dev.safetensors, weight_dtype: default
- DualCLIPLoader — clip_name1: clip_l.safetensors, clip_name2: t5xxl_fp16.safetensors, type: flux
- VAELoader — ae.safetensors
- ModelSamplingFlux — Connect the UNET model output, set max_shift: 1.15, base_shift: 0.5, width/height: 1024
- CLIPTextEncode (positive) — Your prompt, connected to the CLIP output
- CLIPTextEncode (negative) — Empty string for FLUX (it doesn't use negative prompts effectively)
- EmptyLatentImage — 1024×1024, batch_size: 1
- KSampler — steps: 28, cfg: 3.5, sampler: euler, scheduler: simple, seed: your choice
- VAEDecode — Connect latent output and VAE
- SaveImage — Connect decoded image, set filename prefix
Click "Queue Prompt" and wait for your image.
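For reference, the same graph can be written down as data. The sketch below expresses the ten nodes above in ComfyUI's API (JSON) format, where each link is a `[node_id, output_index]` pair. The input names are assumed to match what the "Save (API Format)" export produces for these nodes, so export your own workflow to confirm them; `build_flux_workflow` is a hypothetical helper name.

```python
def build_flux_workflow(prompt: str, seed: int, prefix: str = "flux") -> dict:
    """Return the node graph described above in ComfyUI's API (JSON) format."""
    return {
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": "flux1-dev.safetensors", "weight_dtype": "default"}},
        "2": {"class_type": "DualCLIPLoader",
              "inputs": {"clip_name1": "clip_l.safetensors",
                         "clip_name2": "t5xxl_fp16.safetensors", "type": "flux"}},
        "3": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.safetensors"}},
        "4": {"class_type": "ModelSamplingFlux",
              "inputs": {"model": ["1", 0], "max_shift": 1.15, "base_shift": 0.5,
                         "width": 1024, "height": 1024}},
        # Positive prompt, then the empty negative prompt.
        "5": {"class_type": "CLIPTextEncode", "inputs": {"text": prompt, "clip": ["2", 0]}},
        "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["2", 0]}},
        "7": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "8": {"class_type": "KSampler",
              "inputs": {"model": ["4", 0], "positive": ["5", 0], "negative": ["6", 0],
                         "latent_image": ["7", 0], "seed": seed, "steps": 28, "cfg": 3.5,
                         "sampler_name": "euler", "scheduler": "simple", "denoise": 1.0}},
        "9": {"class_type": "VAEDecode", "inputs": {"samples": ["8", 0], "vae": ["3", 0]}},
        "10": {"class_type": "SaveImage",
               "inputs": {"images": ["9", 0], "filename_prefix": prefix}},
    }
```

Keeping the graph as a function of `prompt` and `seed` makes it easy to see which two values change between generations while everything else stays fixed.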
Performance Numbers
On the RTX 5070 Ti (16GB):
| Metric | Value |
|---|---|
| First generation (model load) | ~100 seconds |
| Subsequent generations | ~60-80 seconds |
| Peak VRAM usage | ~13.8GB |
| Image resolution | 1024×1024 |
| Steps | 28 |
| Quality compared to DALL-E 3 | Equal or better for tech/cyberpunk subjects |
Subsequent generations are faster because the model stays loaded in VRAM. If you've been using Ollama for text generation, you'll need to unload it first — FLUX needs the full VRAM.
# Free VRAM from Ollama before generating
curl -s http://localhost:11434/api/generate -d '{"model":"your-model","keep_alive":0}'
FLUX.1-dev vs DALL-E: When Local Wins
When DALL-E makes sense:
- One-off images — if you need one image and never again, the convenience wins
- Photorealistic people — DALL-E still has an edge on human faces
- Phone/casual use — no setup required
When FLUX.1-dev on ComfyUI wins:
- Volume — unlimited generations, zero marginal cost
- Tech/abstract subjects — circuit boards, server rooms, code visualisations
- Reproducibility — save a workflow, get the exact same result next month
- Control — seeds, samplers, LoRAs, ControlNet, inpainting
- Privacy — images never leave your machine
- Batch production — blog headers, product shots, social media assets
- Iteration — tweak one parameter, re-queue, compare instantly

For a blog producing multiple images per post, the local workflow is where the setup really starts paying for itself.
The Blog-Specific Workflow
Here's how I use it for blog production:
- Write the post in the Local AI Journal
- Design prompts based on the content — "circuit board close-up" for hardware posts, "server room panorama" for infrastructure, "abstract neural network" for AI topics
- Generate 4-6 variants with different seeds
- Pick the best and optimise for web (512×512 blog size, 95% JPEG quality)
- Save to public/images/ and reference in frontmatter
The prompt pattern that works for blog headers:
[subject description], cinematic lighting, ultra detailed quality,
professional technology aesthetic, no text no words no letters
That last part — "no text no words no letters" — is essential. FLUX is decent at text rendering, but you don't want random words appearing in your header images. Better to tell it explicitly to skip text.
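The pattern is easy to turn into a helper so the style suffix never gets mistyped. This is a small sketch with hypothetical names; the seed helper is one possible way to get a repeatable set of variant seeds, not something ComfyUI requires.

```python
import random

STYLE_SUFFIX = (
    "cinematic lighting, ultra detailed quality, "
    "professional technology aesthetic, no text no words no letters"
)


def build_header_prompt(subject: str) -> str:
    """Append the fixed style suffix to a subject description."""
    return f"{subject.strip()}, {STYLE_SUFFIX}"


def variant_seeds(count: int, base_seed: int) -> list:
    """Derive reproducible seeds so a batch of variants can be regenerated later."""
    rng = random.Random(base_seed)
    return [rng.randrange(2**32) for _ in range(count)]


if __name__ == "__main__":
    print(build_header_prompt("server room panorama"))
    print(variant_seeds(4, base_seed=2024))
```

Recording only the base seed next to the post is enough to recreate the whole batch of variants later.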
What's Next
Now that ComfyUI is set up, the door is open to:
- LoRA fine-tuning on my brand colours and style
- ControlNet for guided compositions
- Inpainting for editing specific regions
- Image-to-image for variation on existing images
- Upscaling with Real-ESRGAN for print-quality outputs
- Automated batch generation via the API — queue prompts from scripts
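As a taste of that last item, queueing from a script is a single HTTP call. The sketch below assumes a workflow dict exported from the UI via "Save (API Format)" and ComfyUI's `POST /prompt` endpoint on the default port; the helper names are hypothetical, and the shape of the server's reply is not relied on here.

```python
import json
import urllib.request


def build_queue_request(workflow: dict,
                        base_url: str = "http://127.0.0.1:8188") -> urllib.request.Request:
    """Build the POST request that queues one API-format workflow."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(workflow: dict, base_url: str = "http://127.0.0.1:8188") -> dict:
    """Send the request and return the server's JSON reply."""
    request = build_queue_request(workflow, base_url)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)
```

Splitting request construction from sending keeps the payload easy to inspect before anything hits the GPU. Looping `queue_workflow` over a list of seeds is then all a batch run needs.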
The best part? All of this runs locally, costs nothing per image, and keeps every generation private. Your hardware, your rules.
Found this useful? 👉 Follow @Raf_VRS on X for more AI Guides updates 👉 Support the work: ko-fi.com/rafvrs
Setup tested on: RTX 5070 Ti 16GB, Ubuntu 24.04, Python 3.12, PyTorch 2.11+cu130, ComfyUI 0.19.3