AI Guides

DALL-E Broke My Budget: How I Set Up Free Unlimited FLUX Image Generation on Local Hardware

When ChatGPT image generation ate through all my credits, I built a free unlimited pipeline with ComfyUI and FLUX.1-dev running on a consumer GPU. Here's the full setup walkthrough.

2026-04-26 · 7 min read

The Problem: AI Image Credits Disappear Fast

Here's a fun story. I was generating images for my blog — headers, feature graphics, the works. DALL-E via ChatGPT is convenient, no question. Click a button, type a prompt, get a picture. Easy.

Then I checked my usage. Zero percent remaining. All those "just one more" generations at £0.032 each added up fast. When you're producing content regularly, those per-image costs become a real line item.

The worst part? You don't get a warning. You just hit a wall and suddenly you're either waiting for the monthly reset or paying more. Neither option works when you're on a publishing schedule.

The Solution: Free, Unlimited, Local

I already had FLUX.1-dev downloaded locally — it's been sitting in my HuggingFace cache, the full ~32GB diffusers model, eating disk space but working perfectly via Python scripts. The quality is genuinely better than what I was getting from DALL-E for tech-related images. But running it from a Python script means editing code every time you want to tweak something.

What I needed was a proper interface — something with a visual workflow where I could adjust prompts, change seeds, swap samplers, and iterate quickly without touching code.

Enter ComfyUI.

What is ComfyUI?

ComfyUI is a node-based interface for Stable Diffusion and FLUX models. Think of it like a visual programming environment for image generation. You connect nodes — one loads the model, one encodes your text prompt, one runs the diffusion steps, one decodes the result — and they form a pipeline you can see, tweak, and save as a reusable workflow.

Key advantages:

Free and open source — no credits, no subscriptions, no usage limits
Runs entirely locally — your images never leave your machine
Full control over every parameter — seeds, samplers, schedulers, LoRA stacking, ControlNet, inpainting
Reproducible — save a workflow, share it, run it again months later with the same seed and model files to get the same result
Extensible — massive community of custom nodes for every technique

The Hardware

This runs on my RTX 5070 Ti (16GB VRAM). FLUX.1-dev is a big model:

~22GB transformer (merged from 3 sharded safetensors files)
~9GB T5-XXL text encoder
~1GB CLIP-L text encoder
~300MB VAE

Yes, that's more than 16GB of model files. But ComfyUI loads components sequentially and manages memory for you, so inference still fits within 16GB VRAM because:

Text encoders are loaded, used to encode the prompt, then can be offloaded
The transformer runs in bfloat16 (half the memory of float32); at ~22GB it is still bigger than the card, so the part that doesn't fit stays in system RAM
ComfyUI's sequential offloading keeps peak VRAM manageable

In practice, I see about 13.8GB VRAM usage during generation, leaving headroom on the 16GB card.
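As a rough sanity check on those sizes, weight memory is approximately parameter count times bytes per parameter. The sketch below assumes about 12 billion parameters for the FLUX.1-dev transformer (that figure is an assumption, not something measured here) and ignores activations and runtime overhead.

```python
def weights_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of model weights in GiB."""
    return num_params * bytes_per_param / 1024**3


# Assumption: roughly 12 billion parameters in the FLUX.1-dev transformer.
FLUX_TRANSFORMER_PARAMS = 12e9

if __name__ == "__main__":
    for name, width in (("float32", 4), ("bfloat16", 2)):
        print(f"{name}: {weights_gib(FLUX_TRANSFORMER_PARAMS, width):.1f} GiB")
```

The bfloat16 figure lands close to the ~22GB transformer listed above, which is why the weights alone cannot all sit in 16GB of VRAM at once.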

Step-by-Step Setup

1. Clone ComfyUI

cd ~
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

2. Create a Virtual Environment

uv venv --python 3.12 .venv
source .venv/bin/activate
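uv pip install torch --index-url https://download.pytorch.org/whl/cu130
uv pip install -r requirements.txt

PyTorch with CUDA support is essential. The cu130 index installs the CUDA 13.0 build, which is what I tested with. Blackwell cards like the RTX 5070 Ti need a CUDA 12.8 or newer build (the older cu121 wheels don't ship kernels for them), so adjust the index if your driver or GPU needs a different CUDA version.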
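Before going further it is worth confirming that the installed wheel can actually see the GPU. This is a minimal sketch, not part of ComfyUI; `cuda_report` is a hypothetical helper name, and it only uses standard `torch` attributes.

```python
def cuda_report() -> dict:
    """Summarise what the installed PyTorch build can see."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back False, or `cuda_build` is None, the wheel and the driver probably don't match and the index URL needs changing.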

3. Prepare the FLUX.1-dev Model Files

This is the tricky part. FLUX.1-dev from HuggingFace comes in sharded safetensors format — 3 files for the transformer, 2 for the T5 text encoder. ComfyUI expects single files.

If you already have the model in your HuggingFace cache (from using diffusers), you need to merge the shards. Here's a Python script that does the job:

from safetensors import safe_open
from safetensors.torch import save_file
from pathlib import Path

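def merge_shards(shard_dir: Path, output_path: Path):
    """Merge multi-part safetensors into a single file."""
    # Shards are named like "...-00001-of-00003.safetensors"; sorting keeps them in order.
    shards = sorted(shard_dir.glob("*-of-*.safetensors"))
    if not shards:
        raise FileNotFoundError(f"No sharded safetensors files found in {shard_dir}")
    all_tensors = {}
    metadata = None

    for i, shard_path in enumerate(shards):
        print(f"  Reading shard {i+1}/{len(shards)}: {shard_path.name}")
        with safe_open(str(shard_path), framework="pt", device="cpu") as f:
            # Carry the first shard's metadata over to the merged file.
            if metadata is None:
                metadata = f.metadata()
            # Every tensor stays in system RAM until the final save,
            # so you need at least as much free RAM as the merged file size.
            for key in f.keys():
                all_tensors[key] = f.get_tensor(key)

    output_path.parent.mkdir(parents=True, exist_ok=True)
    save_file(all_tensors, str(output_path), metadata=metadata)
    size_gb = output_path.stat().st_size / (1024**3)
    print(f"Saved {output_path} ({size_gb:.1f} GB)")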

Run this for the transformer shards and T5 text encoder shards, then place the merged files in ComfyUI's model directories:

Component	Source	Destination
Transformer	`transformer/` shards → merge	`models/diffusion_models/flux1-dev.safetensors`
T5-XXL	`text_encoder_2/` shards → merge	`models/text_encoders/t5xxl_fp16.safetensors`
CLIP-L	`text_encoder/model.safetensors` → symlink	`models/clip/clip_l.safetensors`
VAE	`vae/diffusion_pytorch_model.safetensors` → symlink	`models/vae/ae.safetensors`

Important: The shard merging step is the only "extra" work, and it's a one-time operation. Everything else is already sitting in your HuggingFace cache if you've used FLUX.1-dev with diffusers or the HuggingFace API.
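The symlink rows in the table can be scripted. The sketch below is one way to do it, assuming the default cache location (`~/.cache/huggingface/hub`) and ComfyUI cloned into the home directory; `find_snapshot` and `link_into_comfy` are hypothetical helper names, not part of any library.

```python
from pathlib import Path


def find_snapshot(repo_id: str, cache_root: Path) -> Path:
    """Return the most recently modified snapshot directory for a cached repo."""
    repo_dir = cache_root / ("models--" + repo_id.replace("/", "--")) / "snapshots"
    snapshots = [p for p in repo_dir.iterdir() if p.is_dir()] if repo_dir.is_dir() else []
    if not snapshots:
        raise FileNotFoundError(f"No cached snapshot for {repo_id} under {cache_root}")
    return max(snapshots, key=lambda p: p.stat().st_mtime)


def link_into_comfy(source: Path, destination: Path) -> None:
    """Symlink a single-file component into a ComfyUI model folder."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    if destination.is_symlink():
        destination.unlink()  # replace a stale link
    elif destination.exists():
        raise FileExistsError(f"{destination} is a real file; refusing to overwrite it")
    destination.symlink_to(source.resolve())


# Example usage (paths are assumptions; adjust to your machine):
# cache = Path.home() / ".cache" / "huggingface" / "hub"
# models = Path.home() / "ComfyUI" / "models"
# snap = find_snapshot("black-forest-labs/FLUX.1-dev", cache)
# link_into_comfy(snap / "text_encoder" / "model.safetensors", models / "clip" / "clip_l.safetensors")
# link_into_comfy(snap / "vae" / "diffusion_pytorch_model.safetensors", models / "vae" / "ae.safetensors")
```

The same `snap` path gives you the `transformer/` and `text_encoder_2/` directories to pass to `merge_shards`.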

4. Install ComfyUI Manager (Recommended)

cd ~/ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
cd ~/ComfyUI && source .venv/bin/activate
uv pip install -r custom_nodes/ComfyUI-Manager/requirements.txt

The Manager gives you a browser UI for installing custom nodes, downloading models, and managing workflows. It's not required but makes life much easier.

5. Start ComfyUI

cd ~/ComfyUI
source .venv/bin/activate
PYTORCH_ALLOC_CONF=expandable_segments:True python main.py --listen 0.0.0.0 --port 8188

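The PYTORCH_ALLOC_CONF=expandable_segments:True environment variable matters for large models: it reduces VRAM fragmentation, which helps avoid out-of-memory errors that occur even though enough memory is technically free. On older PyTorch releases the variable is called PYTORCH_CUDA_ALLOC_CONF. Note that --listen 0.0.0.0 exposes the UI to your whole network; drop it if you only need access from the same machine.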
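Once the server is running, a quick way to confirm it from a script is to query ComfyUI's `/system_stats` route. This is a minimal sketch; the helper name is hypothetical and the response fields are read defensively in case they differ between versions.

```python
import json
import urllib.request


def comfy_system_stats(base_url: str = "http://127.0.0.1:8188", timeout: float = 3.0):
    """Return ComfyUI's /system_stats payload, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/system_stats", timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # URLError and timeouts are OSError subclasses; bad JSON raises ValueError.
        return None


if __name__ == "__main__":
    stats = comfy_system_stats()
    if stats is None:
        print("ComfyUI is not reachable on port 8188")
    else:
        for device in stats.get("devices", []):
            free_gib = device.get("vram_free", 0) / 1024**3
            total_gib = device.get("vram_total", 0) / 1024**3
            print(f"{device.get('name')}: {free_gib:.1f} / {total_gib:.1f} GiB VRAM free")
```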

6. The FLUX.1-dev Workflow

In the ComfyUI browser interface (http://localhost:8188), build this workflow:

UNETLoader — flux1-dev.safetensors, weight_dtype: default
DualCLIPLoader — clip_name1: clip_l.safetensors, clip_name2: t5xxl_fp16.safetensors, type: flux
VAELoader — ae.safetensors
ModelSamplingFlux — Connect the UNET model output, set max_shift: 1.15, base_shift: 0.5, width/height: 1024
CLIPTextEncode (positive) — Your prompt, connected to the CLIP output
CLIPTextEncode (negative) — Empty string for FLUX (it doesn't use negative prompts effectively)
EmptyLatentImage — 1024×1024, batch_size: 1
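KSampler — steps: 28, cfg: 3.5, sampler: euler, scheduler: simple, seed: your choice (these are the settings behind the timings below; ComfyUI's stock FLUX example instead sets cfg: 1.0 and feeds guidance 3.5 through a FluxGuidance node, which skips the negative-prompt pass and should sample faster)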
VAEDecode — Connect latent output and VAE
SaveImage — Connect decoded image, set filename prefix

Click "Queue Prompt" and wait for your image.
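The same ten-node graph can also be written in ComfyUI's API (JSON) format and queued from a script. The sketch below mirrors the settings listed above; the node IDs and the helper names `build_flux_workflow` and `queue_prompt` are my own choices, and the input names are the ones I'd expect from the stock nodes, so compare against a workflow exported from your own install before relying on it.

```python
import json
import urllib.request


def build_flux_workflow(prompt: str, seed: int, prefix: str = "flux") -> dict:
    """Describe the graph in ComfyUI's API format: node id -> class_type + inputs."""
    # A link is written as [source_node_id, output_index].
    return {
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": "flux1-dev.safetensors", "weight_dtype": "default"}},
        "2": {"class_type": "DualCLIPLoader",
              "inputs": {"clip_name1": "clip_l.safetensors",
                         "clip_name2": "t5xxl_fp16.safetensors", "type": "flux"}},
        "3": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.safetensors"}},
        "4": {"class_type": "ModelSamplingFlux",
              "inputs": {"model": ["1", 0], "max_shift": 1.15, "base_shift": 0.5,
                         "width": 1024, "height": 1024}},
        "5": {"class_type": "CLIPTextEncode", "inputs": {"text": prompt, "clip": ["2", 0]}},
        "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "", "clip": ["2", 0]}},
        "7": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "8": {"class_type": "KSampler",
              "inputs": {"model": ["4", 0], "positive": ["5", 0], "negative": ["6", 0],
                         "latent_image": ["7", 0], "seed": seed, "steps": 28, "cfg": 3.5,
                         "sampler_name": "euler", "scheduler": "simple", "denoise": 1.0}},
        "9": {"class_type": "VAEDecode", "inputs": {"samples": ["8", 0], "vae": ["3", 0]}},
        "10": {"class_type": "SaveImage",
               "inputs": {"images": ["9", 0], "filename_prefix": prefix}},
    }


def queue_prompt(workflow: dict, base_url: str = "http://127.0.0.1:8188") -> dict:
    """POST the workflow to ComfyUI's /prompt endpoint and return its JSON reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"{base_url}/prompt", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example (needs a running ComfyUI server):
# print(queue_prompt(build_flux_workflow("server room panorama", seed=42)))
```

Keeping the graph as a plain dictionary makes it easy to change only the prompt or seed between runs.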

Performance Numbers

On the RTX 5070 Ti (16GB):

Metric	Value
First generation (model load)	~100 seconds
Subsequent generations	~60-80 seconds
Peak VRAM usage	~13.8GB
Image resolution	1024×1024
Steps	28
Quality compared to DALL-E 3	Equal or better for tech/cyberpunk subjects

Subsequent generations are faster because the model stays loaded in VRAM. If you've been using Ollama for text generation, you'll need to unload it first — FLUX needs the full VRAM.

# Free VRAM from Ollama before generating
curl -s http://localhost:11434/api/generate -d '{"model":"your-model","keep_alive":0}'
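If you don't remember which model is loaded, the same unload request can be sent for every model Ollama reports as running. A minimal sketch, assuming Ollama's default port and its `/api/ps` listing route; the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def unload_payload(model_name: str) -> bytes:
    """Request body asking Ollama to drop a model from memory immediately."""
    return json.dumps({"model": model_name, "keep_alive": 0}).encode("utf-8")


def loaded_models(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> list:
    """Names of models Ollama currently holds in memory (empty if unreachable)."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/ps", timeout=timeout) as resp:
            models = json.load(resp).get("models", [])
    except (OSError, ValueError):
        return []
    return [m["name"] for m in models if "name" in m]


def unload_all(base_url: str = OLLAMA_URL) -> list:
    """Send a keep_alive=0 request for every loaded model and return their names."""
    names = loaded_models(base_url)
    for name in names:
        req = urllib.request.Request(f"{base_url}/api/generate", data=unload_payload(name),
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req).close()
    return names


# Example: print("Unloaded:", unload_all())
```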

FLUX.1-dev vs DALL-E: When Local Wins

When DALL-E makes sense:

One-off images — if you need one image and never again, the convenience wins
Photorealistic people — DALL-E still has an edge on human faces
Phone/casual use — no setup required

When FLUX.1-dev on ComfyUI wins:

Volume — unlimited generations, zero marginal cost
Tech/abstract subjects — circuit boards, server rooms, code visualisations
Reproducibility — save a workflow with a fixed seed, get the same result next month
Control — seeds, samplers, LoRAs, ControlNet, inpainting
Privacy — images never leave your machine
Batch production — blog headers, product shots, social media assets
Iteration — tweak one parameter, re-queue, compare instantly

Server room test — FLUX.1-dev generates tech imagery with impressive detail

For a blog producing multiple images per post, the local workflow is where the setup really starts paying for itself.

The Blog-Specific Workflow

Here's how I use it for blog production:

Write the post in the Local AI Journal
Design prompts based on the content — "circuit board close-up" for hardware posts, "server room panorama" for infrastructure, "abstract neural network" for AI topics
Generate 4-6 variants with different seeds
Pick the best and optimise for web (512×512 blog size, 95% JPEG quality)
Save to public/images/ and reference in frontmatter

The prompt pattern that works for blog headers:

[subject description], cinematic lighting, ultra detailed quality, 
professional technology aesthetic, no text no words no letters

That last part — "no text no words no letters" — is essential. FLUX is decent at text rendering, but you don't want random words appearing in your header images. Better to tell it explicitly to skip text.
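The prompt pattern and the "4-6 variants with different seeds" step are easy to script. This is a small sketch with hypothetical helper names; deriving seeds from a base seed is my own convention, chosen so a batch can be regenerated later.

```python
import random

# The fixed part of the blog-header prompt pattern.
STYLE_SUFFIX = ("cinematic lighting, ultra detailed quality, "
                "professional technology aesthetic, no text no words no letters")


def header_prompt(subject: str) -> str:
    """Fill the blog-header prompt pattern with a subject description."""
    return f"{subject.strip()}, {STYLE_SUFFIX}"


def variant_seeds(count: int, base_seed: int) -> list:
    """Derive `count` distinct, repeatable seeds from one base seed."""
    rng = random.Random(base_seed)
    return rng.sample(range(2**32), count)


if __name__ == "__main__":
    prompt = header_prompt("server room panorama")
    for seed in variant_seeds(5, base_seed=2026):
        print(seed, prompt)
```

Recording the base seed alongside the post is enough to recreate the whole batch of variants.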

What's Next

Now that ComfyUI is set up, the door is open to:

LoRA fine-tuning on my brand colours and style
ControlNet for guided compositions
Inpainting for editing specific regions
Image-to-image for variation on existing images
Upscaling with Real-ESRGAN for print-quality outputs
Automated batch generation via the API — queue prompts from scripts

The best part? All of this runs locally, costs nothing per image, and keeps every generation private. Your hardware, your rules.


Setup tested on: RTX 5070 Ti 16GB, Ubuntu 24.04, Python 3.12, PyTorch 2.11+cu130, ComfyUI 0.19.3