ComfyUI Without the Fog: Build Your Own Image Workflow, or Let an Agent Bootstrap It
ComfyUI looks terrifying until you realise it is just a visible pipeline: models, prompts, samplers, latents, outputs. This guide gives you the local setup path, the cloud path, and the agent-assisted path for getting from zero to a working workflow without pretending the graph is magic.
The first time ComfyUI opens, it does not look like an image generator.
It looks like someone dropped a circuit board into a browser and asked you to make art with it.
Boxes. Wires. Model loaders. Samplers. Latents. Encoders. Nodes with names that sound like they escaped from a GPU driver changelog. It is very easy to bounce off the thing in the first five minutes and retreat back to the neat prompt box in ChatGPT, Midjourney, Sora, or whatever polished app is offering the least friction that week.
That is the trap.
The prompt box is convenient, but it hides the machine from you. ComfyUI exposes the machine. It shows you what is really happening: a model is loaded, text is encoded, noise is sampled, a latent is decoded, an image is saved. Once you can see the chain, you can alter the chain.
That is why ComfyUI matters.
For me, Raf_VRS, the lesson arrived through two different doors at once. I spent real OpenAI allocation generating blog images in ChatGPT. Fast, impressive, sometimes useful, sometimes expensive in the invisible way subscription usage is expensive. Then I started playing with ComfyUI locally, where the cost moved from account allocation to GPU time, disk space, workflow discipline, and the occasional error message from hell.
This guide is the bridge between those worlds.
It is for the person who wants to get started without being buried alive in custom-node folklore. It is also for the person who wants an agent to do the boring setup work, explain what it changed, and leave behind a workflow they can actually inspect.
What ComfyUI actually is
ComfyUI is a node-based interface for generative AI workflows.
That sentence sounds more complicated than the thing itself.
A normal image app gives you one box:
Write prompt. Press button. Receive image.
ComfyUI breaks that hidden process into visible pieces:
- a model loader picks the checkpoint or diffusion model
- a text encoder turns your prompt into conditioning
- a latent image node decides the canvas size and batch count
- a sampler turns noise into an image using the model and conditioning
- a VAE decoder converts the latent result into pixels
- an output node saves or previews the image
The graph is not decoration. It is the workflow.
That is the power. You can swap the model, alter the sampler, add ControlNet, use an image as input, plug in a LoRA, upscale the output, inpaint a section, generate video frames, or create repeatable batches with fixed seeds.
It is also the pain. Every extra degree of control is another place to miswire something.
So the goal is not to learn every node on day one. The goal is to get one boring workflow running, understand the shape of it, then improve it one controlled step at a time.
Choose your path first: local or cloud
Before installing anything, decide where ComfyUI should run.
There are two sane paths.
First: local ComfyUI.
This means the work runs on your own machine. It is free per image once set up, private by default, and brilliant if you have a capable GPU. The trade-off is that you own the setup. Drivers, models, Python environments, disk space, VRAM limits, broken custom nodes — congratulations, the dragon lives in your house now.
For local use, a practical baseline is:
- NVIDIA GPU with at least 6 GB VRAM for light workflows
- 8 GB VRAM or more for SDXL comfort
- 12 GB VRAM or more for Flux and heavier workflows
- plenty of disk space, because models are not shy
- patience when something fails the first time
Second: Comfy Cloud.
This runs workflows on Comfy's hosted infrastructure. It is simpler to start, avoids local GPU pain, and is useful if your machine is weak or you want reliable hosted execution. The trade-off is account setup, API keys, subscription limits, and less ownership of the runtime.
The rule is simple:
If you have a proper GPU and want control, go local.
If you do not have the hardware, or you just want to learn workflows before committing, use cloud.
Do not install local ComfyUI on a potato and then blame the potato for being a potato. That way lies forum archaeology.
The DIY local setup
The cleanest modern install path is the official comfy-cli.
You need Python first. On Linux, check:
python3 --version
Then install the CLI. Prefer pipx if you have it:
pipx install comfy-cli
If you use uv, you can run it without permanently installing:
uvx --from comfy-cli comfy --help
Disable the first-run analytics prompt non-interactively:
comfy --skip-prompt tracking disable
Then install ComfyUI for your hardware.
For NVIDIA:
comfy --skip-prompt install --nvidia
For AMD on Linux:
comfy --skip-prompt install --amd
For Apple Silicon:
comfy --skip-prompt install --m-series
For CPU only:
comfy --skip-prompt install --cpu
CPU works in the same way a bicycle technically works for moving a wardrobe. Possible, educational, not recommended.
Launch the server:
comfy launch --background
Check it is alive:
curl -s http://127.0.0.1:8188/system_stats
Then open:
http://127.0.0.1:8188
That gets you the room. It does not yet guarantee you have the right furniture.
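If you would rather have a script do the liveness check, here is a minimal Python sketch using only the standard library. The field names it reads (devices, name, vram_total, vram_free) are assumptions about the shape of the /system_stats response, so compare them with what your own server returns before relying on them.

```python
import json
import urllib.request


def fetch_system_stats(base_url: str = "http://127.0.0.1:8188", timeout: float = 5.0) -> dict:
    """GET /system_stats and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/system_stats", timeout=timeout) as resp:
        return json.load(resp)


def summarise_devices(stats: dict) -> list:
    """Turn the (assumed) 'devices' list into short human-readable lines."""
    lines = []
    for dev in stats.get("devices", []):
        total_gb = dev.get("vram_total", 0) / 1024**3  # bytes -> GiB
        free_gb = dev.get("vram_free", 0) / 1024**3
        name = dev.get("name", "unknown")
        lines.append(f"{name}: {free_gb:.1f} GB free of {total_gb:.1f} GB VRAM")
    return lines


# Usage, with the server running:
#   for line in summarise_devices(fetch_system_stats()):
#       print(line)
```

A connection error from fetch_system_stats means the server is not listening. A clean answer only means the server is up, not that generation works.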
The model problem
ComfyUI without models is just an attractive wiring diagram.
You need at least one usable model. Models usually live under the ComfyUI workspace in folders like:
ComfyUI/models/checkpoints/
ComfyUI/models/loras/
ComfyUI/models/vae/
ComfyUI/models/clip/
ComfyUI/models/diffusion_models/
ComfyUI/models/upscale_models/
For a simple starter path, use SDXL or SD 1.5.
SD 1.5 is lighter and easier on older cards. SDXL gives stronger general quality but wants more VRAM. Flux can produce excellent results, but it is heavier and has extra companion files to keep straight.
A model download through comfy-cli looks like this:
comfy model download \
--url "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors" \
--relative-path models/checkpoints
Then list what you have:
comfy model list
Do not skip this check. A large share of early ComfyUI pain is just a workflow asking for a model filename that does not exist on your machine.
Exact names matter. File extensions matter. Folder placement matters.
This is where the graph stops being art and starts being ops.
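The filename check can be automated. The sketch below walks an API-format workflow (a JSON object of nodes, each with a class_type and inputs) and reports model files that are not on disk. The LOADER_INPUTS mapping is an illustrative assumption covering a few common loader inputs. It is not a complete list, and custom nodes may use other input names.

```python
from pathlib import Path

# Loader input name -> subfolder under ComfyUI/models/.
# Illustrative assumption: extend this for the loaders your workflows use.
LOADER_INPUTS = {
    "ckpt_name": "checkpoints",
    "lora_name": "loras",
    "vae_name": "vae",
    "unet_name": "diffusion_models",
}


def missing_models(workflow: dict, models_root: Path) -> list:
    """Return one line per model file the workflow names but the disk lacks."""
    missing = []
    for node_id, node in workflow.items():
        if not isinstance(node, dict):
            continue
        inputs = node.get("inputs", {})
        for input_name, folder in LOADER_INPUTS.items():
            filename = inputs.get(input_name)
            # Linked inputs are lists like ["1", 0]; only plain strings are filenames.
            if isinstance(filename, str) and not (models_root / folder / filename).is_file():
                missing.append(f"node {node_id}: {folder}/{filename}")
    return missing
```

An empty list means every referenced file was found at the expected path. Anything else is the exact name and folder to go and fix.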
Your first workflow should be boring
Do not begin with the most elaborate workflow you found on Reddit.
Begin with text-to-image.
A basic workflow should have:
- checkpoint loader
- positive prompt text encoder
- negative prompt text encoder
- empty latent image
- sampler
- VAE decode
- save image
Set the image size to something reasonable, like 1024×1024 for SDXL, 512×512 for SD 1.5, or smaller if your VRAM is tight.
Use a fixed seed at first. Fixed seeds make debugging possible. Random seeds are fun once the pipeline is stable.
A starter prompt can be plain:
A clean product photograph of a compact Linux workstation on a dark desk, soft studio lighting, realistic, high detail
A starter negative prompt can be equally plain:
blurry, low quality, distorted text, watermark, extra limbs, artefacts
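For reference, the same seven-node graph can be written as the JSON that ComfyUI's execution API accepts. This Python sketch builds it as a dictionary using ComfyUI's built-in node classes. The node IDs, the cfg value, and the sampler and scheduler names are placeholder choices for a first test, not recommendations.

```python
def build_text_to_image_workflow(
    ckpt_name: str,
    positive: str,
    negative: str,
    seed: int = 42,
    steps: int = 20,
    width: int = 1024,
    height: int = 1024,
) -> dict:
    """Minimal graph: loader -> prompts -> latent -> sampler -> decode -> save.

    A link is written as [source_node_id, output_index]. The checkpoint
    loader's outputs are MODEL (0), CLIP (1), and VAE (2).
    """
    return {
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": ckpt_name}},
        "2": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "5": {"class_type": "KSampler", "inputs": {
            "model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
            "latent_image": ["4", 0],
            "seed": seed, "steps": steps, "cfg": 7.0,  # placeholder defaults
            "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
        "6": {"class_type": "VAEDecode", "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "ComfyUI"}},
    }
```

Calling it with a checkpoint filename that exists on your machine and the two prompts above gives you a graph you can read top to bottom: seven nodes, one fixed seed, nothing clever.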
Generate one image.
If it works, save the workflow.
If it fails, do not change five things at once. Read the error. Most early failures are one of these:
- missing model
- missing custom node
- workflow saved in the wrong format
- GPU out of memory
- model filename mismatch
- server not actually running
That is not glamorous, but it is solvable.
API format versus editor format
This bit matters if an agent is going to run workflows for you.
ComfyUI has two workflow formats.
The editor format is what the visual UI uses. It has nodes and links arranged for the canvas.
The API format is what the execution endpoint expects. Each node is represented by an ID, a class_type, and inputs.
If you want scripts or agents to submit workflows automatically, you need API format.
In the web UI, use:
Workflow → Export (API)
or, in older versions:
Save (API Format)
A good agent should check this before running anything. If the file has top-level nodes and links, it is probably editor format. If each node has a class_type, it is probably API format.
This sounds boring until it saves you an hour of asking why a perfectly visible graph refuses to execute.
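That check is small enough to write down. A minimal sketch of the heuristic described above; the function name and the "unknown" fallback are my own choices.

```python
import json


def detect_workflow_format(data: object) -> str:
    """Guess whether parsed workflow JSON is 'editor', 'api', or 'unknown'."""
    if not isinstance(data, dict) or not data:
        return "unknown"
    # Editor format: top-level "nodes" list plus "links" for the canvas.
    if isinstance(data.get("nodes"), list) and "links" in data:
        return "editor"
    # API format: every top-level value is a node with a class_type.
    if all(isinstance(node, dict) and "class_type" in node for node in data.values()):
        return "api"
    return "unknown"


def detect_workflow_file(path: str) -> str:
    """Load a workflow file and classify it."""
    with open(path, encoding="utf-8") as fh:
        return detect_workflow_format(json.load(fh))
```

Treat "unknown" as a stop sign: re-export the workflow from the web UI rather than submitting a file nobody can classify.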
Running a workflow by API
Locally, ComfyUI exposes a REST API.
Submit a workflow with:
curl -X POST "http://127.0.0.1:8188/prompt" \
-H "Content-Type: application/json" \
-d '{"prompt": YOUR_WORKFLOW_JSON, "client_id": "YOUR-CLIENT-ID"}'
Check history (add /PROMPT_ID to the path to fetch a single job):
curl -s "http://127.0.0.1:8188/history"
Download an output:
curl -s "http://127.0.0.1:8188/view?filename=ComfyUI_00001_.png&subfolder=&type=output" \
-o output.png
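The same three calls in Python, as a sketch with the standard library only. It assumes the /prompt response carries a prompt_id and that a finished history entry lists its images under outputs, so verify both against your own server's responses.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8188"


def queue_prompt(workflow: dict, client_id: str, base_url: str = BASE_URL) -> str:
    """POST an API-format workflow to /prompt and return its prompt_id."""
    body = json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/prompt", data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["prompt_id"]


def fetch_history(prompt_id: str, base_url: str = BASE_URL) -> dict:
    """Return the history entry for one job, or {} if it has not finished."""
    with urllib.request.urlopen(f"{base_url}/history/{prompt_id}") as resp:
        return json.load(resp).get(prompt_id, {})


def output_urls(history_entry: dict, base_url: str = BASE_URL) -> list:
    """Build /view download URLs for every image in a history entry."""
    urls = []
    for node_output in history_entry.get("outputs", {}).values():
        for image in node_output.get("images", []):
            query = urllib.parse.urlencode({
                "filename": image["filename"],
                "subfolder": image.get("subfolder", ""),
                "type": image.get("type", "output"),
            })
            urls.append(f"{base_url}/view?{query}")
    return urls
```

In practice you queue the prompt, poll fetch_history until it returns a non-empty entry, then download each URL from output_urls.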
For cloud, the paths move under /api, and you add an API key header:
curl -X POST "https://cloud.comfy.org/api/prompt" \
-H "X-API-Key: $COMFY_CLOUD_API_KEY" \
-H "Content-Type: application/json" \
-d '{"prompt": YOUR_WORKFLOW_JSON}'
That is the important difference:
Local is usually unauthenticated on your machine.
Cloud needs X-API-Key.
Never paste the key into a chat. Put it in your local environment or secret manager. If a key appears in logs or chat, treat it as compromised and rotate it. I have learned that lesson the annoying way so you do not have to.
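One way to keep the key out of scripts and logs is to read it from the environment at call time and only ever print a masked version. A small sketch: COMFY_CLOUD_API_KEY matches the variable used in the curl example, while the redact helper is hypothetical and exists purely for illustration.

```python
import os


def cloud_headers(env_var: str = "COMFY_CLOUD_API_KEY") -> dict:
    """Build request headers from an environment variable, failing loudly if unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it in your shell or load it from a secret manager"
        )
    return {"X-API-Key": key, "Content-Type": "application/json"}


def redact(secret: str, visible: int = 4) -> str:
    """Mask a secret for logs, keeping only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Failing early with a clear message beats sending an empty header and debugging a 401 later.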
The agent-assisted path
Now for the fun part.
You do not have to personally click through every install, model check, dependency check, and smoke test. An agent can do a lot of the dull work.
The safe version is not “agent, install random internet workflows and run everything”. That is how you accidentally turn a creative tool into a Python execution roulette wheel.
The safe version is:
- The agent checks your hardware.
- The agent recommends local or cloud.
- The agent installs ComfyUI only after you approve the path.
- The agent verifies the server with /system_stats.
- The agent lists installed models.
- The agent checks workflow dependencies before running anything.
- The agent runs a tiny smoke test.
- The agent saves the workflow, output path, model names, seed, and prompt.
- The agent explains what changed.
That last step matters. An agent that leaves you with a magic folder and no explanation has not helped you. It has merely moved the fog.
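What "saves the workflow, output path, model names, seed, and prompt" might look like in practice is a small JSON record next to the output. The field names and the write_run_record helper below are hypothetical; the point is that the record is a plain file a human can open.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_run_record(
    record_path: str,
    *,
    workflow_path: str,
    output_path: str,
    model_names: list,
    seed: int,
    prompt: str,
    notes: str = "",
) -> dict:
    """Write a human-readable record of one generation run and return it."""
    record = {
        "created_at": datetime.now(timezone.utc).isoformat(),
        "workflow_path": workflow_path,
        "output_path": output_path,
        "model_names": list(model_names),
        "seed": seed,
        "prompt": prompt,
        "notes": notes,  # e.g. what the agent changed and why
    }
    Path(record_path).write_text(json.dumps(record, indent=2), encoding="utf-8")
    return record
```

With the seed, model filename, and workflow path recorded, rerunning the same job later is a lookup, not a memory test.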
A prompt to give your agent
If you want an agent to bootstrap ComfyUI for you, use something like this:
Set up ComfyUI safely for image generation.
First, check my hardware and tell me whether local ComfyUI or Comfy Cloud is the better path. Do not install anything until you have explained the recommendation.
If local is suitable, use the official comfy-cli path. Install ComfyUI for my GPU type, disable analytics prompts non-interactively, launch it on 127.0.0.1:8188, and verify /system_stats.
Then check whether I have at least one usable starter model. If not, recommend a lightweight starter model and ask before downloading large files.
Create or locate a simple text-to-image workflow. Verify it is API format. Run one smoke test with a small number of steps. Save the output path, prompt, seed, model filename, and workflow file path.
Do not paste secrets. Do not run unknown custom nodes from untrusted workflows. Do not silently substitute placeholder images if generation fails. Report the blocker clearly.
That prompt is not fancy. It is operational.
It tells the agent to check before installing. It separates recommendation from execution. It forces verification. It tells the agent not to fake success.
That is how you use agents around tools that can run arbitrary Python.
A workflow prompt to get started
Once ComfyUI is running, give the agent a second prompt:
Create a beginner ComfyUI workflow for text-to-image.
Use an installed model that actually exists on this machine. Keep the workflow simple: model loader, positive and negative prompts, latent image, sampler, VAE decode, and save image.
Use 1024×1024 if VRAM allows, otherwise choose a safer size. Use a fixed seed and record it. Use 20 steps for the first test.
Positive prompt:
A clean editorial image of a compact AI workstation on a dark desk, subtle purple and blue lighting, realistic, sharp focus, no text.
Negative prompt:
blurry, low quality, watermark, distorted text, extra objects, artefacts.
Run one smoke test. If it fails, diagnose the first real error instead of changing multiple settings at once. Save both the API workflow and, if possible, an editable workflow for the ComfyUI web UI.
Notice the line about saving both versions.
API workflows are great for automation. Editable workflows are better for humans. If you only save the API version, the web UI can sometimes load it awkwardly or complain about missing output nodes. Keep both when you can.
What I learned locally
My local setup taught me a few useful, slightly sharp lessons.
First, a server can respond while generation is still broken. /system_stats proving ComfyUI is alive does not prove the sampler can finish a job.
Second, Flux workflows can look complete and still collapse into useless abstract gradients if the conditioning is wired badly. For Flux, guidance matters. In my case, missing FluxGuidance was enough to turn concrete prompts into mush. The fix was not a better prompt. The fix was better wiring.
Third, companion files matter. A Flux model is not always just one giant file. You may also need CLIP, T5, and VAE files in the right folders.
Fourth, stale background processes can lie to you. I saw a stale stdout pipe trigger BrokenPipeError during sampling while the server still looked alive. Restarting ComfyUI cleanly and smoke-testing before a full batch was the right move.
That is why the setup checklist matters. Not because checklists are exciting. Because they stop you from hallucinating progress.
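A wiring mistake like the missing FluxGuidance node can be caught before a job is queued. The sketch below is a generic presence check over an API-format workflow. It only confirms that a node class appears in the graph, not that it is wired correctly, and the list of required classes is yours to choose.

```python
def missing_node_classes(workflow: dict, required: list) -> list:
    """Return the required class_type names that do not appear in the workflow."""
    present = {
        node.get("class_type")
        for node in workflow.values()
        if isinstance(node, dict)
    }
    return [name for name in required if name not in present]


# Usage for a Flux workflow, assuming it should contain a FluxGuidance node:
#   problems = missing_node_classes(workflow, ["FluxGuidance"])
#   if problems:
#       print("Missing nodes:", ", ".join(problems))
```

It is a cheap pre-flight check, and it turns one of my sharper lessons into a single line of output.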
When to use ChatGPT images versus ComfyUI
Use ChatGPT image generation when:
- you need speed
- you are ideating
- you want a strong first visual without building a workflow
- exact repeatability does not matter
- you are happy spending account allocation
Use ComfyUI when:
- repeatability matters
- you want control over models and seeds
- you need batches or variations
- you want local privacy
- you want to tune a workflow over time
- you want the agent to build a repeatable image pipeline for the blog
For blog work, the best answer is probably both.
Use ChatGPT to explore visual direction quickly. Use ComfyUI to turn repeated image needs into a controlled local pipeline. Hero cards, benchmark images, recurring guide visuals, consistent brand treatments — those are workflows, not one-off miracles.
That is where ComfyUI earns its place.
The minimum viable ComfyUI habit
If you remember nothing else, remember this workflow:
- Start small.
- Verify the server.
- Verify the model exists.
- Use a boring workflow first.
- Fix one error at a time.
- Save the seed.
- Save the workflow.
- Save the output path.
- Only then get clever.
The graph is not there to intimidate you. It is there to make the machine visible.
And once the machine is visible, you can make it yours.
That is the whole point.
For Raf_VRS and anyone else building in public, ComfyUI is not just another image toy. It is a way to stop renting every visual experiment from a black box. It lets the workstation become part of the creative stack: messy, inspectable, fixable, and mine.
Your hardware. Your rules.
Found this useful? Follow @Raf_VRS for more local AI build notes, or support the work at ko-fi.com/rafvrs.