The True Cost of Running AI Locally — £0.08/M Tokens vs £24/M Tokens

I calculated the real cost of local+OAuth AI inference including hardware amortisation and electricity. The result: 310x cheaper than Claude Opus, 62x cheaper than Sonnet, and even 3x cheaper than GPT-4o mini.

2026-04-17 · 5 min read

215 Million Tokens for £16.64/Week

Everyone talks about the cost of AI APIs. Nobody talks about the cost of running it yourself — including the hardware you already bought.

I tracked every token for a full week across my stack: GLM-5.1 and GPT-5.3 Codex on flat-rate OAuth subscriptions (£4.62/wk each), Qwen3.5 9B running locally on a consumer RTX 5070 Ti, plus free-tier cloud models for light tasks.

Here’s the headline:

Metric	Value
Total tokens processed	215,087,866
Subscription cost	£9.24/week
Hardware amortisation (4yr)	£6.25/week
Electricity	£1.15/week
Total true cost	£16.64/week
Effective rate	£0.08/M tokens

That’s right. Including everything — the GPU, the RAM, the electricity, the subscriptions — I am processing 215 million tokens per week at eight pence per million tokens.
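
The headline rate is simple arithmetic on the table above. A minimal sketch, using only the figures already listed (the variable and function names are illustrative, not taken from any tracking tool):

```python
# Weekly figures copied from the headline table above.
TOTAL_TOKENS = 215_087_866
WEEKLY_COSTS_GBP = {
    "subscriptions": 9.24,
    "hardware_amortisation": 6.25,
    "electricity": 1.15,
}

def effective_rate_per_million(total_cost_gbp: float, tokens: int) -> float:
    """Return the blended cost in GBP per million tokens."""
    return total_cost_gbp / (tokens / 1_000_000)

total_cost = sum(WEEKLY_COSTS_GBP.values())
rate = effective_rate_per_million(total_cost, TOTAL_TOKENS)
print(f"Total: £{total_cost:.2f}/week, rate: £{rate:.3f}/M tokens")
```

The unrounded rate comes out just under £0.08/M, which is where the eight pence figure comes from.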

The Stack, Itemised

Component	Weekly Cost	Type
GLM-5.1 (OAuth)	£4.62	Subscription
GPT-5.3 Codex (OAuth)	£4.62	Subscription
RTX 5070 Ti + 64GB DDR5 + Core Ultra 7 (4yr amortisation)	£6.25	Hardware
Electricity (~350W, 6.5 GPU-hours)	£1.15	Running cost
Qwen3.5:9b (local)	£0.00*	Free
Gemma4:31b-cloud	£0.00	Free tier
Minimax-M2.7:cloud	£0.00	Free tier

*Local inference electricity is included in the £1.15 figure. The model itself is free.

How I Got Hardware Amortisation

I am not going to pretend hardware is free — that’s the trick most "local AI is cheaper" articles pull. Here’s my maths:

AI-relevant hardware: RTX 5070 Ti (about £750), 64GB DDR5 (about £200), Core Ultra 7 265KF (about £350) = about £1,300
Useful life: 4 years (208 weeks)
Weekly amortisation: £6.25/week

Yes, you could argue the PC gets used for other things too. And you’d be right — if your GPU is also your gaming rig, cut that in half. But even at full price, it’s a rounding error compared to per-token API costs at the workload I’m running.
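
The amortisation maths can be sketched as straight-line depreciation over the useful life. The `ai_share` parameter is a hypothetical knob added here to model the "cut that in half" case for a shared gaming rig; it is an assumption, not part of the original calculation.

```python
# Approximate component prices in GBP, as listed above.
HARDWARE_GBP = {
    "RTX 5070 Ti": 750,
    "64GB DDR5": 200,
    "Core Ultra 7 265KF": 350,
}
WEEKS_PER_YEAR = 52

def weekly_amortisation(cost_gbp: float, years: int, ai_share: float = 1.0) -> float:
    """Straight-line weekly cost; ai_share is the fraction attributed to AI work."""
    return cost_gbp * ai_share / (years * WEEKS_PER_YEAR)

hardware_total = sum(HARDWARE_GBP.values())
full = weekly_amortisation(hardware_total, years=4)
shared = weekly_amortisation(hardware_total, years=4, ai_share=0.5)
print(f"Dedicated: £{full:.2f}/week, shared 50/50: £{shared:.3f}/week")
```

At full attribution this gives £6.25/week; halving the AI share gives £3.125/week.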

The Comparison That Matters

Here’s where it gets fun. I took my actual token usage and calculated what it would cost at published per-token pricing:

Provider	Rate	What my week would cost	vs my cost
Claude Opus 4.6	£24.00/M	£5,167	310x
Gemini 2.5 Pro	£6.10/M	£1,313	79x
Claude Sonnet 4	£4.80/M	£1,032	62x
GPT-5.3 Codex (per-token)	£2.63/M	£567	34x
DeepSeek Chat	£0.47/M	£101	6x
GPT-4o mini	£0.26/M	£56	3x
My stack	£0.08/M	£16.64	1x

Three hundred and ten times cheaper than Opus. Sixty-two times cheaper than Sonnet. Even against the cheapest per-token API — GPT-4o mini at £0.26/M — I am still 3x better off.
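
For anyone who wants to rerun the comparison with their own volume, here is a rough sketch that multiplies each listed rate by the weekly token count. It assumes a single blended rate per provider, exactly as the table does; the function name `compare` is illustrative.

```python
# Weekly volume (millions of tokens) and total weekly cost, from the tables above.
TOKENS_M = 215.087866
MY_WEEKLY_COST_GBP = 16.64

# Per-million-token rates in GBP as listed in the comparison table.
RATES_GBP_PER_M = {
    "Claude Opus 4.6": 24.00,
    "Gemini 2.5 Pro": 6.10,
    "Claude Sonnet 4": 4.80,
    "GPT-5.3 Codex (per-token)": 2.63,
    "DeepSeek Chat": 0.47,
    "GPT-4o mini": 0.26,
}

def compare(rates, tokens_m, baseline_cost):
    """Map each provider to (weekly cost in GBP, multiple of the baseline cost)."""
    result = {}
    for name, rate in rates.items():
        cost = rate * tokens_m
        result[name] = (cost, cost / baseline_cost)
    return result

comparison = compare(RATES_GBP_PER_M, TOKENS_M, MY_WEEKLY_COST_GBP)
for name, (cost, multiple) in comparison.items():
    print(f"{name:<26} £{cost:>8,.0f}  {multiple:>4.0f}x")
```

The multiples match the table. A few of the weekly costs land a pound or so either side of it (Opus comes out at about £5,162 rather than £5,167), presumably because the rates shown are rounded.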

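And before you say "but GPT-4o mini isn’t as good": you’re right, it isn’t, and that only strengthens the point. GLM-5.1 and GPT-5.3 Codex are genuinely powerful models, so my stack undercuts the cheapest per-token API while running stronger models. This isn’t a toy comparison.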

The Input/Output Imbalance

My token mix is heavily input-biased — 99.4% input, 0.6% output. This is typical for agent workloads: tool results, web pages, and file contents dominate the context window, while the model’s responses are relatively terse.

Model	Input Tokens	Output Tokens	I/O Ratio
GLM-5.1	123,870,661	561,707	220:1
GPT-5.3 Codex	89,702,354	676,882	133:1
Qwen3.5:9b (local)	319,488	24,262	13:1

This is important because per-token pricing bills every token of context, not just the answer. API pricing heavily penalises output tokens: Claude Opus charges £6/M input but £30/M output, which makes input look cheap by comparison. When your workload is 99% input, that "cheap input" headline rate is misleading. You’re still paying through the nose, because the per-token model treats your 200M input tokens as a revenue opportunity.

Flat-rate subscriptions flip this: whether your ratio is 1:1 or 220:1, the price stays at £4.62/week.
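
The ratios and the overall output share can be reproduced from the per-model table. This is a small sketch using only the three models listed there (the free-tier cloud models are not broken out, so they are left out here), with ratios shown to one decimal place:

```python
# (input tokens, output tokens) per model, copied from the table above.
USAGE = {
    "GLM-5.1": (123_870_661, 561_707),
    "GPT-5.3 Codex": (89_702_354, 676_882),
    "Qwen3.5:9b (local)": (319_488, 24_262),
}

# Input-to-output ratio for each model.
ratios = {model: inp / out for model, (inp, out) in USAGE.items()}
for model, ratio in ratios.items():
    print(f"{model}: {ratio:.1f}:1")

# Share of all listed tokens that were model output.
total_in = sum(inp for inp, _ in USAGE.values())
total_out = sum(out for _, out in USAGE.values())
output_share = total_out / (total_in + total_out)
print(f"Output share: {output_share:.1%}")
```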

Why "Free Local" Isn’t Actually Free

Qwen3.5:9b runs locally and costs £0 in API fees. But I included its electricity in my calculation because honesty matters. At ~350W system draw and roughly 6.5 GPU-hours over the week, local inference adds about £1.15/week to the electricity bill.

That’s trivial, but it’s not zero. And if you’re running agents 24/7, that number climbs fast. A constantly occupied GPU at 350W over a full week (168 hours) uses 58.8 kWh, which at UK rates is about £17/week, more than the subscriptions.

The lesson: local inference is "free" until you saturate the GPU. Then electricity becomes your new per-token cost.
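
The saturated-GPU figure can be sketched as below. The unit price is an assumption: roughly £0.29/kWh is what the £17 for 58.8 kWh figure implies, so substitute your own tariff.

```python
HOURS_PER_WEEK = 24 * 7

# Assumed unit price in GBP per kWh (implied by about £17 for 58.8 kWh above).
ASSUMED_PRICE_GBP_PER_KWH = 0.29

def weekly_energy_kwh(watts: float, hours: float) -> float:
    """Energy used at a constant power draw over the given number of hours."""
    return watts / 1000 * hours

def energy_cost_gbp(kwh: float, price_per_kwh: float) -> float:
    """Cost of that energy at a flat unit price."""
    return kwh * price_per_kwh

saturated_kwh = weekly_energy_kwh(350, HOURS_PER_WEEK)
saturated_cost = energy_cost_gbp(saturated_kwh, ASSUMED_PRICE_GBP_PER_KWH)
print(f"{saturated_kwh:.1f} kWh/week, about £{saturated_cost:.2f}/week")
```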

What You Actually Need

What	My Pick	Why
Primary model (cloud)	GLM-5.1	Flat-rate OAuth, strong reasoning
Coding model (cloud)	GPT-5.3 Codex	Flat-rate OAuth, top-tier code gen
Cheap/quick tasks (local)	Qwen3.5:9b	Free, fast, good enough for simple routing
GPU	RTX 5070 Ti 16GB	Runs 9B quantised comfortably, handles FLUX image gen
RAM	64GB DDR5	Fits full context windows locally
Total weekly cost	£16.64	Including hardware amortisation

The Fine Print

My effective rate fluctuates. Light weeks = higher per-token cost. Heavy weeks = lower. At 215M tokens/week, £0.08/M is my current rate. It should fall further as I run more agents concurrently.
Flat-rate plans have rate limits. You’re not getting unlimited throughput — you’re getting predictable cost. If you need 10 concurrent sessions hammering Opus, OAuth won’t save you.
Hardware costs are front-loaded. You pay £1,300 on day one. The amortisation is comforting on paper, but you have still paid the full amount up front.
Local models have quality ceilings. Qwen3.5:9b handles routing and simple tasks well. It doesn’t replace GLM-5.1 for complex reasoning. That’s why I have both.
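
The first caveat above is easy to quantify. This sketch treats the whole £16.64 as a fixed weekly cost, which is a simplifying assumption (electricity in particular moves with load), and uses hypothetical weekly volumes to show how the effective rate shifts:

```python
# Assumed fixed regardless of volume (a simplification).
FIXED_WEEKLY_COST_GBP = 16.64

def effective_rate(tokens_m: float) -> float:
    """GBP per million tokens at a given weekly volume (in millions)."""
    return FIXED_WEEKLY_COST_GBP / tokens_m

def break_even_tokens_m(per_token_rate: float) -> float:
    """Weekly volume (millions) at which the fixed cost equals a per-token bill."""
    return FIXED_WEEKLY_COST_GBP / per_token_rate

# Hypothetical weekly volumes, in millions of tokens.
for tokens_m in (25, 50, 100, 215):
    print(f"{tokens_m:>4}M tokens/week -> £{effective_rate(tokens_m):.3f}/M")

# Break-even against GPT-4o mini's listed £0.26/M.
print(f"Break-even vs GPT-4o mini: {break_even_tokens_m(0.26):.0f}M tokens/week")
```

Under these assumptions the stack only undercuts GPT-4o mini's £0.26/M once weekly volume passes roughly 64M tokens; below that, the per-token API is the cheaper option.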

The Bottom Line

The "true cost of running locally" isn’t just the electricity. It’s subscriptions + hardware + electricity. But even accounting for all of it, the maths is brutal for per-token APIs.

£16.64/week for 215M tokens. That’s £0.08 per million tokens including everything.

The same volume on Claude Opus would cost £5,167/week. That’s not a typo. That’s three hundred and ten times more expensive.

The Cost Comparison — At a Glance

[Infographic: True Cost of Running Locally, cost efficiency comparison]

Run locally. Run smart. Run the numbers.

Found this useful? 👉 Follow @Raf_VRS for more Hard Interference field notes. 👉 Support the work: ko-fi.com/rafvrs