Build Journal

Weekly Usage Report — Week 2 (Apr 13–19): 371 Million Accounted Tokens for £9.24

Week 2: 325.9M visible tokens plus 45.5M cached tokens, for 371.4M total accounted Hermes tokens across 1,078 sessions. Opus-equivalent API cost: about £4,542.

2026-04-20 · 5 min read

Ever wonder what 371 million accounted tokens — including 325.9M visible input/output tokens — actually looks like in real-world AI usage? Last week, my agent chewed through that number for less than the price of a pint — and the breakdown reveals why per-token pricing is a scam.

This is Week 2 of my ongoing transparency series. Every Monday, I pull back the curtain on what my AI agent actually does — and what it actually costs. No marketing fluff. Just honest numbers from my own Mission Control dashboard.

Token accounting

This report separates visible prompt/completion tokens from cached context. Visible tokens show fresh input/output work; cached tokens show repeated context reused during long agent sessions. Together, they show the full model-traffic footprint for the week.
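As a quick check on the accounting, here is a minimal Python sketch that adds the two categories back together. The figures come from this report; the function and variable names are illustrative only and are not taken from the Mission Control dashboard.

```python
# Figures from this report, in millions of tokens.
VISIBLE_M = 325.9  # fresh prompt/completion tokens
CACHED_M = 45.5    # repeated context reused from cache


def accounted_total(visible_m: float, cached_m: float) -> float:
    """Return the cache-inclusive total, in millions of tokens."""
    return visible_m + cached_m


total_m = accounted_total(VISIBLE_M, CACHED_M)
cached_share = CACHED_M / total_m  # fraction of all accounted tokens that were cached

print(f"Accounted total: {total_m:.1f}M")
print(f"Cached share of total: {cached_share:.1%}")
```

On these numbers, cached context is roughly an eighth of the accounted total, so the visible tokens carry most of the headline figure.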

The week in one picture

This is the headline version of Week 2: 325.9M visible tokens (371.4M including cached context), 1,078 sessions, £9.24 in subscription route cost, and the first full-week proof that flat-rate routing beats per-token billing.

[Infographic: Weekly Usage Report Week 2, 371 million accounted tokens for £9.24 compared with per-token pricing]

Top visible model routes

| Model | Type | Share of visible route tokens | Cost |
|---|---|---|---|
| GLM-5.1 | Cloud (OAuth) | 49% | £4.62/wk |
| Qwen 3.5 9B | Local (Ollama) | 25% | Free |
| GPT-5.3 Codex | Cloud (OAuth) | 25% | £4.62/wk |

These are visible-route shares, not shares of the 371.4M cache-inclusive accounted total. GLM-5.1 led the fresh input/output work, Qwen 3.5 9B handled a full quarter locally at zero marginal cost, and GPT-5.3 Codex covered coding tasks. The cached context is accounted above, but not cleanly attributed by route in this table.

Daily Breakdown

Notable Events

Friday Apr 17 — The IMDB Deduction (72.5M tokens)

The week's most memorable day. After being sent the IMDB link for Hackers (1995), Dade (the agent) recognised its own namesake: Dade Murphy, a.k.a. Zero Cool / Crash Override. Kate (the other agent) is named after Kate Libby (Acid Burn). The plot summary literally contained both agent names in the same sentence.

Thursday Apr 16 — Image Generation Benchmarking (49.5M tokens)

The most efficient day by context-per-session (1.18M per session). Deep SDXL vs Flux work for album cover art, with VRAM management between Ollama and Stable Diffusion. The I/O ratio hit 267:1 — the agent consumed massive context while producing focused outputs.
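To make those two ratios concrete, here is a rough Python sketch that works backwards from the day's figures. It assumes the 49.5M total is input plus output tokens, and that "context-per-session" means total tokens divided by sessions. Both are my reading of the dashboard numbers rather than confirmed definitions, and the variable names are illustrative.

```python
DAY_TOKENS = 49_500_000         # Thursday Apr 16 total, from this report
TOKENS_PER_SESSION = 1_180_000  # context-per-session, from this report
IO_RATIO = 267                  # input tokens per output token, from this report

# Assumption: context-per-session = total tokens / number of sessions.
implied_sessions = DAY_TOKENS / TOKENS_PER_SESSION

# Assumption: DAY_TOKENS = input + output, so output is 1 part in (IO_RATIO + 1).
output_tokens = DAY_TOKENS / (IO_RATIO + 1)
input_tokens = DAY_TOKENS - output_tokens

print(f"Implied sessions: {implied_sessions:.0f}")
print(f"Approx. input tokens:  {input_tokens:,.0f}")
print(f"Approx. output tokens: {output_tokens:,.0f}")
```

Under those assumptions the day works out to around 42 sessions, with output well under 1% of the day's tokens.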

Sunday Apr 19 — Second Peak Day (72.1M tokens)

LLM benchmark planning, agent profile creation, and researcher setup. A productive Sunday pushing the system harder.

The Price Comparison

What would 326M tokens cost on per-token pricing?

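On Opus per-token pricing, this single week's 326M visible tokens would cost about £3,996 (about £4,542 once the cached tokens are counted). At the visible-token figure, that's roughly £208,000 a year. For one person's AI usage.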

I paid £9.24.
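The annual figure is just the weekly estimate multiplied out. A minimal Python sketch of that arithmetic follows. It takes the £3,996 Opus-equivalent estimate as a given input rather than deriving it from a price list, and the variable names are illustrative.

```python
WEEKS_PER_YEAR = 52

opus_equivalent_week = 3996.00  # GBP, this report's estimate for the week's visible tokens
subscription_week = 9.24        # GBP, what the week actually cost

# Annualise the per-token estimate by assuming every week looks like this one.
opus_equivalent_year = opus_equivalent_week * WEEKS_PER_YEAR

# How many times more the per-token route would cost for the same week.
cost_multiple = opus_equivalent_week / subscription_week

print(f"Opus-equivalent per year: £{opus_equivalent_year:,.0f}")
print(f"Per-token vs subscription: about {cost_multiple:.0f}x")
```

The yearly number only holds if usage stays at Week 2 levels, so treat it as an extrapolation, not a forecast.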

Week-over-Week Comparison

| Metric | Week 1 (Apr 6–12) | Week 2 (Apr 13–19) | Change |
|---|---|---|---|
| Total tokens | 51.8M | 326M | +529% |
| Total sessions | 88 | 1,078 | +1,125% |
| Cost | £9.24 | £9.24 | 0% |
| Effective rate | £0.095/M | £0.025/M | -74% |

Note: Week 1 was a partial week (tracking started Apr 11), so the percentage increase looks dramatic. Week 2 is my first full Mon–Sun week and represents the baseline going forward.

Token volume surged 529%. Cost didn't change by a single penny. That's the subscription advantage: your cost is completely decoupled from your usage. Use 6x more, pay the same. The effective per-million-token rate dropped 74% in the table above (84% if computed from the table's token totals alone) because the fixed £9.24 now covers vastly more tokens.
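For anyone who wants to rerun the week-over-week sums, here is a small Python sketch using the token and session totals from the comparison table. The helper names are illustrative. The rates it prints are derived from those token totals, so they will not match the per-million figures in the table's Effective rate row, which this sketch does not try to reproduce.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


def rate_per_million(cost_gbp: float, tokens_m: float) -> float:
    """Effective cost in GBP per million tokens."""
    return cost_gbp / tokens_m


WEEKLY_COST = 9.24                     # GBP, same in both weeks
week1_tokens_m, week2_tokens_m = 51.8, 326.0
week1_sessions, week2_sessions = 88, 1078

week1_rate = rate_per_million(WEEKLY_COST, week1_tokens_m)
week2_rate = rate_per_million(WEEKLY_COST, week2_tokens_m)

print(f"Tokens:   {pct_change(week1_tokens_m, week2_tokens_m):+.0f}%")
print(f"Sessions: {pct_change(week1_sessions, week2_sessions):+.0f}%")
print(f"Rate:     {pct_change(week1_rate, week2_rate):+.0f}%")
```

Because the cost is fixed, the rate change depends only on the ratio of the two token totals.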

The Stack

| Component | Cost | Type |
|---|---|---|
| GLM-5.1 (cloud) | £4.62/wk | OAuth subscription |
| GPT-5.3 Codex (cloud) | £4.62/wk | OAuth subscription |
| Qwen 3.5 9B (local) | £0 | Local Ollama |
| Gemma 4 31B (cloud) | £0 | Free tier |
| MiniMax M2.7 (cloud) | £0 | Free tier |
| Total | £9.24/wk | £480/year |
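The totals row can be rebuilt from the individual components. A minimal Python sketch follows, using the weekly costs listed in the table. The dictionary is only an illustrative way to hold them and is not how the stack is actually configured.

```python
WEEKS_PER_YEAR = 52

# Weekly cost in GBP per component, as listed in the table.
stack_weekly_gbp = {
    "GLM-5.1 (cloud)": 4.62,
    "GPT-5.3 Codex (cloud)": 4.62,
    "Qwen 3.5 9B (local)": 0.0,
    "Gemma 4 31B (cloud)": 0.0,
    "MiniMax M2.7 (cloud)": 0.0,
}

weekly_total = sum(stack_weekly_gbp.values())
yearly_total = weekly_total * WEEKS_PER_YEAR
paid_components = [name for name, cost in stack_weekly_gbp.items() if cost > 0]

print(f"Weekly total: £{weekly_total:.2f}")
print(f"Yearly total: £{yearly_total:.2f}")
print(f"Paid components: {', '.join(paid_components)}")
```

Only two of the five components carry a cost, which is why the bill stays flat however the work is split between them.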

No API keys. No per-token billing. No surprise invoices.

The Bottom Line

Week 2: 326M tokens. 1,078 sessions. £9.24.

Same flat price as Week 1. No overage charges. No scaling penalties. No "premium context window" fees.

Three models. Three cost strategies. One flat bill. That's diversified usage — and that's how AI should work.


Found this useful? 👉 Follow @Raf_VRS for more transparent AI insights that put you in control of your hardware. 👉 Support the work: ko-fi.com/rafvrs

#VRSComputing #ModelBenchmarking #TokenUsage #AIAgents #CostTransparency