Build Journal

Weekly Usage Report — Week 4 (Apr 27–May 3): 495 Million Accounted Tokens for £9.24

Week 4: 374.2M visible tokens plus 120.6M cached tokens, for 494.8M total accounted Hermes tokens across 2,461 sessions. Opus-equivalent API cost: about £6,213.

2026-05-05 · 8 min read

Weekly AI Usage Report — Week 4: When Usage Patterns Shift

Token accounting

This report separates visible prompt/completion tokens from cached context. Visible tokens show fresh input/output work; cached tokens show repeated context reused during long agent sessions. Together, they show the full model-traffic footprint for the week.
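A minimal sketch of that accounting, using the Week 4 figures from the summary above. The variable and function names are illustrative only and are not part of Hermes.

```python
# Week 4 figures from this report, in millions of tokens.
VISIBLE_M = 374.2   # fresh prompt/completion tokens
CACHED_M = 120.6    # repeated context served from cache


def accounted_total(visible_m: float, cached_m: float) -> float:
    """Accounted tokens = visible tokens + cached tokens (both in millions)."""
    return visible_m + cached_m


total_m = accounted_total(VISIBLE_M, CACHED_M)
cached_share = CACHED_M / total_m  # fraction of the footprint that was cache reuse

print(f"Accounted total: {total_m:.1f}M")
print(f"Cached share of accounted tokens: {cached_share:.1%}")
```

On these numbers, roughly a quarter of the week's accounted traffic was reused context rather than fresh work.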

Reporting period: Monday 27 April – Sunday 3 May 2026

Previous week (Week 3): 449.3M tokens, 2,288 sessions, £9.24/week

The headline numbers

Week 4 was never going to match Week 3's raw volume. That was the week I was rebuilding everything from scratch — cron jobs, toolchains, the whole Hermes stack. Week 4 was different. It was the week the usage pattern changed.

On the surface, fewer tokens (374.2M visible against Week 3's 449.3M) might look like a regression. It isn't. The I/O ratio nearly halving, from 117:1 to 64:1, tells the real story: more interactive work, more output-heavy sessions, more mixed-mode operation rather than just shoving massive context windows at the model and asking for one thing.

The week in one picture

This is the headline version of Week 4: 374.2M visible tokens (494.8M accounted once cached context is included), 2,461 sessions, and £9.24 in subscription route cost, with the bottleneck shifting from invoices to daily usage limits.

Weekly Usage Report Week 4 — 495 million accounted tokens for £9.24 compared with per-token pricing


By model: the visible-route workhorses shift

The model mix tells you what kind of week it was. These route figures describe visible input/output tokens only; the 120.6M cached-context tokens are included in the weekly accounted total above, but are not allocated cleanly by model route here.

The Qwen 3.5 stat is worth a second look. 1,930 sessions at 15M visible tokens works out to under 8,000 visible tokens per session. That's the pattern of a utility model — fire and forget, low latency, no cost anxiety. Most people don't think about the small models, but they handle the bulk of daily interactions.
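The per-session figure is simple division. Here is a small sketch for checking it. Note that 15M is a rounded figure, so the result is approximate, and the function name is my own.

```python
def tokens_per_session(visible_tokens: int, sessions: int) -> float:
    """Average visible tokens per session."""
    if sessions <= 0:
        raise ValueError("sessions must be positive")
    return visible_tokens / sessions


# Qwen 3.5 route: about 15M visible tokens across 1,930 sessions.
qwen_avg = tokens_per_session(15_000_000, 1_930)
print(f"Qwen 3.5: about {qwen_avg:,.0f} visible tokens per session")
```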

By source: the CLI takeover

The CLI number is the story here: 106 sessions account for over half the visible source-token volume. These are long, focused work sessions: development, debugging, architecture planning. When I'm in the terminal working on something, the token burn rate is completely different from quick Telegram queries or cron-driven automation.

Telegram's 21 sessions, at 131.8M visible tokens (roughly 6.3M per session), reflect heavy remote sessions: checking in, running reports, managing the system from outside the home network.

Cron sessions are the long tail: 2,334 automated jobs averaging about 15,700 visible tokens each. Monitoring, scheduled tasks, routine checks. The infrastructure layer you don't think about until it stops working.
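The source split can be roughly reconstructed from the figures quoted in this section. This is a back-of-envelope sketch, not the report's own calculation: the cron volume is estimated from the quoted average, and the CLI volume is taken as the residual, assuming CLI, Telegram and cron are the only three sources and sum to the 374.2M visible total.

```python
TOTAL_VISIBLE_M = 374.2   # visible tokens for the week, in millions
TOTAL_SESSIONS = 2_461

sessions = {"cli": 106, "telegram": 21, "cron": 2_334}
# The three session counts quoted in the text add up to the weekly total.
assert sum(sessions.values()) == TOTAL_SESSIONS

telegram_m = 131.8
cron_m = sessions["cron"] * 15_700 / 1e6        # estimate from the ~15,700 average
cli_m = TOTAL_VISIBLE_M - telegram_m - cron_m   # assumption: CLI is the remainder

volumes_m = {"cli": cli_m, "telegram": telegram_m, "cron": cron_m}

for name, vol_m in volumes_m.items():
    share = vol_m / TOTAL_VISIBLE_M
    per_session_k = vol_m * 1_000 / sessions[name]  # thousands of tokens per session
    print(f"{name:9s} {vol_m:7.1f}M  {share:6.1%}  {per_session_k:10,.0f}k per session")
```

Under those assumptions, the CLI residual lands a little over half the visible volume, which matches the "over half" claim above.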

Daily breakdown: a week of contrast

Cost comparison: the flat-rate flex

The same token volume through API billing would have looked very different. The Opus-equivalent estimate alone comes to about £6,213.

These are estimates, not invoices, but the direction is unambiguous. At these volumes, flat-rate subscription pricing is transformative. Even compared to the cheapest API alternatives, the subscription cost is under a fifth of what you'd pay on consumption billing.

The practical effect: when the marginal cost of a query is zero, you stop thinking about whether a question is "worth" asking. You ask. And that changes how you work.

Week over week: what changed

Week 3 was raw scale: the build-out, the setup, the restoration of the Hermes environment from scratch. Week 4 was something else: operational maturity. More sessions (2,461 against 2,288), fewer visible tokens, but the work was more mixed and more deliberate. The CLI became the primary interface. Cron automation handled the background hum.

The drop in I/O ratio is the most interesting metric. 117:1 in Week 3 meant I was feeding the model enormous context windows — reading entire codebases, full configuration files, complete logs. 64:1 in Week 4 means the model was writing more, interacting more, generating more output. The conversations got more symmetrical.
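To make the ratio concrete, here is a rough sketch that splits a token total into input and output for a given input:output ratio. It assumes the ratio is measured over visible tokens and that Week 3's 449.3M is a comparable visible-token figure. Both are my assumptions, so treat the resulting split as an estimate rather than a reported number.

```python
def split_by_io_ratio(total_tokens_m: float, ratio: float) -> tuple[float, float]:
    """Split a total into (input, output), given an input:output ratio of ratio:1."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    output_m = total_tokens_m / (ratio + 1)
    input_m = total_tokens_m - output_m
    return input_m, output_m


# Assumed inputs: weekly totals in millions and the ratios quoted above.
weeks = {"Week 3": (449.3, 117), "Week 4": (374.2, 64)}

for label, (total_m, ratio) in weeks.items():
    input_m, output_m = split_by_io_ratio(total_m, ratio)
    print(f"{label}: ~{input_m:.1f}M in, ~{output_m:.1f}M out at {ratio}:1")
```

If those assumptions hold, the estimated output volume rises week over week even though the total falls, which is the shift towards more symmetrical conversations described above.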

The post-week development: Pro upgrade

Note: On Monday 4 May, after the reporting week ended, I upgraded ChatGPT to Pro at £89/month.

The reason was straightforward: I was hitting daily limits regularly. When you're running hundreds of sessions a day and doing app and game development on the side, the Plus-level limits become a bottleneck faster than you'd expect.

This is the operational insight that Week 4 crystallised: when the subscription covers the cost but the daily cap doesn't cover the volume, the constraint shifts from money to limits. Hitting a daily rate limit is more frustrating than a bill — it stops you mid-flow.

The Pro upgrade wasn't about wanting more features. It was about removing a throttle. When the ceiling becomes usage limits rather than invoices, you start managing AI like infrastructure rather than a utility.

The builder's takeaway

Week 4 taught me something I didn't expect to learn this early in the experiment.

Flat-rate AI subscriptions don't just change your cost structure. They change your behaviour. When every query costs the same as every other query, you optimise for throughput and quality, not for frugality. You run 2,461 sessions in a week not because you're trying to justify the subscription, but because the work demands it.

The effective rate of about £0.019 per million accounted tokens — under 2 pence per million tokens — means the subscription is already paid for by the first few serious sessions of the week. Everything after that is gravy.
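The effective rate is just the weekly subscription cost divided by the accounted volume. A short sketch of that arithmetic, with names of my own choosing:

```python
WEEKLY_COST_GBP = 9.24
ACCOUNTED_TOKENS_M = 494.8   # visible + cached, in millions


def rate_per_million(cost_gbp: float, tokens_m: float) -> float:
    """Effective cost in GBP per million tokens."""
    if tokens_m <= 0:
        raise ValueError("tokens_m must be positive")
    return cost_gbp / tokens_m


rate_gbp = rate_per_million(WEEKLY_COST_GBP, ACCOUNTED_TOKENS_M)
print(f"£{rate_gbp:.3f} per million accounted tokens ({rate_gbp * 100:.2f}p)")
```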

If you're a builder running your own infrastructure and you're still on consumption-based AI billing, do the maths on your actual volume. The breakeven point on flat-rate subscriptions is lower than most people think. And the behavioural upside — not having to think twice about asking — is something spreadsheets don't capture.
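One way to do that maths is to divide the subscription cost by a per-million API rate. The sketch below uses a blended rate implied by this report's own Opus-equivalent estimate (about £6,213 for 494.8M accounted tokens). That blended rate is a simplification I am assuming for illustration. Real API pricing differs for input, output and cached tokens, so substitute your own rates and volumes.

```python
def breakeven_tokens_m(subscription_gbp: float, api_rate_gbp_per_m: float) -> float:
    """Weekly volume (millions of tokens) at which API billing equals the subscription."""
    if api_rate_gbp_per_m <= 0:
        raise ValueError("api_rate_gbp_per_m must be positive")
    return subscription_gbp / api_rate_gbp_per_m


# Assumption: one blended rate derived from the report's Opus-equivalent estimate.
implied_rate = 6_213 / 494.8          # GBP per million tokens
breakeven_m = breakeven_tokens_m(9.24, implied_rate)

print(f"Implied blended rate: £{implied_rate:.2f} per million tokens")
print(f"Breakeven volume: about {breakeven_m:.2f}M tokens per week")
```

On that blended rate the breakeven sits below one million tokens a week, which is why the first few serious sessions are enough to cover the subscription.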

Found this useful? 👉 Follow @Raf_VRS for more transparent AI insights that put you in control of your hardware. 👉 Support the work: ko-fi.com/rafvrs

#VRSComputing #ModelBenchmarking #TokenUsage #AIAgents #CostTransparency