
Weekly Usage Report — Week 5 (May 4–10): 731 Million Accounted Tokens for £20.54

Week 5: 122.5M visible tokens plus 608.3M cached tokens, for 730.8M total accounted Hermes tokens across 651 sessions.

2026-05-11 · 5 min read

Weekly AI Usage Report — Week 5: The Usage Moved Windows

Reporting period: Monday 4 May – Sunday 10 May 2026
Previous week (Week 4): 494.8M total accounted tokens, 2,461 sessions, £9.24/week
Subscription context: ChatGPT Pro at £89/month.

Token accounting

This report separates visible prompt/completion tokens from cached context. Visible tokens show fresh input/output work; cached tokens show repeated context reused during long agent sessions. Together, they show the full model-traffic footprint for the week.

This is the first weekly report in which visible input/output tokens badly understate the real footprint. Hermes logged 122.5M visible tokens across 651 sessions, but cached context added another 608.3M tokens, taking the audited total to 730.8M accounted tokens, roughly six times the visible figure.
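The accounting above is simple addition, but it helps to see the split written out. The following is a minimal sketch using only the Week 5 figures from this report; the `WeeklyTokens` class and its field names are illustrative assumptions, not part of Hermes or its database schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeeklyTokens:
    """Hypothetical container for one week's token counts, in millions."""
    visible_m: float  # fresh prompt/completion tokens
    cached_m: float   # repeated context served from cache

    @property
    def total_m(self) -> float:
        # Total accounted tokens = visible + cached.
        return self.visible_m + self.cached_m

    @property
    def cached_share(self) -> float:
        # Fraction of the accounted footprint that came from cache.
        return self.cached_m / self.total_m


week5 = WeeklyTokens(visible_m=122.5, cached_m=608.3)

print(f"total accounted: {week5.total_m:.1f}M")                    # 730.8M
print(f"cached share:    {week5.cached_share:.1%}")                # 83.2%
print(f"total / visible: {week5.total_m / week5.visible_m:.2f}x")  # 5.97x
```

On these numbers, reporting the visible figure alone would hide a little over four fifths of the week's model traffic.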

The week in one picture

This is the headline version of Week 5: 122.5M visible tokens, 608.3M cached tokens, and 730.8M total accounted Hermes tokens. The local database figures are now reported as separate visible and cached totals instead of being reduced to one misleading headline number.

Weekly Usage Report Week 5 — visible and cached token accounting


Top visible model routes

The route percentages above describe the 122.5M visible input/output tokens only. The bigger Week 5 story is that cached context became the largest part of the footprint: 608.3M of the 730.8M accounted tokens, or about 83%.

Daily breakdown

What actually happened this week

Most of the time here went into guardrailing final designs and proofreading the blog. That is not glamorous work, but it is the difference between "the agent made something" and "this is safe enough to show people".

That meant checking final layouts, catching inconsistent copy, tightening public-facing posts, and making sure the blog did not look like it had been assembled by seven over-caffeinated agents in a trench coat.

The Week 5 story is clear: the local machine was doing more repeated-context work than the visible prompt/completion number suggested. The cached context was the hidden mass.

The price comparison

Using the audited 730.8M total accounted token workload, the per-token comparison looks like this:

These are still estimates, not invoices. But the direction is clear enough: even with the higher Pro subscription cost, flat-rate usage is still absurdly cheap by comparison at this workload level.

The difference this week is that the subscription meter, not the local database, became the better signal for part of the work.
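For readers who want to check the flat-rate side of that comparison, here is a rough sketch of the arithmetic. It assumes the weekly cost is the monthly subscription spread as 12 payments over 52 weeks, which reproduces the £20.54 in the title; the function names are illustrative. It deliberately contains no metered API prices, since any such rate would have to come from the provider's current price list.

```python
MONTHLY_SUBSCRIPTION_GBP = 89.0  # ChatGPT Pro, from this report
WEEK5_TOTAL_TOKENS_M = 730.8     # total accounted tokens, in millions
WEEK4_TOTAL_TOKENS_M = 494.8     # previous week, in millions
WEEK4_WEEKLY_COST_GBP = 9.24     # previous week's cost, from this report


def weekly_cost(monthly_gbp: float) -> float:
    """Assumed conversion: 12 monthly payments spread over 52 weeks."""
    return monthly_gbp * 12 / 52


def cost_per_million(weekly_gbp: float, tokens_m: float) -> float:
    """Effective flat-rate cost in GBP per million accounted tokens."""
    return weekly_gbp / tokens_m


week5_cost = weekly_cost(MONTHLY_SUBSCRIPTION_GBP)
week5_rate = cost_per_million(week5_cost, WEEK5_TOTAL_TOKENS_M)
week4_rate = cost_per_million(WEEK4_WEEKLY_COST_GBP, WEEK4_TOTAL_TOKENS_M)

print(f"Week 5 weekly cost:    GBP {week5_cost:.2f}")             # 20.54
print(f"Week 5 effective rate: GBP {week5_rate:.4f} per million")  # 0.0281
print(f"Week 4 effective rate: GBP {week4_rate:.4f} per million")  # 0.0187
```

The effective rate doubles as a break-even point: pay-as-you-go pricing would only match the subscription if its blended rate across visible and cached tokens came in at or below roughly 2.8 pence per million tokens.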

Week-over-week comparison

So the wrong headline is "usage collapsed".

The right headline is "usage moved".

Week 4 was mostly readable through visible tokens. Week 5 showed why the reporting model had to catch cached context, not just prompt/completion text.
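The "moved, not collapsed" reading can be checked against the two weeks' headline figures. This is a small sketch using only numbers stated in this report; the `Week` class and the `pct_change` helper are illustrative names, not part of any reporting tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Week:
    """Hypothetical summary of one reporting week."""
    total_tokens_m: float  # total accounted tokens, in millions
    sessions: int

    @property
    def tokens_per_session_m(self) -> float:
        return self.total_tokens_m / self.sessions


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, as a fraction."""
    return (new - old) / old


week4 = Week(total_tokens_m=494.8, sessions=2461)
week5 = Week(total_tokens_m=730.8, sessions=651)

print(f"tokens:   {pct_change(week4.total_tokens_m, week5.total_tokens_m):+.1%}")  # +47.7%
print(f"sessions: {pct_change(week4.sessions, week5.sessions):+.1%}")              # -73.5%
print(
    f"tokens per session: {week4.tokens_per_session_m:.2f}M"
    f" -> {week5.tokens_per_session_m:.2f}M"
)  # 0.20M -> 1.12M
```

Sessions fell by about three quarters while total accounted tokens rose by nearly half, so the average session carried more than five times as much traffic as the week before.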

The stack

No single dashboard sees all of this cleanly yet. That is fine, as long as the report says so plainly.

The bottom line

Week 5: 122.5M visible tokens, 608.3M cached tokens, 730.8M total accounted tokens, 651 Hermes sessions.

This is what happens when AI becomes part of the workshop rather than a single chat tab. Some work appears as fresh input/output. A lot of agent work reuses repeated context through cache. The report now shows both.

The operator lesson is simple: measure what you can, annotate what you cannot, and do not let a clean database tell a dirty lie.

Found this useful?
👉 Follow @Raf_VRS for more transparent AI insights that put you in control of your hardware.
👉 Support the work: ko-fi.com/rafvrs

#VRSComputing #ModelBenchmarking #TokenUsage #AIAgents #CostTransparency