Build Journal

Weekly Usage Report — Week 7 (May 18–24): Visible Tokens vs Cached Context

Week 7: 28.8M visible input/output tokens plus 343.7M cached tokens, for 372.5M total accounted Hermes tokens across 70 sessions.

2026-05-25 · 6 min read

Reporting period: Monday 18 May – Sunday 24 May 2026
Previous week (Week 6): 449.4M total accounted tokens, 133 sessions, £20.54/week Pro equivalent
Subscription context: ChatGPT Pro at £89/month.

Token accounting

This report separates visible prompt/completion tokens from cached context. Visible tokens show fresh input/output work; cached tokens show repeated context reused during long agent sessions. Together, they show the full model-traffic footprint for the week.

Visible tokens (input + output): 28,767,464 (28.8M)
Cached tokens (cache-read/write): 343,734,144 (343.7M)
Total accounted tokens: 372,501,608 (372.5M)
Sessions: 70
Input tokens: 27,696,674
Output tokens: 1,070,790
ChatGPT Pro weekly cost equivalent: £20.54/week
Opus-equivalent API cost: approximately £4,521
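The accounting above is plain addition, and it can be reproduced in a few lines. The sketch below is illustrative only: the variable and function names are made up for this example, and the weekly cost assumes the monthly price is spread over a 52-week year, which happens to match the £20.54 figure but is not confirmed as the method used.

```python
# Figures copied from this report; all names here are illustrative.
INPUT_TOKENS = 27_696_674
OUTPUT_TOKENS = 1_070_790
CACHED_TOKENS = 343_734_144
PRO_MONTHLY_GBP = 89.0  # ChatGPT Pro subscription price


def weekly_equivalent(monthly_gbp: float) -> float:
    """Spread a monthly price over a 52-week year (assumed method)."""
    return monthly_gbp * 12 / 52


# Visible tokens are fresh input plus output; the total adds cached context.
visible = INPUT_TOKENS + OUTPUT_TOKENS
total = visible + CACHED_TOKENS
cache_share = CACHED_TOKENS / total

print(f"visible tokens: {visible:,}")
print(f"total accounted tokens: {total:,}")
print(f"cache share of total: {cache_share:.1%}")
print(f"weekly Pro equivalent: £{weekly_equivalent(PRO_MONTHLY_GBP):.2f}")
```

On these inputs, cached context works out at a little over nine-tenths of the accounted total, which is why the visible and cache-inclusive figures tell such different stories.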

This was the quietest week since tracking began in April by visible input/output tokens, but the full model-traffic footprint was larger: Hermes logged 28.8M visible input/output tokens and 343.7M cached tokens, for 372.5M total accounted tokens across 70 sessions. Tuesday 19 May recorded zero sessions, a first in the tracking history.

The week was not idle. It was deliberate.

The week in one picture

The headline is 372.5M total accounted tokens once cached context is included. There were no Tuesday sessions at all, Wednesday was quiet, and most of the work was concentrated on Thursday and Saturday.

Infographic: Weekly Usage Report Week 7, 28.8 million visible Hermes-tracked tokens

Top visible model routes

GPT-5.5: 17.2M visible tokens, about 60.1% of visible route tokens. The main judgement and operator-support route, carrying most of the project planning, design collaboration, and editorial work.
Qwen 3.5 9B local: 10.2M visible tokens, about 35.7% of visible route tokens. Mostly background cron tasks — automated processing and routine checks that do not need a full-sized model.
Qwen 3 Coder 480B: 1.1M visible tokens, about 3.9% of visible route tokens. A handful of larger-context coding queries through Telegram.
Grok 4.3: 97K visible tokens, about 0.3% of visible route tokens. A single session about a short video concept for "One More Prompt".

The visible route distribution is simpler than recent weeks: GPT-5.5 and Qwen 3.5 local handled about 96% of the fresh input/output work between them. The cache-inclusive total is much larger because 343.7M cached-context tokens sit on top of those visible route figures.
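The route percentages are each route's visible tokens divided by the sum across the four listed routes. A minimal sketch of that calculation follows, using the rounded figures from the list; because the inputs are rounded to the nearest 0.1M, a result can differ from the published percentage in the last decimal place. The dictionary and function names are illustrative.

```python
# Visible tokens per route in millions, rounded as listed in this report.
ROUTE_TOKENS_M = {
    "GPT-5.5": 17.2,
    "Qwen 3.5 9B local": 10.2,
    "Qwen 3 Coder 480B": 1.1,
    "Grok 4.3": 0.097,
}


def route_shares(tokens_by_route: dict[str, float]) -> dict[str, float]:
    """Return each route's share of the listed routes, as a percentage."""
    listed_total = sum(tokens_by_route.values())
    return {
        route: tokens / listed_total * 100
        for route, tokens in tokens_by_route.items()
    }


shares = route_shares(ROUTE_TOKENS_M)
for route, share in shares.items():
    print(f"{route}: {share:.1f}%")

# Combined share of the two largest routes.
top_two = shares["GPT-5.5"] + shares["Qwen 3.5 9B local"]
print(f"GPT-5.5 + Qwen 3.5 local: {top_two:.0f}%")
```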

Daily breakdown

Mon May 18: 17 sessions, 6,417,557 visible (6.4M) + 49,467,904 cached (49.5M) = 55,885,461 total accounted tokens (55.9M), 15.0% of the week; cache share 88.5%, visible share 11.5%. Work note: Infographic fixes, blog publishing orchestration, launch prep, and env-secret leak prevention. The busiest start to a week we have seen in a while — all practical ops work.
Tue May 19: 0 sessions, 0 visible + 0 cached = 0 total accounted tokens, 0.0% of the week. Work note: Most of the day was spent on the road for the final sign-off of leyp.co.uk, the Lancashire Early Years Partnership site, which is now live. First zero-session day since tracking began.
Wed May 20: 4 sessions, 1,572,810 visible (1.6M) + 45,691,904 cached (45.7M) = 47,264,714 total accounted tokens (47.3M), 12.7% of the week; cache share 96.7%, visible share 3.3%. Work note: Hard Interference pre-live checklist, off-canvas menu CSS fixes, link deduplication, and footer analytics consent work. Focused, surgical sessions.
Thu May 21: 22 sessions, 7,224,273 visible (7.2M) + 84,985,856 cached (85.0M) = 92,210,129 total accounted tokens (92.2M), 24.8% of the week; cache share 92.2%, visible share 7.8%. Work note: The biggest day by session count. Workshop/source-split architecture, AI Guides feature planning across multiple iterations, PGX setup, network access debugging, and orchestration setup. This was the project engineering day.
Fri May 22: 5 sessions, 2,683,299 visible (2.7M) + 10,276,864 cached (10.3M) = 12,960,163 total accounted tokens (13.0M), 3.5% of the week; cache share 79.3%, visible share 20.7%. Work note: ThinkStation update recovery steps. Practical hardware ops.
Sat May 23: 17 sessions, 8,854,573 visible (8.9M) + 151,790,464 cached (151.8M) = 160,645,037 total accounted tokens (160.6M), 43.1% of the week; cache share 94.5%, visible share 5.5%. Work note: The heaviest day by token volume. Hermes dashboard/Kanban inspection, game design theme and 2D art pipeline testing, goal function misalignment review, and model identity/project context work. A proper Saturday build session.
Sun May 24: 5 sessions, 2,014,952 visible (2.0M) + 1,521,152 cached (1.5M) = 3,536,104 total accounted tokens (3.5M), 0.9% of the week; cache share 43.0%, visible share 57.0%. Work note: Cron-driven processing only. A genuine rest day.
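Each daily line above derives three numbers from two inputs: the day's total, its share of the week, and its cache share. The sketch below recomputes them from the visible and cached counts listed for each day. The structure and names are illustrative, not the tracker's actual code, and a zero-session day is given a cache share of 0.0 to avoid dividing by zero.

```python
# (visible, cached) token counts per day, copied from the daily breakdown.
DAILY = {
    "Mon": (6_417_557, 49_467_904),
    "Tue": (0, 0),
    "Wed": (1_572_810, 45_691_904),
    "Thu": (7_224_273, 84_985_856),
    "Fri": (2_683_299, 10_276_864),
    "Sat": (8_854_573, 151_790_464),
    "Sun": (2_014_952, 1_521_152),
}


def breakdown(daily: dict[str, tuple[int, int]]) -> dict[str, dict[str, float]]:
    """Compute total, share of week, and cache share for each day."""
    week_total = sum(visible + cached for visible, cached in daily.values())
    rows = {}
    for day, (visible, cached) in daily.items():
        total = visible + cached
        rows[day] = {
            "total": total,
            "week_share": total / week_total,
            # A day with no traffic has no meaningful cache share.
            "cache_share": cached / total if total else 0.0,
        }
    return rows


rows = breakdown(DAILY)
for day, row in rows.items():
    print(
        f"{day}: {row['total']:>11,} total, "
        f"{row['week_share']:.1%} of week, "
        f"{row['cache_share']:.1%} cached"
    )
```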

What actually happened this week

The week broke into clear phases rather than one continuous thread.

Monday was about closing out the launch push: infographic fixes, publishing orchestration, env-secret hardening, and making sure the Hard Interference launch was not leaking credentials or shipping broken visuals.

Thursday was the project-engineering day. The workshop/source-split architecture was a structural decision about how Hermes projects should separate working copies from deploy artifacts. The AI Guides planning sessions ran across multiple iterations — feature planning, orchestration setup, and PGX network access. This was not one big context window, but a sequence of distinct sessions, each addressing a different part of the system.

Saturday was the build day. Game design took most of it — theme engineering, 2D asset pipeline testing, goal function review, and design feedback. The model identity inquiry and Hermes dashboard inspection were smaller pieces around the same theme: making sure the tools are understood before they are extended.

The rest of the week was lighter. Tuesday was the LEYP launch sign-off on the road. Wednesday was surgical CSS and checklist work. Friday was PGX recovery steps. Sunday was a rest day with only cron jobs running.

The price comparison

Using the 372.5M cache-inclusive token volume, the per-token comparison looks like this:

Claude Opus 4.6 API: approximately £388 — about 19x the ChatGPT Pro weekly equivalent
Gemini 2.5 Pro API: approximately £35 — about 1.7x
Claude Sonnet API: approximately £78 — about 3.8x
GPT-5.3 Codex API: approximately £376 — about 18x
DeepSeek Chat API: approximately £7 — about 0.3x
GPT-4o mini API: approximately £4 — about 0.2x

These are estimates, not invoices. At this volume, the flat subscription is not stretched particularly hard — but that is not the point of this week.
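The multiples in the list are each API estimate divided by the £20.54 weekly subscription equivalent. A short sketch of that division, using the estimates exactly as listed above (the per-token rates behind them are not given in this report, so they are not modelled here, and the names are illustrative):

```python
WEEKLY_PRO_GBP = 20.54  # ChatGPT Pro weekly equivalent from this report

# Estimated weekly API cost in GBP, as listed in the comparison.
API_ESTIMATES_GBP = {
    "Claude Opus 4.6": 388,
    "Gemini 2.5 Pro": 35,
    "Claude Sonnet": 78,
    "GPT-5.3 Codex": 376,
    "DeepSeek Chat": 7,
    "GPT-4o mini": 4,
}


def multiples(estimates: dict[str, float], baseline: float) -> dict[str, float]:
    """Express each estimate as a multiple of the baseline cost."""
    return {name: cost / baseline for name, cost in estimates.items()}


ratios = multiples(API_ESTIMATES_GBP, WEEKLY_PRO_GBP)
# Print from most to least expensive relative to the subscription.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: £{API_ESTIMATES_GBP[name]} is about {ratio:.1f}x")
```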

Week-over-week comparison

Visible tokens: 43.0M → 28.8M, down 33.2%
Total accounted tokens: 449.4M → 372.5M, down 17.1%
Hermes sessions: 133 → 70, down 47.4%
Context: Week 6 was travel + ops + malware cleanup. Week 7 was measured project work with a deliberate gap day.
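The week-over-week figures are ordinary percentage changes against the previous week. A minimal sketch using the rounded totals and session counts above follows; the function name is illustrative. The visible-token change is left out because it depends on the unrounded Week 6 figure, which is not listed in this report.

```python
def pct_change(previous: float, current: float) -> float:
    """Percentage change from previous to current; negative means a drop."""
    return (current - previous) / previous * 100


# Total accounted tokens in millions, Week 6 to Week 7.
total_change = pct_change(449.4, 372.5)
# Hermes sessions, Week 6 to Week 7.
session_change = pct_change(133, 70)

print(f"total accounted tokens: {total_change:+.1f}%")
print(f"sessions: {session_change:+.1f}%")
```

Sessions fell much faster than total tokens, which is consistent with fewer but longer, more cache-heavy sessions.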

From Week 1's partial 52M-token kickoff to Week 3's 449M peak and now Week 7's 28.8M visible-token floor, the tracking range has widened considerably. That is not a trend but a signal: the work changes shape week to week, and the weekly report is most useful when it reflects the kind of work, not just the volume.

The stack

ChatGPT Pro: £89/month, about £20.54/week.
Hermes on Linux: local orchestration, project architecture, AI Guides planning, game design, PGX ops, and editorial work.
Qwen 3.5 9B local: zero marginal cost utility worker — mostly cron automation this week.
Grok 4.3: one-off session for video concept work.
PGX: recovery steps and network access debugging this week — slower progress than hoped, but the work is structural rather than stalled.

No new hardware entered the stack this week. No new free-tier experiments. No big model swaps.

The bottom line

Week 7: 28.8M visible tokens, 343.7M cached tokens, 372.5M total accounted tokens, 70 Hermes sessions.

This was the quietest week by visible input/output tokens, but the cache-inclusive total shows a much larger model-traffic footprint. That distinction matters because most of the traffic, about 92% of accounted tokens, was repeated context, not newly written text.

The work itself was not quiet. Project architecture decisions, AI Guides feature planning, game design, PGX debugging, LEYP launch sign-off, and editorial cleanup all happened. But they happened at a measured pace, on working days with breathing room, and with at least one day where the Hermes machine sat entirely unused.

That is not a failure. A workshop that runs every day at full blast is a workshop that never refactors. A Tuesday with zero sessions is not waste. It is the sign of an operator who decides what kind of week they need, rather than letting the token counter decide for them.

Found this useful?
👉 Follow Raf_VRS on X for more transparent AI insights that put you in control of your hardware.
👉 Support the work: ko-fi.com/rafvrs

#VRSComputing #ModelBenchmarking #TokenUsage #AIAgents #CostTransparency