Build Journal

Weekly Usage Report — Week 6 (May 11–17): Visible Tokens vs Cached Context

Week 6: 43.0M visible tokens plus 406.4M cached tokens, for 449.4M total accounted Hermes tokens across 133 sessions.

2026-05-18

Weekly AI Usage Report — Week 6: The Week the Tokens Stayed in the Tank

Reporting period: Monday 11 May – Sunday 17 May 2026
Previous week (Week 5): 730.8M total accounted tokens, 651 sessions, £20.54/week Pro equivalent
Subscription context: ChatGPT Pro at £89/month.

Token accounting

This report separates visible prompt/completion tokens from cached context. Visible tokens show fresh input/output work; cached tokens show repeated context reused during long agent sessions. Together, they show the full model-traffic footprint for the week.

Visible tokens (input + output): 43,042,799 (43.0M)
Cached tokens (cache-read/write): 406,373,280 (406.4M)
Total accounted tokens: 449,416,079 (449.4M)
Sessions: 133
Input tokens: 41,332,055
Output tokens: 1,710,744
ChatGPT Pro weekly cost equivalent: £20.54/week
Opus-equivalent API cost: approximately £5,475
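
The accounting above is plain arithmetic, and it can be checked in a few lines. This is a minimal sketch, not the reporting pipeline itself: the variable names are illustrative, and the weekly equivalent assumes the monthly price is annualised and divided by 52, which the report does not state but which reproduces the £20.54 figure.

```python
# Week 6 figures as reported above.
input_tokens = 41_332_055
output_tokens = 1_710_744
cached_tokens = 406_373_280

# Visible tokens are fresh input plus output; the accounted total adds cached context.
visible_tokens = input_tokens + output_tokens
total_accounted = visible_tokens + cached_tokens
cache_share = cached_tokens / total_accounted

# Assumption: weekly equivalent = monthly price * 12 months / 52 weeks.
PRO_MONTHLY_GBP = 89.0
weekly_equivalent_gbp = PRO_MONTHLY_GBP * 12 / 52

print(f"visible: {visible_tokens:,}")
print(f"total accounted: {total_accounted:,}")
print(f"cache share: {cache_share:.1%}")
print(f"weekly equivalent: £{weekly_equivalent_gbp:.2f}")
```

On these inputs the cached share works out at roughly 90% of the accounted total, which is why the visible figure alone understates the week.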

This is the report where the quiet number is the honest number.

Week 6 looked quiet by visible input/output tokens. The cache-inclusive total shows the real footprint was larger: repeated context made up about 90% of the model traffic (406.4M of 449.4M tokens).

I was on the road for most of it. When there was time, the priority was the Hard Interference blog launch, project planning, Lenovo PGX setup, and the future use of that box as a proper portable demo machine. The PGX was ordered on 11 May and arrived on 12 May, which turned the hardware plan from “research track” into “right, this thing is actually in the room now”.

There was also an Alex Finn video in the mix, and it did not create the idea so much as confirm the direction I was already moving in: a DGX Spark / PGX-style box is exactly the class of hardware Hermes needs if this is going to become a serious local agent workshop rather than a clever desktop experiment.

Then the Windows laptop’s AV raised a trojan warning, which meant the sensible work was not “build another feature”. It was PowerShell, cleanup, verification, and making sure there was not a single trace left behind.

Then I tightened the Ubuntu machine as well, because one warning shot is enough. Fun little hobby, modern computing. Very relaxing.

The PGX work also slowed for a boring but important reason: Lenovo’s reset, recovery, and encryption guidance for this setup is not up to date yet. That matters because this PGX is not just a desk ornament. It is meant to travel as a demo box. A portable AI appliance that goes in and out of meetings needs disk encryption before it gets treated as real kit. The guide I am writing will cover exactly that gap: the practical reset/encryption path, with observed steps separated from vendor assumptions.

So no, Week 6 was not token-heavy. It was operations-heavy.

The week in one picture

The headline version: 43.0M visible tokens, 406.4M cached tokens, and 449.4M total accounted Hermes tokens. The work was operational, but the repeated context footprint was still substantial.

Weekly Usage Report Week 6 — visible and cached token accounting

Top visible model routes

GPT-5.5 inside Hermes: 26.0M visible tokens, about 60.4% of visible route tokens. This was the main judgement and operator-support route.
Qwen 3.5 9B local: 12.5M visible tokens, about 29.0% of visible route tokens. Still the low-cost utility worker.
Qwen 3 Coder 480B: 3.3M visible tokens, about 7.7% of visible route tokens.
GLM-5.1 cloud: 1.2M visible tokens, about 2.8% of visible route tokens.

These route percentages describe visible input/output tokens only. Week 6 still had 406.4M cached-context tokens on top, which is why the full accounted total is much larger than the route list alone.

Daily breakdown

Mon May 11: 30 sessions, 8,586,640 visible (8.6M) + 93,214,112 cached (93.2M) = 101,800,752 total accounted tokens (101.8M), 22.7% of the week; cache share 91.6%, visible share 8.4%. Work note: Week kickoff, VRS/Hard Interference launch work, project planning, and the PGX order moving from idea to reality.
Tue May 12: 19 sessions, 6,650,231 visible (6.7M) + 57,304,576 cached (57.3M) = 63,954,807 total accounted tokens (64.0M), 14.2% of the week; cache share 89.6%, visible share 10.4%. Work note: Project context, planning, and the PGX arrival becoming part of the actual operating plan.
Wed May 13: 25 sessions, 9,154,653 visible (9.2M) + 74,030,592 cached (74.0M) = 83,185,245 total accounted tokens (83.2M), 18.5% of the week; cache share 89.0%, visible share 11.0%. Work note: The busiest day of the week by visible tokens, including Kate training/testing on a small upcoming app and boxed-builder workflow checks.
Thu May 14: 14 sessions, 5,007,949 visible (5.0M) + 31,922,176 cached (31.9M) = 36,930,125 total accounted tokens (36.9M), 8.2% of the week; cache share 86.4%, visible share 13.6%. Work note: Follow-through on Kate testing, response-coach style app work, and general project operations.
Fri May 15: 23 sessions, 5,643,631 visible (5.6M) + 54,079,488 cached (54.1M) = 59,723,119 total accounted tokens (59.7M), 13.3% of the week; cache share 90.6%, visible share 9.4%. Work note: Blog, memory-system, Android/tooling, and machine-work follow-through.
Sat May 16: 10 sessions, 3,584,561 visible (3.6M) + 38,995,456 cached (39.0M) = 42,580,017 total accounted tokens (42.6M), 9.5% of the week; cache share 91.6%, visible share 8.4%. Work note: PGX first-boot, baseline capture, access planning, and reset/encryption guide work.
Sun May 17: 12 sessions, 4,415,134 visible (4.4M) + 56,826,880 cached (56.8M) = 61,242,014 total accounted tokens (61.2M), 13.6% of the week; cache share 92.8%, visible share 7.2%. Work note: Windows AV incident triage, browser service-worker cleanup, Ubuntu/Aurora tightening, PGX/DGX OS investigation, and blog launch recovery work.
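
The derived columns in the daily breakdown (total, share of the week, cache share) follow directly from the visible and cached counts. A small sketch of that derivation, assuming nothing beyond the figures listed above; the `DAYS` structure and the `day_row` helper are hypothetical names for illustration, not part of Hermes.

```python
# (day, sessions, visible tokens, cached tokens) as listed above.
DAYS = [
    ("Mon May 11", 30, 8_586_640, 93_214_112),
    ("Tue May 12", 19, 6_650_231, 57_304_576),
    ("Wed May 13", 25, 9_154_653, 74_030_592),
    ("Thu May 14", 14, 5_007_949, 31_922_176),
    ("Fri May 15", 23, 5_643_631, 54_079_488),
    ("Sat May 16", 10, 3_584_561, 38_995_456),
    ("Sun May 17", 12, 4_415_134, 56_826_880),
]

# Weekly totals are the sums of the daily rows.
week_total = sum(visible + cached for _, _, visible, cached in DAYS)
week_sessions = sum(sessions for _, sessions, _, _ in DAYS)


def day_row(day, sessions, visible, cached):
    """Return the derived columns for one day of the breakdown."""
    total = visible + cached
    return {
        "day": day,
        "sessions": sessions,
        "total": total,
        "share_of_week": total / week_total,
        "cache_share": cached / total,
    }


rows = [day_row(*d) for d in DAYS]
for r in rows:
    print(f"{r['day']}: {r['total']:,} total, "
          f"{r['share_of_week']:.1%} of week, {r['cache_share']:.1%} cached")
```

Summing the rows is also a cheap consistency check: the seven days should add up to the 133 sessions and 449,416,079 accounted tokens quoted in the headline.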

What actually happened this week

The main workload was not “write code until the fans scream”. It was keeping the whole operation moving while the environment changed around it.

The blog launch stayed the priority. That meant planning, review, tightening, local demo work, and making sure the public-facing side of Hard Interference was not just technically correct, but credible. A launch week can burn a lot of judgement without burning many tokens.

The PGX became real during the week: ordered on Monday, arrived on Tuesday, then immediately folded into the bigger Hermes plan. The Alex Finn video review helped sharpen the point. The box is not interesting because it is shiny hardware. It is interesting because it fits the direction Hermes is already moving in: local agents, local context, local orchestration, and enough dedicated compute to stop treating serious agent work as a side quest on the main desktop.

Some of the week also went into training and testing Kate on a small upcoming app. I am not naming it here yet. The useful part is the operating pattern: Kate can be tested in a boxed workspace, Dade can verify what actually changed, and anything that smells like a hallucinated “done” claim gets caught before it touches the real app. That is not glamorous, but it is how agents become tools instead of chaos goblins with commit access.

The security side was more direct. The Windows AV raised a trojan warning. I treated that as an incident, not a shrug, with Dade walking me through the PowerShell checks. Startup and scheduled-task review, active-process checks, browser service-worker review, full browser cleanup, reboot, and McAfee rescan all came first. The final scan was clean. Then I tightened the Ubuntu machine as well, because security work is not finished when one box looks clean. It is finished when the operator changes the way the whole workshop is run.

I also improved the hygiene of the agent workflow by identifying new issues during the launch push. The lesson was not “trust the agent harder”. It was the opposite: checkpoint better, stop earlier when context gets risky, preserve handoffs, and make sure the operating procedure survives the actual pressure of a launch week.

The PGX work was supposed to move faster. It did not, because the reset/encryption guidance is not up to date yet for the way this box needs to be used. That is annoying, but it is also exactly the kind of thing worth discovering before the PGX becomes part of the travelling demo setup. If the PGX is going to leave the building, encryption is not a nice-to-have. It is the baseline.

So the week’s value was not measured in commits. It was measured in fewer unknowns.

The price comparison

Using the audited 449.4M total accounted token workload, the per-token comparison looks like this:

Claude Opus 4.6 API: approximately £5,475 — about 267x the ChatGPT Pro weekly equivalent
Gemini 2.5 Pro API: approximately £1,233 — about 60x
Claude Sonnet API: approximately £1,085 — about 53x
GPT-5.3 Codex API: approximately £534 — about 26x
DeepSeek Chat API: approximately £99 — about 4.8x
GPT-4o mini API: approximately £58 — about 2.8x

These are estimates, not invoices. And this is one of the weeks where the flat subscription does not look spectacular on pure token maths.
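
The multiples in the list are each estimated API cost divided by the weekly Pro equivalent. A hedged sketch of that division, using only the estimates quoted above; the per-token prices behind those estimates are not given in this report, so the code takes the pound figures as inputs and does not try to rebuild them.

```python
WEEKLY_PRO_GBP = 20.54  # ChatGPT Pro weekly equivalent from the report

# Estimated API cost in GBP for the same 449.4M-token workload, as listed above.
api_estimates_gbp = {
    "Claude Opus 4.6 API": 5475,
    "Gemini 2.5 Pro API": 1233,
    "Claude Sonnet API": 1085,
    "GPT-5.3 Codex API": 534,
    "DeepSeek Chat API": 99,
    "GPT-4o mini API": 58,
}

# Multiple = estimated API cost / weekly subscription equivalent.
multiples = {
    name: cost / WEEKLY_PRO_GBP for name, cost in api_estimates_gbp.items()
}

for name, multiple in multiples.items():
    print(f"{name}: about {multiple:.1f}x the weekly Pro equivalent")
```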

That is fine. A workshop subscription is not only valuable on the week you max it out. Sometimes the value is having the capacity ready, then using it on judgement-heavy operations rather than raw code volume: travel, launch checks, hardware planning, security cleanup, and tool training.

Week-over-week comparison

Total accounted tokens: 730.8M → 449.4M, down 38.5%
Visible tokens (input + output): 122.5M → 43.0M, down 64.9%
Hermes sessions: 651 → 133, down 79.6%
Effective subscription rate: £0.028/M in Week 5 → £0.046/M on accounted tokens in Week 6
Context: Week 5 was cache-heavy creative and publishing work. Week 6 was travel, launch, security, hardware setup, Kate testing, and planning.
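
The week-over-week figures come from two small formulas: percentage change and cost per million accounted tokens. A minimal sketch with illustrative helper names (`pct_change` and `rate_per_million` are not from any reporting tool). The token counts are the rounded millions quoted in this report, and the drop in total accounted tokens is derived from the Week 5 and Week 6 totals given above.

```python
def pct_change(previous, current):
    """Percentage change from previous to current (negative means a drop)."""
    return (current - previous) / previous * 100


def rate_per_million(weekly_cost_gbp, total_tokens_m):
    """Effective subscription cost in GBP per million accounted tokens."""
    return weekly_cost_gbp / total_tokens_m


WEEKLY_PRO_GBP = 20.54  # same flat weekly equivalent in both weeks

visible_change = pct_change(122.5, 43.0)   # visible tokens, in millions
session_change = pct_change(651, 133)      # Hermes sessions
total_change = pct_change(730.8, 449.4)    # total accounted tokens, in millions

week5_rate = rate_per_million(WEEKLY_PRO_GBP, 730.8)
week6_rate = rate_per_million(WEEKLY_PRO_GBP, 449.4)

print(f"visible tokens: {visible_change:.1f}%")
print(f"sessions: {session_change:.1f}%")
print(f"total accounted tokens: {total_change:.1f}%")
print(f"rate: £{week5_rate:.3f}/M -> £{week6_rate:.3f}/M")
```

Because the subscription price is flat, the effective rate moves only with the accounted total: fewer tokens in the week means a higher cost per million, even though nothing about the bill changed.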

The wrong headline is “usage collapsed”.

The right headline is “the work changed”.

When the job is a blog launch, a malware cleanup, Ubuntu hardening, Kate guardrail testing, and making a portable PGX safe enough to travel, token burn is not the KPI. Trust is.

The stack

ChatGPT Pro: £89/month, about £20.54/week.
Hermes on Linux: local orchestration, reporting, machine checks, launch support, planning, and verification.
Lenovo PGX: ordered 11 May, arrived 12 May; future demo appliance and travelling AI box, with reset/encryption guidance now being turned into a practical guide.
Alex Finn video: useful external validation that this class of hardware is exactly where Hermes needs to go.
Kate/OpenClaw: boxed-builder testing for a small upcoming app, with Dade verification before anything touches live code.
Qwen 3.5 9B local: zero marginal cost utility worker.
Ubuntu hardening + Windows cleanup: not glamorous, but absolutely part of the AI workshop if the machines are going to be trusted.

A week like this is why I do not only track “how many tokens did I burn?” I also track what the tokens were for.

The bottom line

Week 6: 43.0M visible tokens, 406.4M cached tokens, 449.4M total accounted tokens, 133 Hermes sessions.

This was the week where the subscription mostly stayed in reserve because the actual job was operational: launch the blog, plan the next projects, clean the Windows machine, tighten Ubuntu, train Kate safely, and make sure the PGX can become a secure travelling demo box rather than an expensive liability with a nice badge.

The operator lesson is simple: unused capacity is not waste when the constraint is attention, trust, travel, security, or hardware readiness. The best token is sometimes the one you did not need to spend because the machine was already under control.

Found this useful?
👉 Follow Raf_VRS on X for more transparent AI insights that put you in control of your hardware.
👉 Support the work: ko-fi.com/rafvrs
