
Weekly Usage Report — Week 6 (May 11–17): Visible Tokens vs Cached Context

Week 6: 43.0M visible tokens plus 406.4M cached tokens, for 449.4M total accounted Hermes tokens across 133 sessions.

2026-05-18

Weekly AI Usage Report — Week 6: The Week the Tokens Stayed in the Tank

Reporting period: Monday 11 May – Sunday 17 May 2026
Previous week (Week 5): 730.8M total accounted tokens, 651 sessions, £20.54/week Pro subscription equivalent
Subscription context: ChatGPT Pro at £89/month.

Token accounting

This report separates visible prompt/completion tokens from cached context. Visible tokens show fresh input/output work; cached tokens show repeated context reused during long agent sessions. Together, they show the full model-traffic footprint for the week.

This is the report where the quiet number is the honest number.

Week 6 looked quiet by visible input/output tokens. The cache-inclusive total shows the real footprint was larger: repeated context made up about 90% of the model traffic (406.4M of 449.4M accounted tokens).
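The accounting behind these numbers is plain addition plus one ratio. A minimal sketch using the Week 6 figures from this report; the class and field names are illustrative only and are not part of Hermes:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeekTokens:
    """Token accounting for one reporting week, in millions of tokens."""
    visible_m: float  # fresh prompt/completion work
    cached_m: float   # repeated context reused from cache

    @property
    def total_m(self) -> float:
        # Total accounted tokens = visible + cached
        return self.visible_m + self.cached_m

    @property
    def cached_share(self) -> float:
        # Fraction of accounted traffic that was reused context
        return self.cached_m / self.total_m


week6 = WeekTokens(visible_m=43.0, cached_m=406.4)
print(f"total: {week6.total_m:.1f}M")             # total: 449.4M
print(f"cached share: {week6.cached_share:.1%}")  # cached share: 90.4%
```

Read the other way round, visible tokens were under 10% of the accounted total, which is why a visible-only chart makes the week look smaller than it was.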

I was on the road for most of it. When there was time, the priorities were the Hard Interference blog launch, project planning, Lenovo PGX setup, and the future use of that box as a proper portable demo machine. The PGX was ordered on 11 May and arrived on 12 May, which turned the hardware plan from “research track” into “right, this thing is actually in the room now”.

There was also an Alex Finn video in the mix, and it did not create the idea so much as confirm the direction I was already moving in: a DGX Spark / PGX-style box is exactly the class of hardware Hermes needs if this is going to become a serious local agent workshop rather than a clever desktop experiment.

Then the Windows laptop’s AV raised a trojan warning, which meant the sensible work was not “build another feature”. It was PowerShell, cleanup, verification, and making sure there was not a single trace left behind.

Then I tightened the Ubuntu machine as well, because one warning shot is enough. Fun little hobby, modern computing. Very relaxing.

The PGX work also slowed for a boring but important reason: Lenovo’s reset, recovery, and encryption guidance for this setup is not up to date yet. That matters because this PGX is not just a desk ornament. It is meant to travel as a demo box. A portable AI appliance that goes in and out of meetings needs disk encryption before it gets treated as real kit. A follow-up guide will cover exactly that gap: the practical reset/encryption path, with observed steps separated from vendor assumptions.

So no, Week 6 was not token-heavy. It was operations-heavy.

The week in one picture

The headline version: 43.0M visible tokens, 406.4M cached tokens, and 449.4M total accounted Hermes tokens. The work was operational, but the repeated context footprint was still substantial.

Weekly Usage Report Week 6 — visible and cached token accounting


Top visible model routes

These route percentages describe visible input/output tokens only. Week 6 still had 406.4M cached-context tokens on top, which is why the full accounted total is much larger than the route list alone.

Daily breakdown

What actually happened this week

The main workload was not “write code until the fans scream”. It was keeping the whole operation moving while the environment changed around it.

The blog launch stayed the priority. That meant planning, review, tightening, local demo work, and making sure the public-facing side of Hard Interference was not just technically correct, but credible. A launch week can burn a lot of judgement without burning many tokens.

The PGX became real during the week: ordered on Monday, arrived on Tuesday, then immediately folded into the bigger Hermes plan. The Alex Finn video review helped sharpen the point. The box is not interesting because it is shiny hardware. It is interesting because it fits the direction Hermes is already moving in: local agents, local context, local orchestration, and enough dedicated compute to stop treating serious agent work as a side quest on the main desktop.

Some of the week also went into training and testing Kate on a small upcoming app. I am not naming it here yet. The useful part is the operating pattern: Kate can be tested in a boxed workspace, Dade can verify what actually changed, and anything that smells like a hallucinated “done” claim gets caught before it touches the real app. That is not glamorous, but it is how agents become tools instead of chaos goblins with commit access.
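At its core, the Kate/Dade pattern compares what an agent says it changed with what actually changed on disk. A minimal sketch, assuming the boxed workspace is a plain directory; the function names are hypothetical and this is not how Kate or Dade are actually implemented:

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's relative path to the SHA-256 of its contents."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> set[str]:
    """Files added, removed, or modified between two snapshots."""
    paths = before.keys() | after.keys()
    return {p for p in paths if before.get(p) != after.get(p)}


def audit_claim(claimed: set[str], before: dict[str, str],
                after: dict[str, str]) -> dict[str, set[str]]:
    """Compare an agent's claimed edits against the real diff."""
    actual = changed_files(before, after)
    return {
        "unverified_claims": claimed - actual,   # said "done", nothing changed
        "unreported_changes": actual - claimed,  # changed but never mentioned
    }


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app.py").write_text("print('v1')\n")
    (root / "README.md").write_text("notes\n")
    before = snapshot(root)
    # Simulated agent run: edits app.py only, but claims README.md as well
    (root / "app.py").write_text("print('v2')\n")
    after = snapshot(root)
    report = audit_claim({"app.py", "README.md"}, before, after)

print(report)  # {'unverified_claims': {'README.md'}, 'unreported_changes': set()}
```

Either non-empty set is a reason to stop and look before anything touches the real app.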

The security side was more direct. The Windows AV raised a trojan warning. I treated that as an incident, not a shrug, with Dade walking me through the PowerShell checks. The sequence was startup and scheduled-task review, active-process checks, browser service-worker review, full browser cleanup, a reboot, and a McAfee rescan. The final scan came back clean. Then I tightened the Ubuntu machine as well, because security work is not finished when one box looks clean. It is finished when the operator changes the way the whole workshop is run.

The launch push also surfaced new issues in the agent workflow, and working through them improved its hygiene. The lesson was not “trust the agent harder”. It was the opposite: checkpoint better, stop earlier when context gets risky, preserve handoffs, and make sure the operating procedure survives the actual pressure of a launch week.
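“Stop earlier when context gets risky” is easier to enforce as an explicit threshold than as a feeling. A minimal sketch; the window size and the 60%/80% thresholds are hypothetical placeholders, not settings taken from Hermes:

```python
from dataclasses import dataclass


@dataclass
class ContextGuard:
    """Hypothetical guard deciding when a session should checkpoint or hand off."""
    window_tokens: int          # model context window (assumed value)
    checkpoint_at: float = 0.6  # hypothetical: save state at 60% full
    handoff_at: float = 0.8     # hypothetical: stop and hand off at 80% full

    def action(self, used_tokens: int) -> str:
        fill = used_tokens / self.window_tokens
        if fill >= self.handoff_at:
            return "handoff"     # stop; start a fresh session from the handoff notes
        if fill >= self.checkpoint_at:
            return "checkpoint"  # write state and a handoff note, then keep working
        return "continue"


guard = ContextGuard(window_tokens=200_000)
for used in (50_000, 130_000, 170_000):
    print(used, guard.action(used))
# 50000 continue
# 130000 checkpoint
# 170000 handoff
```

The design choice is that the handoff fires well before the window is full, so the notes get written while the session can still summarise itself coherently.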

The PGX work was supposed to move faster. It did not, because the reset/encryption guidance is not up to date yet for the way this box needs to be used. That is annoying, but it is also exactly the kind of thing worth discovering before the PGX becomes part of the travelling demo setup. If the PGX is going to leave the building, encryption is not a nice-to-have. It is the baseline.

So the week’s value was not measured in commits. It was measured in fewer unknowns.

The price comparison

Using the audited 449.4M total accounted token workload, the per-token comparison looks like this:

These are estimates, not invoices. And this is one of the weeks where the flat subscription does not look spectacular on pure token maths.

That is fine. A workshop subscription is not only valuable on the week you max it out. Sometimes the value is having the capacity ready, then using it on judgement-heavy operations rather than raw code volume: travel, launch checks, hardware planning, security cleanup, and tool training.
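The flat-fee side of the token maths can be checked from figures already in this report. A minimal sketch, assuming the £20.54/week figure is the £89 monthly price spread over 52 weeks (89 × 12 / 52 reproduces it); the function names are illustrative:

```python
MONTHLY_GBP = 89.0  # ChatGPT Pro price quoted in this report


def weekly_equivalent(monthly: float) -> float:
    # Assumption: 12 monthly payments spread evenly over 52 weeks
    return monthly * 12 / 52


def cost_per_million(weekly_cost: float, tokens_m: float) -> float:
    # Effective subscription cost per million accounted tokens
    return weekly_cost / tokens_m


weekly = weekly_equivalent(MONTHLY_GBP)
print(f"weekly equivalent: £{weekly:.2f}")  # weekly equivalent: £20.54
for label, tokens_m in [("Week 5", 730.8), ("Week 6", 449.4)]:
    print(f"{label}: £{cost_per_million(weekly, tokens_m):.4f} per 1M accounted tokens")
# Week 5: £0.0281 per 1M accounted tokens
# Week 6: £0.0457 per 1M accounted tokens
```

Fewer tokens against the same fixed fee means a higher effective rate, which is why this week looks less spectacular on paper: a blended pay-per-token rate above roughly £0.046 per million accounted tokens would still have cost more than the subscription.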

Week-over-week comparison

The wrong headline is “usage collapsed”.

The right headline is “the work changed”.

When the job is a blog launch, a malware cleanup, Ubuntu hardening, Kate guardrail testing, and making a portable PGX safe enough to travel, token burn is not the KPI. Trust is.
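The shift is visible in the two headline figures the report gives for each week. A minimal sketch computing the week-over-week change and the average accounted tokens per session from those figures:

```python
weeks = {
    "Week 5": {"tokens_m": 730.8, "sessions": 651},
    "Week 6": {"tokens_m": 449.4, "sessions": 133},
}


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


w5, w6 = weeks["Week 5"], weeks["Week 6"]
print(f"tokens: {pct_change(w5['tokens_m'], w6['tokens_m']):+.1f}%")  # tokens: -38.5%
print(f"sessions: {pct_change(w5['sessions'], w6['sessions']):+.1f}%")  # sessions: -79.6%
for name, w in weeks.items():
    # Average accounted tokens carried by each session
    print(f"{name}: {w['tokens_m'] / w['sessions']:.2f}M tokens per session")
# Week 5: 1.12M tokens per session
# Week 6: 3.38M tokens per session
```

Sessions fell much further than tokens, so the average Week 6 session carried roughly three times as many accounted tokens: fewer, longer sessions, consistent with the cache-heavy profile above.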

The stack

A week like this is why I do not only track “how many tokens did I burn?” I also track what the tokens were for.

The bottom line

Week 6: 43.0M visible tokens, 406.4M cached tokens, 449.4M total accounted tokens, 133 Hermes sessions.

This was the week where the subscription mostly stayed in reserve because the actual job was operational: launch the blog, plan the next projects, clean the Windows machine, tighten Ubuntu, train Kate safely, and make sure the PGX can become a secure travelling demo box rather than an expensive liability with a nice badge.

The operator lesson is simple: unused capacity is not waste when the constraint is attention, trust, travel, security, or hardware readiness. The best token is sometimes the one you did not need to spend because the machine was already under control.

Found this useful?
👉 Follow Raf_VRS on X for more transparent AI insights that put you in control of your hardware.
👉 Support the work: ko-fi.com/rafvrs

#VRSComputing #ModelBenchmarking #TokenUsage #AIAgents #CostTransparency