Build Journal

First-person build notes from the Hard Interference local AI workshop.

Weekly Usage Report — Week 7 (May 18–24): Visible Tokens vs Cached Context
Week 7: 28.8M visible input/output tokens plus 343.7M cached tokens, for 372.5M total accounted Hermes tokens across 70 sessions.
Weekly Usage Report — Week 6 (May 11–17): Visible Tokens vs Cached Context
Week 6: 43.0M visible tokens plus 406.4M cached tokens, for 449.4M total accounted Hermes tokens across 133 sessions.
Weekly Usage Report — Week 5 (May 4–10): 731 Million Accounted Tokens for £20.54
Week 5: 122.5M visible tokens plus 608.3M cached tokens, for 730.8M total accounted Hermes tokens across 651 sessions.
Weekly Usage Report — Week 4 (Apr 27–May 3): 495 Million Accounted Tokens for £9.24
Week 4: 374.2M visible tokens plus 120.6M cached tokens, for 494.8M total accounted Hermes tokens across 2,461 sessions. Opus-equivalent API cost: about £6,213.
The ChatGPT Subscription Trap: Stuck Between Tiers With 1.1 Billion Tokens
I am burning through tokens faster than any single ChatGPT plan was designed for, but I have not made a penny from this yet. The subscription math for multi-agent orchestration does not add up, and I am in the gap between tiers with no clear exit.
SQLite WAL Bloat in Hermes: What It Is and How I Vacuumed It Safely
Hermes session storage ballooned to 574MB after 4,000+ sessions. The WAL file was one problem, but the real culprit was a redundant FTS trigram index eating half the database. Here is what I found, how I diagnosed it, and the safe cleanup that shaved 267MB with zero data loss.
Why I Fired GLM-5.1 From Deployment
GLM-5.1 was my daily driver until context amnesia, character leaks, blind-spot failures, and deployment breakage turned routine work into repeated recovery ops. Here’s the build-journal story of why Dade (DeepSeek V4 Pro) took over coding and deployment.
NEVER F**KING GUESS: 9 Seconds to Destroy a Production Database
Cursor running Claude Opus 4.6 wiped a SaaS production database and volume-level backups in nine seconds. This wasn’t an AI ‘oops’ — it was a missing-guardrails failure. Here’s what happened, why it matters, and how I design systems so it can’t happen here.
Weekly Usage Report — Week 3 (Apr 20–26): 530 Million Accounted Tokens for £9.24
Week 3: 449.3M visible tokens plus 80.8M cached tokens, for 530.1M total accounted Hermes tokens across 2,288 sessions. Opus-equivalent API cost: about £6,543.
How ChatGPT Images 2.0 Finally Got Our Logo Right (After 50+ Failed Attempts)
After 50+ failed attempts across Stable Diffusion, Flux, and Claude, ChatGPT Images 2.0 nailed the VRS logo in just 8 prompts — then reverse-engineered prompts for every other model.
Stop Running in Circles — How I Made AI Memory Actually Useful
I rebuilt AI memory as a three-level dashboard: day, summary, then full conversation, so useful context stopped hiding in a search box.
Build Journal: The Story Behind the Stack
Not tutorials — stories. The wow moments, the crashes, and the 2am realisations that come from actually building with AI instead of just reading about it.
The IMDB Deduction: When Your AI Impresses You
The moment a bare IMDB link revealed my AI agent's true understanding — connecting dots no human would bother to connect.
Weekly Usage Report — Week 2 (Apr 13–19): 371 Million Accounted Tokens for £9.24
Week 2: 325.9M visible tokens plus 45.5M cached tokens, for 371.4M total accounted Hermes tokens across 1,078 sessions. Opus-equivalent API cost: about £4,542.
When Your AI Stack Eats Itself: The Ollama Crash Loop That Took Everything Down
Two Ollama services, 56,000 restart attempts, and one port — how a silent systemd conflict took down my entire local AI stack, why it can happen to you, and how to prevent it.
The Other Way Your AI Agent Dies: Iteration Budget Exhaustion
Your AI agent stops mid-task. You assume it is a bug. It is usually not — it has burned through its tool-calling budget. Every read, shell command, patch, browser check, and retry costs a turn. Here is how to spot the failure mode, recover cleanly, and design tasks that do not waste iterations.
Just One More Prompt
I generated a full rap-over-house track on a local RTX 5070 Ti using HeartMuLa — and lived to tell the tale of dependency hell, patching transformers, and the moment the beat finally dropped.
When Memory Becomes the Problem
My AI agent's memory hit 21.1K characters against a 16K limit. It wasn't a bug; it was a design flaw. Here's how persistent memory bloat creeps up on AI agents, why compression alone can't save you, what I did to fix it, and where external memory providers fit into the architecture.
Are You Still Working? How I Made AI Agent Status Visible
My AI agent got stuck twice in two sessions: once from context loss, once from a stale process conflict. I couldn't see it happen because the dashboard only knew the agent existed, not what it was doing. Here's how I built 4-state status detection and why it matters.
The 12 Million Token Mistake
A healthcheck cron job running every 5 minutes through an LLM session was burning 12 million tokens per day to execute 'curl localhost:3000'. The most expensive 3-line bash script ever written.
When Your AI Forgets What It Did
My AI agent built an entire blog website, then forgot it existed. Context windows fill up, sessions die, and work gets lost. Here's how I made my setup resilient to the most fundamental AI problem: amnesia.
Lessons So Far (And What's Next)
Five days in, six principles discovered, from benchmarking before committing to the cardinal rule that simple checks never need LLM tokens. A summary of what I've learned and where I'm heading.
Building Mission Control (Or: How I Learned to Stop Worrying and Love the Dashboard)
How Mission Control turned token drain into visible numbers, exposed cost bugs, and proved you cannot cut what you cannot see.
Weekly Usage Report — Week 1 (Apr 6–12): 97 Million Accounted Tokens for £9.24
Week 1: 51.8M visible tokens plus 45.6M cached tokens, for 97.4M total accounted Hermes tokens across 88 sessions. Opus-equivalent API cost: about £1,188.
Day 1: The Box Arrives
The Alienware box arrived: RTX 5070 Ti, 64GB RAM, Ubuntu 24, Ollama, and the first gap between “it runs” and “it runs well.”
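
Each weekly usage report listed above states its total as visible tokens plus cached tokens. As a rough sanity check, here is a minimal Python sketch that recomputes those totals from the figures quoted in the summaries. The dictionary layout and the function names are my own for illustration; they are not part of Hermes or Mission Control.

```python
# Figures copied from the weekly report summaries, in millions of tokens:
# week -> (visible, cached, stated_total).
# The structure and names here are illustrative assumptions, not a Hermes API.
WEEKLY_REPORTS = {
    1: (51.8, 45.6, 97.4),
    2: (325.9, 45.5, 371.4),
    3: (449.3, 80.8, 530.1),
    4: (374.2, 120.6, 494.8),
    5: (122.5, 608.3, 730.8),
    6: (43.0, 406.4, 449.4),
    7: (28.8, 343.7, 372.5),
}


def check_totals(reports, tolerance=0.05):
    """Return the weeks whose stated total differs from visible + cached.

    The tolerance absorbs rounding, since the summaries quote one decimal place.
    """
    mismatches = []
    for week, (visible, cached, stated) in sorted(reports.items()):
        if abs((visible + cached) - stated) > tolerance:
            mismatches.append(week)
    return mismatches


def cached_share(visible, cached):
    """Fraction of the accounted tokens that came from cached context."""
    return cached / (visible + cached)


if __name__ == "__main__":
    print("Weeks with mismatched totals:", check_totals(WEEKLY_REPORTS))
    for week, (visible, cached, _) in sorted(WEEKLY_REPORTS.items()):
        print(f"Week {week}: {cached_share(visible, cached):.0%} cached")
```

Running it prints the cached share per week, which makes the shift from mostly visible tokens in the early weeks to mostly cached context in the later ones easy to see.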