Build Journal

Lessons So Far (And What's Next)

Five days in, six principles discovered, from benchmarking before committing to the cardinal rule that simple checks never need LLM tokens. A summary of what I've learned and where I'm heading.

2026-04-15 · 3 min read

Five days, six principles

I have been running a local AI setup for less than a week and already the lessons are stacking up. Here are the ones that matter.

1. Benchmark before you commit

Don't choose models based on blog posts or benchmark leaderboards (even mine). Run your own tests on your own hardware with your own workloads. On my RTX 5070 Ti, models behaved very differently from what the published benchmarks suggested.

Your benchmarks. Your hardware. Your use case.
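
A minimal sketch of what "run your own tests" could look like in Python, under stated assumptions: `generate` is a hypothetical callable that sends a prompt to whichever model is being tested and returns the number of tokens it produced. It is not the benchmark runner used in this journal, only an illustration of timing several runs and taking the median.

```python
import statistics
import time
from typing import Callable


def tokens_per_second(generate: Callable[[str], int], prompt: str, runs: int = 5) -> float:
    """Return the median tokens/second over several runs.

    `generate` is a hypothetical callable: it sends `prompt` to a model
    and returns how many tokens came back.
    """
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        produced = generate(prompt)
        elapsed = time.perf_counter() - start
        if elapsed > 0:  # guard against a zero-length measurement
            rates.append(produced / elapsed)
    if not rates:
        raise ValueError("no run produced a measurable duration")
    # The median is less sensitive than the mean to one slow warm-up run.
    return statistics.median(rates)
```

Feeding it prompts taken from a real workload, rather than a generic test prompt, is what makes the number meaningful for your own use case.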

2. Automate health checks, but don't use LLMs for them

This is the cardinal rule now. A cron job that checks if a server is up should be a bash script run by system cron, not an LLM session that costs 22K tokens per invocation.

My healthcheck was burning 12M tokens/day. Replacing it with a plain crontab entry saved those tokens and reduced noise in the activity feed.

If it doesn't need reasoning, it doesn't need an LLM.
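
The post describes a bash script for this. The same idea in Python, as a rough sketch: a plain HTTP probe using only the standard library, with no model involved. The URL in `main` is a hypothetical placeholder, and treating any 2xx or 3xx answer as "up" is an assumption.

```python
import http.client
import sys
import urllib.request


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with a 2xx/3xx status within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers refused connections, timeouts and HTTP error statuses;
        # ValueError covers malformed URLs.
        return False


def main(argv: list) -> int:
    """Exit code 0 when the server is up and 1 otherwise, so cron can react to it."""
    # Hypothetical default endpoint; pass the real one as the first argument.
    target = argv[1] if len(argv) > 1 else "http://127.0.0.1:8080/health"
    return 0 if is_up(target) else 1
```

Saved as a script that ends with `sys.exit(main(sys.argv))`, this can be run from a system crontab entry and costs zero tokens per invocation.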

3. Make token usage visible

You cannot optimise what you cannot see. The Mission Control dashboard was the turning point -- once I could see that 75% of my daily tokens were going to cron jobs, the fix was obvious.

Before visibility: vague unease about costs. After visibility: specific targets for elimination.
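
The core of that kind of visibility is a small aggregation. A sketch, assuming usage is logged as `(source, tokens)` pairs; the record format and the sample numbers are invented for illustration and are not taken from the Mission Control dashboard.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def token_share_by_source(records: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Sum tokens per source and return each source's share of the total, largest first."""
    totals = Counter()
    for source, tokens in records:
        totals[source] += tokens
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {source: count / grand_total for source, count in totals.most_common()}


# Made-up sample log, NOT real measurements.
sample = [("cron", 600), ("chat", 150), ("cron", 150), ("coding", 100)]
for source, share in token_share_by_source(sample).items():
    print(f"{source:<8}{share:6.1%}")
```

Sorting by share puts the biggest consumer at the top, which is what turns vague unease into a specific target.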

4. Security first: local mode, file permissions, token redaction

The Discord token incident was the wake-up call. If your AI assistant can see your secrets, you need the three things in the heading: local mode, locked-down file permissions, and token redaction.

Build the security in, not bolt it on after the breach.
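
Two of those measures can be sketched in a few lines of Python. The regular expression is a hypothetical, deliberately simple pattern (it only catches `KEY=value` or `KEY: value` pairs whose key name contains TOKEN, SECRET, PASSWORD or API_KEY). Real secrets take many more shapes, so treat it as a starting point rather than a complete filter. The permission check assumes a POSIX system.

```python
import os
import re
import stat

# Hypothetical pattern: an assignment whose key name looks secret-like.
SECRET_PATTERN = re.compile(
    r"(?i)\b([A-Z0-9_]*(?:TOKEN|SECRET|PASSWORD|API_KEY)[A-Z0-9_]*)(\s*[=:]\s*)(\S+)"
)


def redact(text: str) -> str:
    """Replace the value of anything that looks like a secret assignment."""
    return SECRET_PATTERN.sub(r"\1\2[REDACTED]", text)


def is_private(path: str) -> bool:
    """Return True if group and others have no permissions on `path` (POSIX only)."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

Running `redact` on text before it reaches a model or a log, and checking `is_private` on config files at startup, are two places where checks like these could be wired in.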

5. Route by task complexity, not habit

Sending every prompt to the most capable model is like using a sledgehammer for every nail. Smart routing sends each prompt to a model that matches its complexity.

The thresholds (220 chars / 40 words) are rough, but they work. They catch 80% of the easy wins.
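
A threshold router of this kind fits in a few lines. In the sketch below, the two limits come from the text, while everything else is an assumption: the model labels are hypothetical, and so is the rule that a prompt must be under both limits to count as easy.

```python
MAX_CHARS = 220  # character threshold quoted in the text
MAX_WORDS = 40   # word threshold quoted in the text

# Hypothetical model labels, for illustration only.
SMALL_MODEL = "small-local-model"
LARGE_MODEL = "large-model"


def pick_model(prompt: str) -> str:
    """Route short prompts to the small model and everything else to the large one."""
    # Assumption: a prompt is "easy" only if it is under BOTH limits.
    is_short = len(prompt) <= MAX_CHARS and len(prompt.split()) <= MAX_WORDS
    return SMALL_MODEL if is_short else LARGE_MODEL
```

Length is only a proxy for difficulty, so a short prompt that asks a hard question will still land on the small model.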

6. A £0 local model beats a £0 cloud model

When latency matters (and it always matters in interactive workflows), a local model running at 159 tokens/second with zero network latency beats a cloud model that takes 200ms just to establish the connection.

Cloud "free tiers" also have hidden costs: rate limits, token caps, and the constant risk that "free" becomes "not free." Your local hardware is already paid for. In addition, some free models still use your data for training.

What's next

The journey continues. Current priorities:

  1. New app -- The real money-making project. The local AI setup exists to support this.
  2. Better benchmarking -- The benchmark runner needs to use /api/chat with streaming for accurate testing of thinking models.
  3. Tighter routing -- The character/word thresholds are blunt instruments. I need category-aware routing.
  4. Revenue -- The whole point. The local AI setup needs to pay for itself.
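
For the benchmarking item above, here is a rough sketch of how a streamed response could be timed. It assumes the `/api/chat` endpoint streams newline-delimited JSON objects, each carrying a `message.content` field and a final `done: true` marker. That response shape is an assumption, not something this journal states. With a thinking model, visible answer text may only start arriving after a long pause, so time to first content and total time are measured separately.

```python
import json
import time
from typing import Iterable, Optional, Tuple


def measure_stream(lines: Iterable[bytes]) -> Tuple[Optional[float], int, float]:
    """Consume newline-delimited JSON chunks and time them.

    Returns (seconds until the first non-empty content chunk, or None if there
    was none; number of non-empty content chunks; total seconds).
    """
    start = time.perf_counter()
    first_content_at = None
    content_chunks = 0
    for raw in lines:
        if not raw.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(raw)
        # Assumed chunk shape: {"message": {"content": "..."}, "done": bool}
        text = chunk.get("message", {}).get("content", "")
        if text:
            content_chunks += 1
            if first_content_at is None:
                first_content_at = time.perf_counter() - start
        if chunk.get("done"):
            break
    return first_content_at, content_chunks, time.perf_counter() - start
```

Any iterable of lines works as input, including an open HTTP response object from `urllib.request.urlopen`, which yields the body line by line.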

I have already found a few applications that matter to me, and this is where theory meets practice. If I can build, ship, and monetise an application using this local-first AI stack, I have proven the model works -- and the model pays for itself.


This journal is built with the same stack it documents: a Next.js app running on localhost, edited through a browser, powered by local models. Dogfooding is the best documentation.

Found this useful? 👉 Follow @Raf_VRS for more Build Journal updates 👉 Support the work: ko-fi.com/rafvrs