Day 1: The Box Arrives
The Alienware box arrived: RTX 5070 Ti, 64GB RAM, Ubuntu 24, Ollama, and the first gap between “it runs” and “it runs well.”
The hardware
It started simply enough: a pre-built Alienware Aurora ACT1250 sitting on the desk. The specs looked solid on paper:
| Component | Spec |
|---|---|
| CPU | Intel Core Ultra 7 265KF |
| Memory | 64GB DDR5 |
| GPU | NVIDIA RTX 5070 Ti (16GB VRAM) |
| Storage (OS) | Stock NVMe (came with the machine) |
| Storage (test bench) | Samsung 990 Evo Plus 2TB PCIe |
| OS | Ubuntu 24.04 LTS |
The extra Samsung 990 Evo Plus was intentional -- a dedicated 2TB NVMe drive for unboxing and testing AI models, tools, and experiments without risking the main OS partition. When you're pulling 5-15GB model files and running destructive benchmarks, you want a scratch disk that doesn't share a filesystem with /home. It also means wiping and starting fresh is a 30-second operation, not an afternoon of backup anxiety.
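Since model files in the 5-15GB range add up quickly, it can help to check the scratch disk before each pull. The following is a minimal sketch of that check, not part of the original setup: the function name and the 10 GiB reserve are assumptions for illustration.

```python
import shutil

GIB = 1024 ** 3  # bytes per gibibyte


def has_room_for(path: str, model_size_gib: float, reserve_gib: float = 10.0) -> bool:
    """Return True if the filesystem holding `path` can take the model.

    `reserve_gib` is an assumed safety margin so the disk never fills
    completely; tune it to taste.
    """
    free_gib = shutil.disk_usage(path).free / GIB
    return free_gib >= model_size_gib + reserve_gib
```

Calling `has_room_for("/mnt/scratch", 15)` would check for a 15GB model plus the reserve, where `/mnt/scratch` is a hypothetical mount point for the 990 Evo Plus.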
The dream? Run production-grade AI locally and stop paying per-token to cloud providers. The machine itself was barely warm when I started asking the real question:
Can a single consumer GPU actually replace cloud AI?
Installing Ollama
First step was Ollama -- the easiest way to get local models running. One curl pipe, a few minutes, and I had a working LLM server on localhost:11434.
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull gemma4:e4b
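Once the server is listening on localhost:11434, any HTTP client can talk to it. Here is a rough sketch using only the Python standard library against Ollama's `/api/generate` endpoint with streaming turned off. The helper names (`build_request`, `generate`, `tokens_per_second`) and the 120-second timeout are my own assumptions, not something Ollama ships.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> dict:
    """Send the prompt and return the decoded JSON reply as a dict."""
    request = build_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def tokens_per_second(result: dict) -> float:
    """Generation speed: eval_count tokens over eval_duration nanoseconds."""
    return result["eval_count"] / (result["eval_duration"] / 1e9)
```

With the model pulled, `generate("gemma4:e4b", "Say hello")["response"]` should hold the reply text, and passing the same dict to `tokens_per_second` gives a first speed number to compare models with.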
The first time a model responds from your own hardware is a strange feeling. It's fast -- like, really fast. No network round trip, no API key, no billing. Just you and the silicon.
The first reality check
But then things got complicated fast:
- Some models don't fit in 16GB VRAM -- you need to be strategic
- Model names are confusing (what even is gemma4:e4b?)
- There's no obvious way to compare models for your actual use case
- The default Ollama API has quirks (more on that in a later post)
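For the first bullet, a back-of-the-envelope estimate goes a long way: weight memory is roughly parameter count times bits per weight, divided by eight. The sketch below applies that rule of thumb. The 2GB overhead for KV cache and runtime buffers is an assumption, not a measured figure, and real usage varies with context length and quantization format.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes).

    params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9.
    """
    return params_billion * bits_per_weight / 8


def fits_in_vram(
    params_billion: float,
    bits_per_weight: float,
    vram_gb: float = 16.0,
    overhead_gb: float = 2.0,  # assumed allowance for KV cache and buffers
) -> bool:
    """Rough check: do weights plus the assumed overhead fit in VRAM?"""
    return weight_size_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb
```

By this estimate an 8B model at 4 bits per weight needs about 4GB for weights and fits comfortably, while a 14B model at 16 bits needs about 28GB and does not.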
The biggest realisation: "it responds" and "it responds well" are very different things. A model that takes 78 seconds to answer is technically working. It's also completely unusable for interactive workflows.
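One way to make that distinction concrete is to time each request and compare it against a latency budget. This is a minimal sketch: the 5-second default budget is an arbitrary assumption for "interactive", and `timed` and `is_interactive` are illustrative names.

```python
import time
from typing import Callable, Tuple


def timed(fn: Callable[[], object]) -> Tuple[object, float]:
    """Run fn once and return (its result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def is_interactive(elapsed_s: float, budget_s: float = 5.0) -> bool:
    """True if the response arrived within the assumed latency budget."""
    return elapsed_s <= budget_s
```

Wrapping any model call in `timed(lambda: ...)` yields the elapsed time alongside the answer, and a 78-second response fails `is_interactive` under any reasonable budget.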
What I wanted
I wasn't building a chatbot. I wanted an AI agent -- something that could:
- Read files, run commands, edit code
- Make decisions about which model to use for which task
- Run automated cron jobs without burning tokens
- Stay secure and local by default
That last point became way more important than I expected. But that's the next post.