OpenRouter's :nitro, :floor, and :exacto — Same Model, Three Superpowers
OpenRouter suffixes let you bias the same model for speed, price, or routing precision with one small string change.
OpenRouter just shipped something quietly brilliant. Instead of forcing you to choose between speed, cost, and reliability — then lock that choice in — they gave us three-letter suffixes that swap the optimisation strategy while keeping the same model underneath.
Append :nitro for speed. :floor for the cheapest route. :exacto for precision. Same model. Different provider. One string change.
Here's the full breakdown.
The Three Suffixes
:nitro — Routes to the fastest provider. Highest throughput, lowest latency. When you're chatting with an agent and need snappy responses, this is your pick.
:floor — Routes to the cheapest provider. Sorted by price per token. Background tasks, bulk processing, scraping — anything where a few extra milliseconds don't matter but every penny does.
:exacto — Routes to providers with the best tool-calling reliability. When your agent is executing complex workflows, chaining API calls, or producing structured JSON, this minimises failure rates.
# Same model, three strategies
model: "google/gemma-4-31b:nitro" # Fastest responses
model: "google/gemma-4-31b:floor" # Cheapest provider
model: "google/gemma-4-31b:exacto" # Most reliable tool use
The suffix doesn't change the price — it changes which provider fulfils the request. Think of it as a routing instruction baked into the model ID.
Rate Limits: Free vs Paid
This is the bit most people miss.
Free-tier keys share a global pool across all free-tier users. That pool is capped at 10 requests per day. Hit the ceiling and you're waiting until tomorrow.
There is one useful middle ground: add about £8 of credit once (at the time of posting) and OpenRouter lifts the daily free-model limit to 1,000 requests per day. You can still route to free models, but you're no longer trapped behind the tiny starter allowance.
Paid keys get dedicated rate limits with no daily ceiling. Other users' traffic never touches your allowance. Your limits scale with your credits, not with how busy the free pool is.
| Free Tier | About £8 Credit Added | Paid Tier | |
|---|---|---|---|
| Rate limits | Shared pool | Higher free-model allowance | Dedicated to your key |
| Daily cap | 10 requests | 1,000 requests | No ceiling |
| Isolation | Other users affect you | Better headroom, still free-model routing | Nobody else's traffic touches yours |
If you're prototyping or benchmarking, free works. If you're running agents in production, paid isn't optional — it's the difference between "works sometimes" and "works always."
Every Free Model on OpenRouter (April 2026)
These models cost £0/M tokens — both input and output. With :floor, OpenRouter routes you to the zero-cost provider. With :nitro, it picks the fastest free option. With :exacto, it picks the most reliable free option.
- NVIDIA Nemotron 3 Super (120B/12B active, 262K ctx) — General reasoning, benchmarks
- Z.ai GLM 4.5 Air (MoE, 131K ctx) — Thinking mode + tool use
- OpenAI gpt-oss-120b (117B/5.1B active, 131K ctx) — Tool calling, structured output
- NVIDIA Nemotron Nano 30B (30B/3B active, 256K ctx) — Efficient agentic tasks
- MiniMax M2.5 (197K ctx) — Office tasks, SWE-Bench 80.2%
- NVIDIA Nemotron Nano 9B V2 (9B dense, 128K ctx) — Unified reasoning
- Google Gemma 4 31B (31B dense, 262K ctx) — Multimodal, 140+ languages
- NVIDIA Nemotron Nano 12B VL (12B, 128K ctx) — Vision, OCR, video
⚠️ Arcee Trinity Large is being retired April 22, 2026. Swap to Nemotron 3 Super or Gemma 4 31B before then.
The self-hoster's pick: Gemma 4 31B with :floor. Multimodal, 140+ languages, native function calling, and costs literally nothing.
Popular Paid Models and Pricing
When free models hit their ceiling (10 requests/day goes fast), here's what you upgrade to. Prices per 1M tokens.
- DeepSeek V3.2 — £0.21 in / £0.30 out — Budget coding, general reasoning
- Google Gemini 2.5 Flash — £0.12 in / £0.48 out — Speed + massive context on a budget
- Z.ai GLM 5.1 — £0.56 in / £3.52 out — Long-horizon coding (8hr+ autonomous)
- MoonshotAI Kimi K2.6 — £0.48 in / £2.24 out — Agent swarms
- Google Gemini 3.1 Pro — £1.60 in / £9.60 out — Premium reasoning
- OpenAI GPT-4o — £2.00 in / £8.00 out — Balanced general purpose
- Anthropic Claude 3.5 Sonnet — £2.40 in / £12.00 out — Coding + reasoning sweet spot
- xAI Grok 3 — £1.60 in / £6.40 out — Speed-focused coding
- Anthropic Claude Opus 4.6 Fast — £24.00 in / £120.00 out — Heavy-duty async agents
Each accepts :nitro, :floor, or :exacto — same base price, different provider routing.
When to Use Which Suffix
Use :floor when: You're on the free tier, running background tasks, batch jobs, or prototyping.
Use :nitro when: A human is waiting for a response, or latency is the bottleneck.
Use :exacto when: Tool-calling accuracy matters, failed requests cost more than slightly-slower reliable ones.
The VRS Stack
At Hard Interference, I run OpenRouter with a tiered model strategy:
# Background cron — cheapest possible
cron_model: "deepseek/deepseek-v3-0324:floor"
# Blog writing — reliable tool chains
writing_model: "nvidia/nemotron-3-super-120b-a12b:exacto"
# Real-time chat — fast responses
chat_model: "google/gemma-4-31b:nitro"
# Vision tasks — the free multimodal model
vision_model: "nvidia/nemotron-nano-12b-v2-vl:floor"
This runs primarily on free models with free-tier keys. The :floor suffix ensures I never accidentally route to a paid provider. The :exacto suffix on my writing pipeline means blog posts actually get finished without tool-call failures.
Start Free, Scale Smart
The variant system removes the biggest barrier to entry in AI: cost anxiety. You don't need to guess which model or provider is cheapest, fastest, or most reliable. OpenRouter handles routing. You just pick the suffix.
- Begin with
:flooron free models — prototype everything at zero cost - Switch to
:nitrowhen latency matters more than pennies - Reach for
:exactowhen reliability is non-negotiable - Upgrade to a paid key when 10 requests/day isn't enough
One string change. Three strategies. Start free, scale when you're ready.
Found this useful? 👉 Follow @Raf_VRS for more AI Guides updates 👉 Support independent AI: ko-fi.com/rafvrs #SelfHosting #AIAgents #HardInterference