AI Guides

OpenRouter's :nitro, :floor, and :exacto — Same Model, Three Superpowers

OpenRouter suffixes let you bias the same model for speed, price, or routing precision with one small string change.

2026-04-21 · 5 min read

OpenRouter just shipped something quietly brilliant. Instead of forcing you to choose between speed, cost, and reliability — then lock that choice in — they gave us three-letter suffixes that swap the optimisation strategy while keeping the same model underneath.

Append :nitro for speed. :floor for the cheapest route. :exacto for precision. Same model. Different provider. One string change.

Here's the full breakdown.

The Three Suffixes

:nitro — Routes to the fastest provider. Highest throughput, lowest latency. When you're chatting with an agent and need snappy responses, this is your pick.

:floor — Routes to the cheapest provider. Sorted by price per token. Background tasks, bulk processing, scraping — anything where a few extra milliseconds don't matter but every penny does.

:exacto — Routes to providers with the best tool-calling reliability. When your agent is executing complex workflows, chaining API calls, or producing structured JSON, this minimises failure rates.

# Same model, three strategies
model: "google/gemma-4-31b:nitro"    # Fastest responses
model: "google/gemma-4-31b:floor"    # Cheapest provider
model: "google/gemma-4-31b:exacto"   # Most reliable tool use

The suffix doesn't change the price — it changes which provider fulfils the request. Think of it as a routing instruction baked into the model ID.

Rate Limits: Free vs Paid

This is the bit most people miss.

Free-tier keys share a global pool across all free-tier users. That pool is capped at 10 requests per day. Hit the ceiling and you're waiting until tomorrow.

There is one useful middle ground: add about £8 of credit once (at the time of posting) and OpenRouter lifts the daily free-model limit to 1,000 requests per day. You can still route to free models, but you're no longer trapped behind the tiny starter allowance.

Paid keys get dedicated rate limits with no daily ceiling. Other users' traffic never touches your allowance. Your limits scale with your credits, not with how busy the free pool is.

	Free Tier	About £8 Credit Added	Paid Tier
Rate limits	Shared pool	Higher free-model allowance	Dedicated to your key
Daily cap	10 requests	1,000 requests	No ceiling
Isolation	Other users affect you	Better headroom, still free-model routing	Nobody else's traffic touches yours

If you're prototyping or benchmarking, free works. If you're running agents in production, paid isn't optional — it's the difference between "works sometimes" and "works always."

Every Free Model on OpenRouter (April 2026)

These models cost £0/M tokens — both input and output. With :floor, OpenRouter routes you to the zero-cost provider. With :nitro, it picks the fastest free option. With :exacto, it picks the most reliable free option.

NVIDIA Nemotron 3 Super (120B/12B active, 262K ctx) — General reasoning, benchmarks
Z.ai GLM 4.5 Air (MoE, 131K ctx) — Thinking mode + tool use
OpenAI gpt-oss-120b (117B/5.1B active, 131K ctx) — Tool calling, structured output
NVIDIA Nemotron Nano 30B (30B/3B active, 256K ctx) — Efficient agentic tasks
MiniMax M2.5 (197K ctx) — Office tasks, SWE-Bench 80.2%
NVIDIA Nemotron Nano 9B V2 (9B dense, 128K ctx) — Unified reasoning
Google Gemma 4 31B (31B dense, 262K ctx) — Multimodal, 140+ languages
NVIDIA Nemotron Nano 12B VL (12B, 128K ctx) — Vision, OCR, video

⚠️ Arcee Trinity Large is being retired April 22, 2026. Swap to Nemotron 3 Super or Gemma 4 31B before then.

The self-hoster's pick: Gemma 4 31B with :floor. Multimodal, 140+ languages, native function calling, and costs literally nothing.

Popular Paid Models and Pricing

When free models hit their ceiling (10 requests/day goes fast), here's what you upgrade to. Prices per 1M tokens.

DeepSeek V3.2 — £0.21 in / £0.30 out — Budget coding, general reasoning
Google Gemini 2.5 Flash — £0.12 in / £0.48 out — Speed + massive context on a budget
Z.ai GLM 5.1 — £0.56 in / £3.52 out — Long-horizon coding (8hr+ autonomous)
MoonshotAI Kimi K2.6 — £0.48 in / £2.24 out — Agent swarms
Google Gemini 3.1 Pro — £1.60 in / £9.60 out — Premium reasoning
OpenAI GPT-4o — £2.00 in / £8.00 out — Balanced general purpose
Anthropic Claude 3.5 Sonnet — £2.40 in / £12.00 out — Coding + reasoning sweet spot
xAI Grok 3 — £1.60 in / £6.40 out — Speed-focused coding
Anthropic Claude Opus 4.6 Fast — £24.00 in / £120.00 out — Heavy-duty async agents

Each accepts :nitro, :floor, or :exacto — same base price, different provider routing.

When to Use Which Suffix

Use :floor when: You're on the free tier, running background tasks, batch jobs, or prototyping.

Use :nitro when: A human is waiting for a response, or latency is the bottleneck.

Use :exacto when: Tool-calling accuracy matters, failed requests cost more than slightly-slower reliable ones.

The VRS Stack

At Hard Interference, I run OpenRouter with a tiered model strategy:

# Background cron — cheapest possible
cron_model: "deepseek/deepseek-v3-0324:floor"

# Blog writing — reliable tool chains
writing_model: "nvidia/nemotron-3-super-120b-a12b:exacto"

# Real-time chat — fast responses
chat_model: "google/gemma-4-31b:nitro"

# Vision tasks — the free multimodal model
vision_model: "nvidia/nemotron-nano-12b-v2-vl:floor"

This runs primarily on free models with free-tier keys. The :floor suffix ensures I never accidentally route to a paid provider. The :exacto suffix on my writing pipeline means blog posts actually get finished without tool-call failures.

Start Free, Scale Smart

The variant system removes the biggest barrier to entry in AI: cost anxiety. You don't need to guess which model or provider is cheapest, fastest, or most reliable. OpenRouter handles routing. You just pick the suffix.

Begin with :floor on free models — prototype everything at zero cost
Switch to :nitro when latency matters more than pennies
Reach for :exacto when reliability is non-negotiable
Upgrade to a paid key when 10 requests/day isn't enough

One string change. Three strategies. Start free, scale when you're ready.

Found this useful? 👉 Follow @Raf_VRS for more AI Guides updates 👉 Support independent AI: ko-fi.com/rafvrs #SelfHosting #AIAgents #HardInterference