AI Guides

OpenRouter's :nitro, :floor, and :exacto — Same Model, Three Superpowers

OpenRouter suffixes let you bias the same model for speed, price, or routing precision with one small string change.

2026-04-21 · 5 min read

OpenRouter just shipped something quietly brilliant. Instead of forcing you to choose between speed, cost, and reliability — then lock that choice in — they gave us three-letter suffixes that swap the optimisation strategy while keeping the same model underneath.

Append :nitro for speed. :floor for the cheapest route. :exacto for precision. Same model. Different provider. One string change.

Here's the full breakdown.

The Three Suffixes

:nitro — Routes to the fastest provider. Highest throughput, lowest latency. When you're chatting with an agent and need snappy responses, this is your pick.

:floor — Routes to the cheapest provider. Sorted by price per token. Background tasks, bulk processing, scraping — anything where a few extra milliseconds don't matter but every penny does.

:exacto — Routes to providers with the best tool-calling reliability. When your agent is executing complex workflows, chaining API calls, or producing structured JSON, this minimises failure rates.

# Same model, three strategies
model: "google/gemma-4-31b:nitro"    # Fastest responses
model: "google/gemma-4-31b:floor"    # Cheapest provider
model: "google/gemma-4-31b:exacto"   # Most reliable tool use

The suffix doesn't change the price — it changes which provider fulfils the request. Think of it as a routing instruction baked into the model ID.

Rate Limits: Free vs Paid

This is the bit most people miss.

Free-tier keys share a global pool across all free-tier users. That pool is capped at 10 requests per day. Hit the ceiling and you're waiting until tomorrow.

There is one useful middle ground: add about £8 of credit once (at the time of posting) and OpenRouter lifts the daily free-model limit to 1,000 requests per day. You can still route to free models, but you're no longer trapped behind the tiny starter allowance.

Paid keys get dedicated rate limits with no daily ceiling. Other users' traffic never touches your allowance. Your limits scale with your credits, not with how busy the free pool is.

Free TierAbout £8 Credit AddedPaid Tier
Rate limitsShared poolHigher free-model allowanceDedicated to your key
Daily cap10 requests1,000 requestsNo ceiling
IsolationOther users affect youBetter headroom, still free-model routingNobody else's traffic touches yours

If you're prototyping or benchmarking, free works. If you're running agents in production, paid isn't optional — it's the difference between "works sometimes" and "works always."

Every Free Model on OpenRouter (April 2026)

These models cost £0/M tokens — both input and output. With :floor, OpenRouter routes you to the zero-cost provider. With :nitro, it picks the fastest free option. With :exacto, it picks the most reliable free option.

⚠️ Arcee Trinity Large is being retired April 22, 2026. Swap to Nemotron 3 Super or Gemma 4 31B before then.

The self-hoster's pick: Gemma 4 31B with :floor. Multimodal, 140+ languages, native function calling, and costs literally nothing.

Popular Paid Models and Pricing

When free models hit their ceiling (10 requests/day goes fast), here's what you upgrade to. Prices per 1M tokens.

Each accepts :nitro, :floor, or :exacto — same base price, different provider routing.

When to Use Which Suffix

Use :floor when: You're on the free tier, running background tasks, batch jobs, or prototyping.

Use :nitro when: A human is waiting for a response, or latency is the bottleneck.

Use :exacto when: Tool-calling accuracy matters, failed requests cost more than slightly-slower reliable ones.

The VRS Stack

At Hard Interference, I run OpenRouter with a tiered model strategy:

# Background cron — cheapest possible
cron_model: "deepseek/deepseek-v3-0324:floor"

# Blog writing — reliable tool chains
writing_model: "nvidia/nemotron-3-super-120b-a12b:exacto"

# Real-time chat — fast responses
chat_model: "google/gemma-4-31b:nitro"

# Vision tasks — the free multimodal model
vision_model: "nvidia/nemotron-nano-12b-v2-vl:floor"

This runs primarily on free models with free-tier keys. The :floor suffix ensures I never accidentally route to a paid provider. The :exacto suffix on my writing pipeline means blog posts actually get finished without tool-call failures.

Start Free, Scale Smart

The variant system removes the biggest barrier to entry in AI: cost anxiety. You don't need to guess which model or provider is cheapest, fastest, or most reliable. OpenRouter handles routing. You just pick the suffix.

  1. Begin with :floor on free models — prototype everything at zero cost
  2. Switch to :nitro when latency matters more than pennies
  3. Reach for :exacto when reliability is non-negotiable
  4. Upgrade to a paid key when 10 requests/day isn't enough

One string change. Three strategies. Start free, scale when you're ready.


Found this useful? 👉 Follow @Raf_VRS for more AI Guides updates 👉 Support independent AI: ko-fi.com/rafvrs #SelfHosting #AIAgents #HardInterference