I Built a Tool to Compare LLM API Prices — Here's Why
Choosing an LLM for your next project shouldn't require opening 7 browser tabs and doing manual math. I built LLM API Comparator to make that decision instant and data-driven.

Every time I started a new AI-powered project, I found myself doing the same tedious ritual: opening the OpenAI pricing page, then Anthropic’s, then Google’s, then Mistral’s — all to answer one simple question: which model gives me the best value for my specific use case?
After doing this three times in a month, I decided to solve it once and for all.
The Problem: Pricing Is Scattered and Confusing
LLM pricing isn’t just about cost per token. There are at least six dimensions you need to consider before picking a model:
- Input price (per million tokens)
- Output price — often 3–10× more expensive than input
- Context window — how much text the model can see at once
- TTFT (Time to First Token) — how fast it starts responding
- TPS (Tokens per Second) — how fast it generates output
- Capabilities — vision, function calling, reasoning, streaming, code
No single provider makes it easy to compare all of this at once. That’s by design — but it’s frustrating when you just want to make a decision and start building.
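The six dimensions above map naturally onto a single record per model. A minimal sketch of that shape in TypeScript — the field names and sample values here are my own illustration, not the tool's actual schema (GPT-4 Turbo's $10/$30 per-million-token list prices and 128K context are accurate as of writing, but check the provider's page):

```typescript
// One model entry covering the six comparison dimensions.
// Field names and performance figures below are illustrative.
interface ModelSpec {
  id: string;
  provider: string;
  inputPricePerMTok: number;   // USD per 1M input tokens
  outputPricePerMTok: number;  // USD per 1M output tokens
  contextWindow: number;       // max tokens the model can see at once
  ttftMs: number;              // time to first token, in milliseconds
  tps: number;                 // output tokens generated per second
  capabilities: string[];      // e.g. "vision", "function-calling", "streaming"
}

const example: ModelSpec = {
  id: "gpt-4-turbo",
  provider: "OpenAI",
  inputPricePerMTok: 10,
  outputPricePerMTok: 30,
  contextWindow: 128_000,
  ttftMs: 500,                 // rough, varies by load and region
  tps: 50,                     // rough, varies by load and region
  capabilities: ["vision", "function-calling", "streaming"],
};
```

Keeping every dimension in one flat record is what makes side-by-side comparison and filtering trivial.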
What I Built
LLM API Comparator is a free, client-side tool that lets you:
- Browse 25+ models from OpenAI, Anthropic, Google, Meta, Mistral, and Cohere — including self-hosted options like Llama 3 and Mistral 7B via Ollama.
- Filter by provider or capabilities — quickly narrow down to models that support vision, function calling, long context, or streaming.
- Compare side by side — select any combination of models and see their specs next to each other instantly.
- Calculate real costs — use the built-in calculator to estimate what your actual workload would cost across different models.
- Find the best match — a guided flow that asks you what matters most and surfaces the top options.
Everything runs in the browser. No sign-up, no API key, no tracking.
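The core of the cost calculator is one line of arithmetic: tokens divided by a million, times the per-million price, summed for input and output. A sketch of that math (the function name and the workload figures are mine, not the tool's code):

```typescript
// Estimate the USD cost of a workload from per-million-token prices.
// Prices passed in should come from the provider's current pricing page.
function estimateCost(
  inputTokens: number,
  outputTokens: number,
  inputPricePerMTok: number,
  outputPricePerMTok: number
): number {
  return (
    (inputTokens / 1_000_000) * inputPricePerMTok +
    (outputTokens / 1_000_000) * outputPricePerMTok
  );
}

// Example: 1M requests/month averaging 800 input / 300 output tokens,
// at $10 input / $30 output per million tokens:
const monthly = estimateCost(800_000_000, 300_000_000, 10, 30); // $17,000
```

Note that the output side contributes $9,000 of that $17,000 despite being well under half the token count — which is exactly the effect the calculator is meant to surface.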
The Stack
I built it with Astro + React + Tailwind, the same combination I’ve been reaching for on most projects lately. Astro handles the static shell and routing, React powers the interactive comparison grid and calculator, and Tailwind keeps the styling fast and consistent.
The model data lives in a flat TypeScript file — easy to update when providers change their pricing (which happens surprisingly often).
What I Learned Building It
Pricing changes fast. In the few weeks I was building this, Anthropic updated Claude pricing twice and Google dropped Gemini 1.5 Flash’s cost significantly. I built in an “updated” timestamp so users know how fresh the data is.
Output tokens are underestimated. Most developers focus on input price, but in chat applications, output dominates cost. A 500-token response at GPT-4 Turbo prices costs more than the entire input for most prompts. The calculator makes this visible.
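To make that concrete, here is the arithmetic at GPT-4 Turbo's list prices ($10 per million input tokens, $30 per million output — accurate as of writing, but these change):

```typescript
const inputPricePerMTok = 10;  // USD, GPT-4 Turbo input, as of writing
const outputPricePerMTok = 30; // USD, GPT-4 Turbo output, as of writing

// A fairly long 1,000-token prompt vs a typical 500-token response:
const inputCost = (1_000 / 1_000_000) * inputPricePerMTok;   // ~$0.010
const outputCost = (500 / 1_000_000) * outputPricePerMTok;   // ~$0.015
```

Half as many tokens, 50% more cost: at a 3× price ratio, the response overtakes the prompt whenever output length exceeds a third of input length.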
Self-hosted is genuinely free — with caveats. Llama 3 and Mistral 7B via Ollama have zero per-token cost, but the hardware and operational overhead are real. I included them in the comparator because “free API” vs “infrastructure cost” is a legitimate trade-off that depends on volume.
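That volume-dependent trade-off reduces to a break-even calculation. Every number below is a hypothetical placeholder to show the shape of the math, not a benchmark:

```typescript
// At what monthly token volume does a fixed-cost box beat pay-per-token?
// Figures are hypothetical placeholders; plug in your own.
const monthlyInfraUsd = 600;      // e.g. a rented GPU instance
const blendedApiPricePerMTok = 5; // blended input+output USD per 1M tokens

// Below this volume, the API is cheaper; above it, self-hosting wins
// (ignoring ops time, which is the real hidden cost).
const breakEvenMTok = monthlyInfraUsd / blendedApiPricePerMTok; // 120M tokens/month
```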
People don’t know what TTFT means. I added an FAQ section with plain-English explanations of tokens, TTFT, TPS, RAG, fine-tuning, and streaming — not because I thought it was necessary, but because every person I showed the tool to asked at least one of those questions.
The FAQ I Wish Existed
While building the comparator, I ended up writing more than I expected for the FAQ section. It covers questions like:
- When should I use a cloud API vs self-host?
- What’s the difference between fine-tuning and RAG?
- How does a complete LLM API call work, from request to streaming response?
- Should I always use streaming? (Yes, for user-facing UIs.)
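On that last question: most providers stream responses as server-sent events, one `data:` line per token chunk, ending with a `[DONE]` sentinel. A simplified parser for a single SSE line, assuming an OpenAI-style delta payload (other providers use different shapes):

```typescript
// Extract the text delta from one SSE line of a streaming completion.
// Returns null for non-data lines and for the final "[DONE]" sentinel.
function parseSSELine(line: string): string | null {
  if (!line.startsWith("data: ")) return null;
  const payload = line.slice("data: ".length);
  if (payload === "[DONE]") return null;
  const parsed = JSON.parse(payload);
  // OpenAI-style shape: choices[0].delta.content holds the new text.
  return parsed.choices?.[0]?.delta?.content ?? null;
}
```

Appending each non-null delta to the UI as it arrives is what makes streaming feel instant, even when total generation time is unchanged.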
The goal was to make the tool useful even if you’re just starting to explore LLM APIs — not just for people who already know the landscape.
Try It
If you’re building anything with LLM APIs and you’re not sure which model to use, give it a try. It takes about 30 seconds to narrow down your options.
The source is on GitHub if you want to contribute or just poke around.