GPT-5.6 Sol vs Terra vs Luna (2026): Pricing & Math Breakdown

Published on Jun 30, 2026 | VERTU Signals

Why it matters

GPT-5.6 launched as three SKUs — Sol ($5/$30), Terra ($2.50/$15), Luna ($1/$6) per million tokens. Here is who should buy which tier, where the per-token math breaks, and what the Sol Ultra multiplier may look like.

Current status as of June 30, 2026: OpenAI released GPT-5.6 on June 26, 2026 in three SKUs — Sol, Terra, and Luna — each with separate per-million-token pricing. Sol is $5.00 input / $30.00 output, Terra is $2.50 / $15, and Luna is $1.00 / $6, per OpenAI's help-center pricing page. Cached input runs at roughly 10% of standard input across all three tiers. A higher-compute Sol Ultra mode is available as an API parameter; OpenAI has not published a separate Sol Ultra price as of June 30, 2026.

What happened

OpenAI shipped GPT-5.6 on June 26, 2026 as a three-SKU lineup rather than a single model. The launch materials — confirmed in OpenAI's help-center pricing page, the official GPT-5.6 Sol announcement, and a developer walkthrough from DataCamp — position each tier against a different workload class.

For an engineering lead, a platform owner, or a procurement team trying to figure out what to put on the next PO, the headline question is no longer "should we use GPT-5.6?" It is "which of the three SKUs goes against which workload, and where does the per-token math actually break?"

This guide walks through the three tiers side by side, the workload patterns each one is built for, the cost cliffs that are easy to miss, and the specific combinations that are quietly cheaper than the obvious one. All numbers are cross-checked against the OpenAI help center pricing page, the Lush Binary developer guide, and OpenAI's official deployment safety report. Last updated June 30, 2026.

The three tiers at a glance

The cached-input rate is roughly 10% of standard input across all three tiers — OpenAI's prompt caching documentation walks through the write/read mechanics, and the launch materials confirm the rate is held constant within the GPT-5.6 family.

The price ratio between tiers is exactly 5:2.5:1 on both input ($5.00 : $2.50 : $1.00) and output ($30 : $15 : $6). That ratio matters more than any individual price, because it tells you where the family is positioning itself: Sol is *not* "the premium tier." It is the tier that exists for workloads the cheaper ones cannot do. The implication, and this is the part that gets missed in the launch coverage, is that Terra and Luna are the tiers most teams will run the majority of their traffic on.
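
The ratio can be checked directly from the list prices quoted above. This is a minimal sketch; the dictionary layout and the function name `ratio_to_luna` are illustrative and not part of any OpenAI SDK.

```python
# Per-million-token list prices as quoted in this article (USD).
PRICES = {
    "sol": {"input": 5.00, "output": 30.00},
    "terra": {"input": 2.50, "output": 15.00},
    "luna": {"input": 1.00, "output": 6.00},
}


def ratio_to_luna(kind: str) -> dict[str, float]:
    """Express each tier's price as a multiple of Luna's price for `kind`."""
    base = PRICES["luna"][kind]
    return {tier: price[kind] / base for tier, price in PRICES.items()}


print(ratio_to_luna("input"))
print(ratio_to_luna("output"))
```

Both calls give the same multiples, so a workload's input/output mix does not change how much more one tier costs than another.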

Sol: the flagship, not the one you run on every request

Sol is the tier most of the launch headlines are about. It is the model that scored 88.8% on Terminal-Bench 2.1, and its higher-compute Sol Ultra mode pushes that to 91.9% on the same benchmark. It is also the tier that performs on par with Claude Mythos 5 on ExploitBench while using roughly one-third the output tokens.

The price ($5.00 input, $30.00 output) is the headline cost. The cost you actually care about is the per-task cost when an agent runs a multi-step loop. Three workload patterns are appropriate for Sol:

  1. Multi-hour agentic coding. The kind of trace where the model writes a plan, executes it, hits a failure, recovers, and iterates. Long output, long reasoning, and the failure rate dominates the budget.
  2. Cybersecurity automation. ExploitBench-style loops, red-team generation, vulnerability analysis pipelines. Per the OpenAI deployment safety report, Sol is the safest model in the GPT-5.x line on adversarial prompts.
  3. Long-horizon reasoning with a high tool-call density. Tasks that pull from 20+ tools across multiple domains and need to keep state coherent across hundreds of turns.

What Sol is *not* for: chat, RAG, classification, extraction, summarization, and any workload where the answer is short and the input is large. For those, the per-token premium pays for capability you don't use.

The Sol Ultra mode (compute-intensive, no published price as of June 30, 2026) is a separate setting inside the Sol API tier. It is the *only* mode that hits the 91.9% Terminal-Bench number. If you are evaluating Sol, you need to know which mode you are testing — Ultra scores are not directly comparable to standard Sol scores, and the price delta may be material. The planning assumption, based on the parallel sub-agent pattern and the published Sol pricing, is a 1.5–2x multiplier on both input and output.
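
To turn that planning assumption into a budget line, the sketch below prices one task at standard Sol rates and then applies the assumed multiplier range. The 1.5x to 2x range is this article's guess rather than a published price, and the token counts and function names are hypothetical.

```python
# Standard Sol list prices as quoted in this article (USD per million tokens).
SOL_INPUT_PER_M = 5.00
SOL_OUTPUT_PER_M = 30.00

# ASSUMPTION: planning range only; OpenAI has not published a Sol Ultra price.
ULTRA_MULTIPLIERS = (1.5, 2.0)


def sol_cost(input_tokens: int, output_tokens: int, multiplier: float = 1.0) -> float:
    """Cost in USD of one task at Sol rates, scaled by an optional multiplier."""
    base = (input_tokens * SOL_INPUT_PER_M + output_tokens * SOL_OUTPUT_PER_M) / 1_000_000
    return base * multiplier


def ultra_budget_range(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Low and high cost estimates under the assumed Ultra multipliers."""
    low, high = ULTRA_MULTIPLIERS
    return (
        sol_cost(input_tokens, output_tokens, low),
        sol_cost(input_tokens, output_tokens, high),
    )


# Hypothetical task: 200k input tokens, 50k output tokens.
standard = sol_cost(200_000, 50_000)
low, high = ultra_budget_range(200_000, 50_000)
print(f"standard Sol: ${standard:.2f}")
print(f"Sol Ultra (assumed): ${low:.2f} to ${high:.2f}")
```

Replace `ULTRA_MULTIPLIERS` with the contract figure once a Sol Ultra price is published.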

Terra: the workhorse you will actually run

Terra is the tier OpenAI is positioning as the direct replacement for GPT-5.5 at approximately half the cost. The input/output rates ($2.50 / $15.00) put it at the same effective price point as the current Claude Sonnet 5 and the Gemini 3.5 Pro flash tier.

Terra is built for:

  1. Production chat and RAG. The kind of internal Q&A, doc summarization, and helpdesk-routing work that runs at 10x–100x the volume of agentic coding.
  2. Mid-complexity code generation. Unit tests, scaffolding, refactors with a human in the loop, and "explain this function" workflows. Not full agentic loops.
  3. Structured extraction at scale. JSON-mode, function-calling, batch jobs that turn unstructured text into structured records. Terra matches GPT-5.5 on the standard extraction evals per Lush Binary's developer guide.

The honest framing: Terra is the tier that lets a team consolidate from "primary + cheap router" to "primary only." If you have been running GPT-5.5 (or Claude Sonnet 5) as your main model and a small cheap model as your router/extractor, Terra collapses that two-model stack into one. The ops surface gets smaller. The eval pipeline gets smaller.

The Terra-vs-Sol choice is *not* a quality question for most workloads. It is a *fit* question. Run your own prompts through both before deciding; the per-task cost difference is large enough that a wrong choice shows up in the next month's invoice.

Luna: the speed-first router

Luna is the new bottom of the lineup, priced at $1.00 input and $6.00 output per million tokens. The positioning per OpenAI's help center is "summarization, drafting, and routine automation." In practice, Luna is the tier that replaces the "mini" / "flash" / "haiku" pattern most teams have been running as their secondary model.

Workloads that belong on Luna:

  1. Classification and routing. "Is this email a complaint, a question, or a request for a refund?" — fast, cheap, no reasoning depth required.
  2. Pre-LLM cleanup. Tokenization, normalization, redaction of personally identifying details before the prompt is sent to a larger model.
  3. High-volume batch jobs. Nightly extractors, mass-translation pipelines, log-triage automation. Volume is the constraint, not quality.
  4. Real-time low-latency flows. Anything that needs to return in under 200ms.

The cost math for Luna is brutal at the high end. Six dollars per million output tokens is *cheap* for what it is, but it is still meaningfully more than a self-hosted open model. If you are running Luna at multi-billion-tokens-per-day volume, the next conversation is not "should we use more Luna?" — it is "should we self-host?"
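
A quick way to see when the self-hosting conversation starts is to price a day of traffic at Luna's list rates. The sketch below assumes a hypothetical daily volume of 3 billion input tokens and 1 billion output tokens, and ignores cached-input discounts.

```python
# Luna list prices as quoted in this article (USD per million tokens).
LUNA_INPUT_PER_M = 1.00
LUNA_OUTPUT_PER_M = 6.00


def luna_daily_spend(input_tokens_per_day: int, output_tokens_per_day: int) -> float:
    """Daily API spend in USD at Luna list prices, with no caching discount."""
    return (
        input_tokens_per_day * LUNA_INPUT_PER_M
        + output_tokens_per_day * LUNA_OUTPUT_PER_M
    ) / 1_000_000


# Hypothetical volume: 3B input tokens and 1B output tokens per day.
daily = luna_daily_spend(3_000_000_000, 1_000_000_000)
print(f"per day: ${daily:,.0f}")
print(f"per 30 days: ${daily * 30:,.0f}")
```

The 30-day figure is the number to set against a self-hosting quote that includes hardware, power, and staffing.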

The cost cliffs most teams miss

Three pricing patterns in the GPT-5.6 lineup are easy to walk off a cliff on:

1. Caching is not free. Prompt caching on the GPT-5.6 family is roughly 10% of standard input on reads. Cache *writes* are billed at 1.25x the standard input rate, per the prompt caching documentation. The math is favorable when your prompt is reused frequently (RAG, long system prompts, multi-turn chat). It is unfavorable when each request has a unique prompt — a per-request cache write *adds* cost versus a no-cache call.

2. Output tokens dominate agent loops. Sol output is $30 per million. An agent loop that produces 50,000 tokens of output per task is $1.50 of pure output cost before you count input. The cheapest way to make agentic coding economic is not to negotiate a lower rate; it is to architect the loop so the model writes less. Trimming the agent trace from 50k to 20k output tokens on Sol saves $0.90 per task, more than the $0.75 saved by moving the same 50k-token trace to Terra.

3. Sol Ultra is not a free upgrade. Sol Ultra is a higher-compute mode inside the Sol API tier that pushes Terminal-Bench 2.1 from 88.8% to 91.9%. OpenAI has not published a separate price for Ultra as of June 30, 2026, but the pattern across the industry is that higher-compute modes are billed at a multiplier. If your team standardizes on Sol Ultra without budgeting for the multiplier, the first invoice will be a surprise.
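
The arithmetic behind the second cliff can be reproduced in a few lines. This sketch counts output tokens only and uses the list prices quoted in this article; the function name is illustrative.

```python
# Output list prices as quoted in this article (USD per million tokens).
OUTPUT_PRICE_PER_M = {"sol": 30.00, "terra": 15.00}


def output_cost(tier: str, output_tokens: int) -> float:
    """Output-only cost in USD for one task."""
    return OUTPUT_PRICE_PER_M[tier] * output_tokens / 1_000_000


baseline = output_cost("sol", 50_000)    # untrimmed trace on Sol
trimmed = output_cost("sol", 20_000)     # same tier, shorter trace
switched = output_cost("terra", 50_000)  # same trace, cheaper tier

print(f"saved by trimming on Sol: ${baseline - trimmed:.2f}")
print(f"saved by switching to Terra: ${baseline - switched:.2f}")
```

Trimming and switching are not exclusive; the comparison only shows which lever is larger for this trace length.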

How to mix them in a real production system

A common 2026 pattern, drawn from the OpenAI Apps SDK guidance and the MCP server concepts, is a three-tier split that mirrors the lineup:

  • Luna handles the entry classification and the routing decision ("this is a coding task, route to Sol; this is a chat task, route to Terra").
  • Terra handles the production chat, RAG, extraction, and mid-complexity code work that constitutes the bulk of traffic.
  • Sol (and Sol Ultra for the hardest cases) is invoked for the long-horizon agentic tasks that are too complex for Terra but represent a small fraction of total traffic.

The wins from this pattern are not from using Sol less. They are from using Luna and Terra more, and from routing the requests that *should* go to a frontier model to one. The mistake is the opposite: treating all three tiers as interchangeable and defaulting to Sol out of habit.
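
The three-tier split can be sketched as a lookup from a workload label to a tier. Everything here is hypothetical: the labels, the fallback choice, and `keyword_classifier`, which is a toy stand-in for the Luna classification call that a real system would make.

```python
from typing import Callable

# Hypothetical workload labels mapped to the tiers described above.
ROUTES = {
    "agentic_coding": "sol",
    "chat": "terra",
    "rag": "terra",
    "extraction": "terra",
    "code_assist": "terra",
}
FALLBACK_TIER = "terra"  # ASSUMPTION: unknown labels go to the workhorse tier


def pick_tier(request_text: str, classify: Callable[[str], str]) -> str:
    """Return the tier name for a request, given a classifier callable."""
    return ROUTES.get(classify(request_text), FALLBACK_TIER)


def keyword_classifier(text: str) -> str:
    """Toy stand-in for the Luna classification step; not a real model call."""
    lowered = text.lower()
    if "refactor the repo" in lowered or "fix the failing build" in lowered:
        return "agentic_coding"
    if "extract" in lowered:
        return "extraction"
    return "chat"


print(pick_tier("Fix the failing build and open a PR", keyword_classifier))
print(pick_tier("Extract the invoice fields as JSON", keyword_classifier))
```

Passing the classifier in as a callable keeps the routing table testable without any network call.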

A second, less obvious pattern: use Sol for the planning step, Luna for the execution step. The planner generates a structured plan in Sol. The executor runs each step in Luna. The plan only needs to be regenerated when the task changes; the execution runs thousands of times. The per-task cost is dominated by the executor's tier, not the planner's.
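
The amortization in the planner/executor split is easy to check. The sketch below uses the list prices quoted in this article and hypothetical token counts: a plan of 2,000 input and 3,000 output tokens on Sol, and an execution step of 1,000 input and 500 output tokens on Luna.

```python
# (input, output) list prices in USD per million tokens, as quoted in this article.
PRICES = {"sol": (5.00, 30.00), "luna": (1.00, 6.00)}


def call_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single model call."""
    in_price, out_price = PRICES[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def per_task_cost(plan_cost: float, exec_cost: float, runs_per_plan: int) -> float:
    """Executor cost plus the planner cost spread over every run that reuses the plan."""
    if runs_per_plan < 1:
        raise ValueError("runs_per_plan must be at least 1")
    return plan_cost / runs_per_plan + exec_cost


plan = call_cost("sol", 2_000, 3_000)  # hypothetical planning call
step = call_cost("luna", 1_000, 500)   # hypothetical execution call

for runs in (1, 10, 1_000):
    print(runs, round(per_task_cost(plan, step, runs), 4))
```

With a single run the planner is almost the whole bill; at a thousand runs per plan it is a rounding error next to the executor.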

What changed since the last update

Compared to the GPT-5.5 family:

  • Tier structure. Three explicit SKUs replace the previous "main + cheap router" pattern.
  • Pricing model. Sol / Terra / Luna is roughly 5x / 2.5x / 1x on output, anchored to GPT-5.5-era pricing.
  • Capability lead. Sol (88.8%) and Sol Ultra (91.9%) extend the agentic-coding lead on Terminal-Bench 2.1.
  • Availability. The June 26 launch is gated to a small trusted-partner preview pending the same federal cybersecurity review pattern that began with Anthropic Mythos 5 on June 12.

What it means

For most enterprise IT teams, the GPT-5.6 lineup is a routing question: which tier for which workload, and how do you instrument the per-task cost math so the wrong choice shows up before the invoice.
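
One way to instrument per-task cost is a small meter that accumulates spend by task ID and flags tasks over a budget. This is a sketch under stated assumptions: `CostMeter` and its methods are hypothetical names, and the token counts are assumed to come from whatever usage figures your API client reports for each call.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# (input, output) list prices in USD per million tokens, as quoted in this article.
PRICES = {"sol": (5.00, 30.00), "terra": (2.50, 15.00), "luna": (1.00, 6.00)}


@dataclass
class CostMeter:
    """Accumulates list-price spend per task ID (hypothetical helper)."""

    totals: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, task_id: str, tier: str, input_tokens: int, output_tokens: int) -> float:
        """Add one call's cost to the task's running total and return that cost."""
        in_price, out_price = PRICES[tier]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        self.totals[task_id] += cost
        return cost

    def over_budget(self, budget_usd: float) -> list[str]:
        """Task IDs whose accumulated cost exceeds the budget, sorted by name."""
        return sorted(t for t, c in self.totals.items() if c > budget_usd)


meter = CostMeter()
meter.record("ticket-1", "luna", 800, 50)           # routing call
meter.record("ticket-1", "terra", 4_000, 600)       # answer call
meter.record("build-7", "sol", 120_000, 40_000)     # agentic coding run
print(meter.over_budget(0.50))
```

Checking `over_budget` daily, rather than monthly, is what lets a misrouted workload surface before the invoice does.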

For procurement teams operating in regulated verticals (finance, healthcare, defense-adjacent), the procurement memo may now need to model three tracks: cloud model (federal-gated), local model (no gating), and a wait-and-see track for the public rollout. The Mythos 5 / GPT-5.6 pattern is a useful planning signal for 2026, not yet an established baseline.

FAQ

Is GPT-5.6 Sol available to the public? No. As of June 30, 2026, access is limited to a small trusted-partner preview. OpenAI has stated the public rollout is "in the coming weeks," with no specific date.

What does Sol Ultra cost? OpenAI has not published a separate Sol Ultra price as of June 30, 2026. Based on the parallel sub-agent pattern and industry norms, a 1.5–2x multiplier on both input and output is the planning assumption. Confirm the contract line before authorizing Sol Ultra in production.

Should my team default to Sol? For most workloads, no. Sol is the tier for workloads Terra and Luna cannot do. Defaulting to Sol for chat, RAG, classification, and routing is the most expensive mistake a team can make on this lineup.

Is GPT-5.6 Sol Ultra worth the multiplier? For workloads where the extra 3.1 points on Terminal-Bench 2.1 (91.9% versus 88.8%) matter enough to pay for (cybersecurity automation, regulated-document review, multi-step planning reused thousands of times), yes. For most production workloads, standard Sol at 88.8% is the right answer.

What about prompt caching? Cached input is roughly 10% of standard input on reads, with cache writes billed at 1.25x. Workloads with high cache reuse (RAG, long system prompts, multi-turn chat) see meaningful reductions on the input side. Workloads with low cache reuse get little benefit, and a prompt that is written to the cache once and never read back costs more than an uncached call.
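
The break-even point for caching follows from those two rates. The sketch below measures cost in units of one uncached send of the shared prefix. It assumes the first send pays the write rate, every later send is a cache hit with no expiry, and the "roughly 10%" read rate is exactly 0.10.

```python
# Rates relative to the standard input price, as quoted in this article.
READ_RATE = 0.10   # cache read, treated as exactly 10% (the article says "roughly")
WRITE_RATE = 1.25  # cache write


def prefix_cost_multiple(uses: int, cached: bool) -> float:
    """Cost of sending the same prefix `uses` times, in units of one uncached send."""
    if uses < 1:
        raise ValueError("uses must be at least 1")
    if not cached:
        return float(uses)
    # ASSUMPTION: one write on the first send, then a cache hit on every later send.
    return WRITE_RATE + READ_RATE * (uses - 1)


def break_even_uses(limit: int = 1_000):
    """Smallest number of sends at which caching is cheaper, or None within `limit`."""
    for uses in range(1, limit + 1):
        if prefix_cost_multiple(uses, cached=True) < prefix_cost_multiple(uses, cached=False):
            return uses
    return None


for uses in (1, 2, 10):
    print(uses, prefix_cost_multiple(uses, cached=True), prefix_cost_multiple(uses, cached=False))
print("break-even at", break_even_uses(), "sends")
```

Under these assumptions a single send costs 25% more with caching, and the second send is where caching starts to pay. Cache expiry between sends would push the break-even later.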

For a Different Kind of Audience

If your workload is "summarize a doc, draft an email, route a ticket," the GPT-5.6 cloud lineup is the right tool and the tier-mix math will keep your invoice sane. If your workload is "an attorney-client conversation that cannot leave the room, a board memo that cannot be logged, or sensitive personal data that should never be aggregated to a federal review queue," a different kind of device exists: see luxury phones with on-device AI assistants for hardware designed for more local, private workflows.
