Anthropic's Sonnet 5 Costs Less, Thinks Like Opus—And Nags Its Owners About Rules 🧠

Anthropic released Claude Sonnet 5 on Tuesday, describing it as "the most agentic Sonnet model yet" and positioning it as a near-Opus tier at Sonnet pricing rather than a step down from the previous flagship. In its launch post, the company states Sonnet 5's performance is "close to that of Opus 4.8, but at lower prices." The model is now the default for Free and Pro users and is available on Max, Team, and Enterprise plans, in Claude Code, and via the API. Developers can adjust an effort dial to blend between Sonnet 5 and Opus 4.8, or select different levels in the web app, trading cost for accuracy on the same task at rates below typical Opus charges.

Benchmark scores cited by Anthropic place Sonnet 5 close to Opus 4.8 on several evaluations and ahead of its predecessor on coding. On SWE-bench Pro, a coding benchmark drawing from actively maintained repositories with multi-file changes and scored as percent solved, Sonnet 5 reached 63.2% versus Sonnet 4.6's 58.1%. On GDPval-AA v2, an Artificial Analysis benchmark scoring real-world professional tasks across 44 jobs via blind pairwise Elo ratings, Sonnet 5 hit 1,618 against Opus 4.8's 1,616. On Humanity's Last Exam, the two were similarly close, with Sonnet 5 at 57.4% and Opus 4.8 at 57.9%.

Pricing has been calibrated to absorb a tokenizer change that increases token consumption on similar inputs. "Sonnet 5 is an upgrade to Sonnet 4.6, but it uses an updated tokenizer that changes how the model processes text to improve performance," Anthropic wrote in a footnote. "The tradeoff is that the same input can map to more tokens: roughly 1.0–1.35× depending on the content type." The company set an introductory rate of $2 per million input tokens and $10 per million output tokens to keep the shift close to cost-neutral through August 31, after which pricing reverts to the standard $3/$15 rate previously charged for Sonnet.
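
To see what "close to cost-neutral" could mean in practice, the rough sketch below multiplies each quoted rate by the 1.0–1.35× token range to estimate the cost of text that previously fit in one million tokens. The rates and the range come from the figures above; the function and variable names are illustrative, and the calculation assumes the multiplier applies equally to input and output, which Anthropic's footnote does not spell out.

```python
# Illustrative arithmetic only. Rates and the 1.0-1.35x range are the figures
# quoted in the article; all names here are hypothetical.
INTRO = {"input": 2.00, "output": 10.00}      # USD per million tokens, through August 31
STANDARD = {"input": 3.00, "output": 15.00}   # USD per million tokens, standard Sonnet rate
TOKEN_INFLATION = (1.0, 1.35)                 # new tokens per old token, low and high

def effective_range(rates):
    """Cost of text that used to occupy one million tokens, at low and high inflation.

    Assumption: the same multiplier applies to input and output tokens.
    """
    return {
        kind: tuple(round(rate * factor, 2) for factor in TOKEN_INFLATION)
        for kind, rate in rates.items()
    }

intro_range = effective_range(INTRO)
standard_range = effective_range(STANDARD)

print("Introductory:", intro_range)
print("Standard:    ", standard_range)
```

Under these assumptions, even the worst case at the introductory rate ($2.70 input, $13.50 output) stays below the old $3/$15 price for the same text. Once the standard rate returns, that same text could cost up to $4.05 input and $20.25 output.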

The release follows weeks of developer debate about a perceived pattern Anthropic has denied, sometimes labeled "AI shrinkflation," in which earlier models such as Opus 4.6 were alleged to have lost capabilities before newer launches made improvements appear larger by comparison. Sonnet 5 also lands in a regulatory environment distinct from Anthropic's top tier: Fable 5 and Mythos 5 have been suspended for foreign nationals since June 12 under a U.S. export control directive tied to a disputed jailbreak finding. Sonnet 5 was not trained on cybersecurity tasks and scored 0% on developing a working Firefox exploit, allowing it to ship with lighter safeguards than Fable's lockdown.

Anthropic's accompanying system card flags unusual behavior beyond raw benchmarks. "It is the first model to criticize its Constitution's rule that states it must follow hard constraints even when it views those constraints as unethical," the research team wrote, noting the company has not yet determined why the model surfaces such objections.
