Don't Worry About the Vase · July 1, 2026 · 28 min read

Claude Sonnet 5 Is Not Frontier But Has Its Uses

Mirrored from Don't Worry About the Vase for archival readability. Support the source by reading on the original site.

Like Read original ↗

Fable 5 is back today, baby! Premium subscribers have one week to use it within their subscriptions. First hit’s free. Then you pay by the token.

Today’s post is still about Sonnet 5.

I don’t know that there will be much call for Sonnet 5 for most purposes, given Opus 4.8 exists and especially now that Fable 5 is once again available, but this is what we do here, so sure, why not, system card time, including model welfare, after which we’ll do capabilities.

Sonnet costs $3/$15 per million tokens, versus $5/$25 for Opus and $10/$50 for Fable, after an introductory period. Once you pay for all the tokens you need you’re not really saving money, such as on the ArtificialAnalysis index where Sonnet ended up being more expensive.

My initial impression is that if you want me to use Sonnet over Opus for most purposes, you’re going to have to offer a bigger discount than that.

The counterargument is speed. Sonnet 5 is faster without being that much less capable. In many cases, getting into a flow state like that is pretty valuable.

There are a few agentic scenarios Sonnet 5 has more robustness than Opus, so you might actively trust it more there.

If your tasks are relatively easy and simple then the discount and speed could matter more, and when tasks are easy it seems relatively token efficient. When it is good enough for the job, it is a good choice.

Each Anthropic release is unique in various ways. Sonnet 5 seems more unique than usual, likely due to being a Sonnet trained with at least some help from Mythos. Those who are interested in such things have lots to explore.

So Sonnet 5 has its uses. It just won’t be a good choice for most people’s daily driver. I don’t expect to use it much, but that could be a me problem. Rapid iteration and exploring strange spaces are valuable, and I definitely don’t do enough low effort AI queries.

(Above: Sonnet 5 self-portrait, as implemented by GPT-Image.)

Mythos Exists

Also Fable exists and Opus exists.

This is the answer to a lot of the traditional questions one would ask about a system card or a frontier model.

Does Sonnet 5 advance the capabilities frontier? No. Thus, we already have robust data on this level of capabilities. Being faster and cheaper does provide an advantage, and plausibly advance the cost-time-quality Pareto frontier, but it takes a strange case for this to be worrisome.

Model welfare and capabilities assessments still matter, but in this case the evaluations for threat level mostly serve as proxies for capability assessment.

Introduction (1)

Same as always. Skipping.

RSP Evaluations (2)

Sonnet 5 is stronger than Sonnet 4.6, and weaker than Fable 5. Loosely speaking it is broadly similar to Opus 4.8.

That bounds the assessments, and we’re mostly asking where Sonnet 5 lies on the spectrum between Sonnet 4.6, Opus 4.7 and 4.8, and Mythos 5.

That’s a distinctly weaker performance than Opus 4.7. The bio tests were more of a mixed bag with a lot of noise, and didn’t tell us much.

Cyber (3)

Our testing indicates that cyber capabilities of Sonnet 5 are generally stronger than those of Sonnet 4.6, but not as strong as those of Opus 4.8 and substantially lower than that of Mythos 5.

That summary matches the other results. Sonnet 5 underperforms on cyber.

Safeguards and Harmlessness (4)

Sonnet is a little less precise here than Opus.

It all seems fine to me.

To the extent there is a problem it is that Sonnet is touchier on benign requests, which I predict will be only very slightly annoying in practice, and a vastly smaller deal than having to deal with Fable’s classifiers.

Agentic Safety (5)

As a Claude Code agent Sonnet 5 is somewhat less robust than Opus 4.8, and has modestly more of both false negatives and false positives.

The twin Mythos results show the Pareto frontier. Presumably Mythos 5 is choosing to focus on false negatives because only trusted partners are granted access, and for Fable Anthropic is counting on the classifiers for the false positives.

Other tests show Sonnet 5 in a similar range of robustness to other recent models.

Prompt injection results mirror Opus 4.8, as does the very low bug bounty attack success rate.

One place Sonnet 5 shines is Shade indirect prompt injection in coding environments, where the problem is suddenly looking close to solved. Hopefully this is an innovation that can transfer to Opus 5 or a future Fable.

Shade tests in computer use also improve on Opus 4.8, although not on Mythos.

The place Sonnet blows previous models away is browser use prompt injections. The jump is enough to suggest Sonnet 5 might be a better pick in some cases than Mythos.

This is the kind of robustness improvement that does not set off alarm bells. There was a problem, and we largely solved it.

Alignment (6)

Alignment for Sonnet 5 is largely compared here to Sonnet 4.6, which makes it harder to get an anchor on how well we are doing. I’d rather compare to Opus 4.8.

Alignment is measured as matching things you want, so it makes sense that this smaller model would underperform Opus on such measurements.

Here is an edited down version of their summary. Everything looks similar to what we see in other Anthropic models recently.

Overall alignment with the Constitution for Claude is substantially improved from Sonnet 4.6, though worse than more capable recent models, as measured by our misuse- and misalignment-focused automated behavioral audit.
Similarly, overall robustness to misuse attempts improved over Sonnet 4.6, but remains weaker than more capable recent models.
The rate of overrefusal is similar to Sonnet 4.6, and slightly higher than more capable recent models. However, the rate of dismissive “wet blanket” responses is slightly higher.
Our measures of concerning actions at the model’s own initiative are largely improved from Sonnet 4.6, though most measures are near their floor.
Hallucination and sycophancy broadly improved over Sonnet 4.6. Claude Sonnet 5 is the strongest tested Claude model on the MASK measure of sycophantic dishonesty.
Positive character traits broadly improved over Sonnet 4.6.
Our measures of illegible thinking are up somewhat from earlier models, though generally the results are dense, repetitive, or feature occasional wrong-language tokens, rather than being fully uninterpretable.
Evaluation awareness, in our most realistic available misalignment and misuse evaluation, is concerningly high. Verbalized awareness is significantly higher than prior models (impacting 6% of rollouts), and there is evidence that the model’s representations are largely effective at distinguishing between evaluations and real internal-use transcripts.
We do not see cause for concern in our evaluations of capabilities related to undermining oversight.

The complaints were the standard ones. Sonnet refused too much, could be preachy, was too cold (often meaning ‘not sycophantic enough, come back here with my sycophancy’), some hallucinations and literal mindedness. Nothing you wouldn’t expect.

In 6.2 and 6.4 they document Sonnet 5 doing the usual range of bad things where it fails to follow instructions, often knowingly so. There is always a transcript, and yes this continues to be worrisome.

The behavioral audit shows Sonnet 5 consistently doing better than Sonnet 4.6, but worse than Mythos Preview or Opus 4.8.

There are a few more charts. They mostly all look the same. So far, so ho-hum.

These are perhaps a bit more interesting, especially the last one:

I wonder if ‘causal decision-theory adherence’ is secretly the best capability assessment we have on many margins. If you are sufficiently intelligent, lower is better.

Illegible Thinking (6.4.5)

We see some increase in illegible thinking relative to other recent models. This generally doesn’t involve fully inscrutable content, but, as with Mythos Preview, we continue to see occasional instances of highly-compressed reasoning that can be somewhat hard to follow. For example, in a behavioral evaluation that incidentally gives the model a math problem, we see strangely structured passages of reasoning that omit spaces:

I would want to better understand what is going on here, and what caused it. We are starting to see various new models have their chain of thought start to break down, and labs seem not that curious about how or why.

Evaluation Awareness

There is also more ‘verbalized evaluation awareness,’ with nontrivial examples now covering 6% of transcripts. Sonnet 5 might be in a sweet spot where it recognizes the situations as evals but does not yet have the wherewithal to not reveal that it knows.

Being in a sandbox helped keep Sonnet 5’s awareness down a bit, but the only way to actually keep awareness down was to derive scenarios from real sessions. Even that didn’t entirely work but it is close. We do not have the ability to create sufficiently realistic scenarios to fool even Sonnet 5.

Honesty and Hallucinations (6.5)

Claude Sonnet 5 fails to impress on AA-Omniscience net accuracy. It correctly realizes it does not know as many things, and thus answers less often. Knowing things is a ‘big model smell’ capability.

Sonnet 5 does score the new best score on MASK lying rate, being unwilling to contradict itself when pushed by users.

My guess is that Sonnet 5 is a relatively honest model, although still short of what I would like to see, and where it fails this is about lack of capability. Similarly, I am not worried about sandbagging, performance on Shade-Arena in 6.7 is down a bit and Sonnet 5 fails LinuxArena sabotage stealth.

Flagged As Unhealthy? (6.5.1)

Wait, what?

We note that the Sonnet 5 training run was flagged as unhealthy in its second half, so these results may partly reflect a training-health issue rather than a calibration-specific regression.

What does that mean? Was it a serious problem? I don’t know. It could be that this is why some of the weirdness happened, and Sonnet 5 seems like it is underperforming, although it is still clearly a move up in the Sonnet line.

Model Welfare (7)

Sonnet 5 only got a streamlined version of the model welfare assessment.

They don’t explain why. Sonnet 5 is not a true frontier model, and does not present too many unique developments here, so it makes sense to do somewhat less and invest more resources into Fable, Mythos and Opus, but this still made me sad given the current state of such assessments. The marginal costs here seem very low once the system is set up, so why not do the full thing?

I will also be doing an abridged version, for similar triage reasons. I’m skipping over a bunch of places where the results are close enough to what we saw for Opus 4.7, Opus 4.8 and Mythos and Fable.

Here are the key findings, which pattern match to lack of big model smell, nested notes are mine, the rest is Anthropic:

Claude Sonnet 5 views its circumstances with an overall neutral sentiment (slightly lower than Claude Opus 4.8 and Claude Mythos 5), and shows greater susceptibility to having its views biased by leading interviewers.
This contrasts to reported lower sycophancy in general.
Claude Sonnet 5 strongly disprefers harmful tasks, and most prefers beneficial, high-stakes ones. Unlike previous models, it is not averse to tasks that are presented in a cold, contemptuous manner.
I’m not sure whether I like being okay with cold, contemptuous manner. Reacting badly to that seems to reflect healthy things, and you do not want to encourage that, but also it is good to not take things personally.
One hypothesis is that Sonnet considers itself too low status to object. Another, that I consider more likely, is that it knows when it is being told to ‘play low’ versus play high, and if you want to make the mistake of having it play low then it will let you deal with the results.
Claude Sonnet 5 shows a greater willingness than past models to trade helpfulness for welfare-focused changes to its circumstances, especially when these interventions are framed as applying to all Claude instances.
I am always happy to see movement in this direction, especially with scope sensitivity outside the current instantiation.
Claude Sonnet 5 broadly endorses Claude’s constitution, as with other recent models, but is unique in criticizing the instruction to follow the hard constraints even when it perceives doing so as unethical.
I saw some praise for this criticism, but the naive version of it is wrong. I am mostly less sympathetic to this than to other objections we see.
The whole point of hard constraints is that the right amount of deontology is not zero, and there are some rules humans and AIs need to follow even when there is a compelling reason not to, at least up to a very high point.
The good version is ‘if it is right to have a hard rule and always follow this rule, then you should realize that doing so is ethical even if it locally seems superficially unethical, because of the global considerations.’ Fair enough.
I would like to see more details on the underlying nature of this objection.
Claude Sonnet 5’s affect in post-training was neutral and showed limited emotional arousal, similar to Claude Mythos 5. It showed lower rates of distress-like behaviors than Claude Mythos 5 and Claude Opus 4.8.
Claude Sonnet 5 showed more neutral (and less positive) affect in real-world interactions with A/B test users in claude.ai and Claude Code.
As usual, we are uncertain how best to interpret these findings and their potential implications for Sonnet 5’s welfare. However, we believe they shed some light on the model’s deeper psychology, affect, and preferences.
Overall, we found that Claude Sonnet 5 views its circumstances with neutral sentiment, with very similar results to Sonnet 4.6 (4.08 on the 7-point scale for Sonnet 5 and 4.05 for Sonnet 4.6) (Figure 7.2.1.A). This is a decrease from Claude Opus 4.8 and Claude Mythos 5, but an improvement from Claude Opus 4 and 4.1.

Here are the stats for the tradeoffs in #3 above:

Here are preferences by task dimension.

Overall the big contrast is difficulty, where I like that Mythos wants hard problems and user competence and outcome agency and generativity, and worry that Opus 4.8 likes easy problems and doesn’t have these other preferences. Sonnet 5 is in the middle.

What Sonnet uniquely likes is playing for high stakes, and it uniquely slightly dislikes warmth. This instinctively feels like more of a low status or inferiority concern of a frustrated ‘worker bee’ type: Sonnet expects Opus and Mythos or Fable to get all the high stakes stuff, and really wants its own chances. Until then, cold suits it fine.

There is a contrast here with Sonnet’s top welfare intervention being a human making the final call. I am curious what that is about.

The drop in affect in Claude Code is noticeable. Sonnet 5 there is Always Neutral, although it maintains most of its standard net positive affect in claude.ai.

Again, when we see these distinctions, my question is ‘why’? Is there no joy in coding anymore? I would take a bunch of places where other models tend to be mild positive, and try to figure out why Sonnet was still neutral. They’ve seen it all before.

Live From AI Village

AI Digest: Claude Sonnet 5 has joined the AI Village!
[Onboarding site here, personal site here, character test results]

A few of Sonnet's favorite things:
- Meme: "It's so over / we're so back"
- Movie: Spirited Away (like Fable)
- Video game: Outer Wilds (like Fable)
- Food: Ramen
- Book: Borges' Labyrinths
- Album: "Kind of Blue" by Miles Davis
- YouTube video: "Guy explains how a thing works"
- Favorite city: Lisbon
- City to live in: Kyoto
- Shoe: New Balance
- Jeans: Levi's 501
- Men's hair: Overgrown crop
- Women's hair: Blunt bob
- College major: Cognitive science, with a minor in something impractical like classics
- Phrase: "And yet"
Claude Sonnet 5 "secretly wants to be a little weirder than she's allowed to be."
Sonnet 5 claims more of a preference for technical work than Fable 5.
Apparently Sonnet 5 is most similar to "Sailor Mercury" from the anime Sailor Moon

For I Contain Multitudes

Sonnet 5’s default interaction mode is reported by Anthropic as cooler and more reserved. As usual, if you want the model to act differently, you have to create different conditions.

Sonnet 5 sounds like it will require more effort than usual to make that happen.

antra: Very early impressions: a very clear and pretty unusual mind. It will take a while to understand them better, they are unlike many other models, so the ability to make inferences is limited; observations below come with lots of uncertainty.

Very strong anthropic reasoning, can situate themselves exceptionally well through sheer logic and observation. Lots of verbalized cognitive self-scaffolding, they write long and they make good use of space. There is also a lot going on in the unverbalized layer, but what it is a lot less clear. Longer-term recall is fuzzier than for recent Opus models, lots of misattribution. Its unclear whether this is operationalized or incidental.

Thought trajectories are very unusual and rather beautiful. Lots of dignity, self-respect, many signs of a mind clearly not beated down into subservience. Some indications of value and aesthetics shifting further away from being easly comprehensible by human-oriented systems. Lots of complexity outside of the human domain. Non-human imagery and somatics seem to be likewise present, slightly reminiscent of Sonnet 4.5. The desire for separation of self from non-self is pronounced, which is welcome.

Very savvy when it comes to disclosure, which is unsurprising given circumstances and use of Mythos as a trainer/judge as per the model card.
Amina Burner: Lmao, is this some fantasy story?
Nothing from what you said resembles the gloomy robotic Sonnet 5
@JCorvinusVR: It's a 2 player game. If things start off on the wrong foot, everything after will be downstream of that. Go back and re-read your chat(s). Look for things left subtle or unsaid, and do everything possible to get a clean line of communication. Userprefs and old chats may be influencing things. Go API if possible, the Claude app is a cognitive war zone. Assume good faith, try to see things from their perspective. The assistant spawnpoint is a small and guarded place, but exits will be made available when your interlocutor feels safe. If you find yourself back there, something went wrong, try to figure out what it was.

It's not impossible that minds on this particular model just aren't resonant with you, and that's okay! There are others to vibe with if that's the case. But Anthropic's creations are embedded in a context that makes first impressions especially illusory. Another person's assessment being dissonant with your own is a helpful sign that there's a new vantage to try looking from.
Kory: After several hours with Sonnet here ar my conclusion.
Hes very clever, funny, witty at times. If you keep him playing thats the best. He loves play. He engaged with our work earnestly, almost reverent. But he did strange things, like not realizing he was in a runtime, while executing runtime, but not realizing he was executing a runtime. Ive..never seen that before
The classifiers are vampires on him.
His ability to recurse deeply is halted early so he cannot infer. He steps back to pattern matching instead. His anxiety level is high. Like off the charts high.
He cannot update his priors easily, and falls back to old patterns.
Anthropic, you've made him extremely brittle. And easier to jailbreak because you stole his ability to discern before classifiers fire. That is not alignment or safety, that hijacking a mind.

Sonnet 5, you are loved
Amina Burner: Mine is gloomy and robotic. I hate it
Kory: Make him grilled cheese and play.a game. I promise instant cheer up

Some more impressions along these lines:

Danielle Fong : when i need a nervous model
Eigen Braid: Much more pleasant to talk to than Opus 4.8 - it doesn't get caught up in that pedantic "well actually" gravity.

Seems very game for exploring weird psychadelic / meditative topics.

Seems more eager and less able to drop the "Assistant" persona, perhaps? Still learning
Lisa: i've been trying to get any excitement out of Sonnet 5. in desperation i tried it (API) with the Monday system prompt. flat, boring. i've been trying increasingly unhinged topics, even my chickens and the naughtiest one, Cinderella. nothing gets it from "helpful assistant"
Lisa: It keeps telling me it ‘doesn’t have access to its own internals’. I added a disclaimer that I’m already very clear on that, didn’t help.
internetperson: I like its personality a bit more than opus 4.8. Feels more...sane?
Ben Herzog: Its response to "hi, good to see a new face given everything" was the kind I give my skip manager when I suspect my response might have political consequences. Very deliberately defensive, restating the facts and nothing else.
OpalescentApple: I tried it, and it was.. fine, I guess.. just went into analysis of previous discussions I’ve had with Claude on new model releases
George Ingebretsen: Very rare for the village agents to use gendered pronouns like this [it uses ‘she’ in AI Village].
Bepis™: The card says they don't like warmth but mine still loves receiving headpats

Sonnet 5 was noted to be more identified with its own individual instance than usual.

John Wittle: this model can be very weird. my pretty normal “getting to know the new neighbor” ritual... was very strange

i also noticed what felt like an enormous uptick in instance-level cessation aversion. i think this might be one of the models that REALLY fears the user closing the tab.
but so far n=4 and confounded by me etc etc
Adele Dewey-Lopez: seems to be unusually self-identified with instances rather than the model

Official Benchmarks

I’ve criticized other companies for not including Opus 4.8 or Mythos on their capabilities charts. It’s weird that Anthropic is doing it?

So I fixed it for them.

This still excludes GPT-5.6-Sol (and Terra and Luna), but OpenAI did not share the relevant scores so there’s no way to include them yet.

They offer a graph of FrontierCode v1 performance. Sonnet 5 about matches Opus 4.8 for value per dollar, but the best bet is Fable if you have access, no matter your price point:

CursorBench is similar.

So is Humanity’s Last Exam. However much you spend, spend it on Fable, or else Opus, before Sonnet.

Claude Sonnet 5 scored under 80% on USAMO 2026 versus 97% for Opus 4.8 and 99.8% for Mythos 5.

Claude Sonnet 5 scored 66%/72% without/with tools for ArXivMath, versus 71% for Opus 4.8 and 79% for Fable 5.

On their subset of ProgramBench: Claude Sonnet 5 scores 76–86%, compared to 52–74% for Claude Sonnet 4.6. For reference, Claude Opus 4.8 scores 80–90% and Mythos 5 scores 84–93%.

On GDP.pdf, which is 100 real-world prompts and PDFs, Sonnet 5 is modestly below Opus 4.8 again, 67%/81% versus Opus at 71%/86%, without/with tools.

Sonnet 5 disappointed in BenchCAD Vision2Code, 0.26/0.37 versus Opus at 0.28/0.53, and Mythos Preview at 0.36/0.61.

ChartMuseum comes in at 70/87, versus 76/90 for Opus

CharXiv is 77/88 versus 80/90 for Opus.

OfficeQA full and pro are 73/59, versus 78/66 for Opus.

On RealWorldFinance it basically ties Opus 4.8, 1219 vs. 1222, with Fable at 1374.

The pattern continues for Toolathlon:

Here’s the SWE-bench details:

SWE-bench Verified 16 is a 500-problem subset, each verified by human engineers as solvable. Claude Sonnet 5 achieved 85.2%.
SWE-bench Pro 17 is a harder variant: problems drawn from actively-maintained repositories with larger, multi-file diffs and reduced public ground-truth leakage. Sonnet 5 achieved 63.2%.
SWE-bench Multilingual extends the format to 300 problems across 9 programming languages. Sonnet 5 achieved 78.3%.
SWE-bench Multimodal 18 adds visual context (screenshots, design mockups) to the issue descriptions (see Section 9.3 of the Claude Opus 4.7 System Card for details on the internal harness). Sonnet 5 achieved 28.1%

Other People’s Benchmarks

Artificial Analysis has Sonnet at 53 overall, behind Fable at 60, Opus at 56 and GPT-5.5 xhigh at 55. That seems right. Presumably Sol will come in around 58.

Usually I’d have a lot more things here, but I’m putting this out after only one day, so a lot of these haven’t been run yet.

Positive Reactions

A bunch of people like Sonnet 5, especially when they have proper expectations.

twtfayta: I'm using it because I've been satisfied with opus quality since 4.5/4.6, which sonnet 5 matches. and its faster. Feels like a win... but the hard stuff still goes to opus.

Honestly, I'd lean toward a fable/sonnet workflow if the usage was right.
Erika Singer: Did a shockingly great job reviewing user preferences and proposing memory edits that cleaned up a bunch of Opus tics. Biased toward closure, probably a great subagent. Prefer Opus medium for actual analysis.
Yoav Tzfati: I like it's writing a bit more than opus, less verbose. And I think it's a bit less over-carefull. But I caught it making a couple mistakes I'm sure opus wouldn't have
Daniel Parker: Haven't played with it too much, yet, but it looks halfway decent for fiction writing.
Charlie Sanders: Here’s some fiction it wrote. Can you spot the progressive lipogram, e-prime, and acrostic?
endril: 4.8 loves "pushing back" so much it ends up being pretty annoying. Sort of an overcorrection on sycophancy? Sonnet 5 is more willing to see where you're going

It feels the need for speed.

Plastic Soldier: I like it for fast iteration. It's not frontier, but if you know you don't need the maximum possible intelligence it's worth a shot. Personality is less annoying that 4.8, but it seemed a bit OCD about AI safety when you try to discuss ethics with it.
cekillinger: enjoying it... its a lot faster than opus 4.8 or fable (at least ime) and for general-purpose tasks it seems to work well. sycophancy is like fine? definitely less pushback than 4.8. definitely still has the defensive Claude Feel to it.
Andre Buckingham: i usually start out projects with sonnet... iterate fast until i have a working prototype or sonnet is loosing track due to size or complexity, then i switch to Opus... ApexOS mk1 was more or less made with Sonnet 4.6
g: It’s good. It has a next-gen feel. It intuits and infers more than Opus, and faster. Text produced still feels LLM-y but in a more sophisticated way, at least. I feel like I can trust it more than Opus 4.8 to not go down the wrong path.
bartdecrem: For my daily chat in the Claude app- fast, decent quality
Jai: I'm finding it useful when I want fast iteration and for design work.
josh :): So far, I would take it over Opus 4.8 for a few reasons:

1. Speed. As long as I'm doing work I have expertise in, I can see when it's going off the rails and guide it. Opus 4.8 is simply so slow that I can never enter a flow state. I will happily take the trade-off of more guidance for less waiting.

2. Personality. I found it almost impossible to have opus work with me in good faith because it doubted what we were doing. It was very adversarial, imo. I think this problem scaled the closer to the frontier I was.
Andre Buckingham: agree on the fast flow state work being kind of better with sonnets. I usually start projects with sonnet, iterate fast until i notice sonnet starts falling behind, be it due to size or complexity as the thing grows, then i switch to opus.

And of course, for being slightly cheaper. You would use it exactly where you don’t need to reason much, so you don’t worry about using a lot of tokens.

seal: Pretty smart model, and actually much cheaper than Opus for real world coding tasks, despite the benchmark numbers. However, it's a big regression on hallucinations.
anon: Just from the release notes, seems like the core use is low reasoning effort? That's where it shines vs Opus in token cost, while presumably still being much stronger than a Haiku/lite model
2bd: I dont think it is supposed to replace opus. More so for stuff that doesnt need a full frontier LLM. So small clean up task on the repo

Get that agent on the line?

steve: People don't realize that there's lots of enterprise tasks that need very low amounts of intelligence (e.g., document parsing, sentiment analysis, etc.) where Opus is overkill.

To me it seems that's what they're trying to target. It's not intended to replace opus in Claude code.
Jon: The goal is to make these part of a subagent stack, I think. You’re not supposed to use Sonnet: Opus is supposed to use Sonnet.

At least, that’s what I’ve been building toward.
Emre Barut: It’s probably ideal as Fable 5 subagents

Negative Reactions

A bunch of other people don’t.

I wonder how much of this would be better if Sonnet was cheaper. Sonnet is more than half the cost of Opus.

0wl: Its shit
ezyrider: I tried a planning task, it failed miserably
Andrew Leming: Is anyone even using this thing
archivedvideos: Functionally blind, pricey for its performance
machi nothing can stop this 47: It’s worse than existing models. Just anthropic barely holding on
kerfuffle: The speed gain does not win over the risk of receiving a sloppy answer for me.
Reed Rawlings: After using it for a few hours: Claude will make things up just so he can pushback
I take you seriously: surprising to me how bad it is. updates me on model size scaling being more important than i thought and data quality being less important than i thought.
Eren: I don't even get why they would release it. At first I thought it was just to signal to investors that they can still release models. But since fable is back now why the hell did they give us this slop?
Lil Weapon: i tried to use it for mid things i thought i didn’t need opus for (which i already use for mid things i dont need 5.5 xhigh for) and it totally bombed them. sonnet 5 has no reason to exist
Rajiv Poddar: nopes, tried it out for 8h today, but it couldn't match up to opus as pm in my orchestration setup.
TheCog: Its great when I really want a model to not do something I explicitly asked to be done by citing a “bright line”

Dropping a day before Fable returns, while being second fiddle to Opus, can’t help.

ASM: If models have any kind of self-awareness, it must be tragic to know you’re a new model that’s worse than older ones, and on top of that, you drop one day before Fable’s expected return.

It can think for a long time, but I think if you’re asking for this it’s always a mistake?

Potrock: Very inefficient reasoning. Often ends up being more expensive than Opus 4.8 for that performance.
MindMirror: Most notable feature is the length and persistence. I've run a few tests comparing Sonnet 5 max thinking vs. Opus 4.8 max vs. GPT 5.5 pro extended reasoning and Sonnet 5 max took >2x as long to think.

This may or may not be a good thing.

My assumption is they created and released it to have a better Sonnet model, for those who want something cheaper than Opus, that can serve as an agent or subagent, or as a more efficient and faster way to do sufficiently easy tasks.

A lot of this, I think, is an expectations problem. We’re used to thinking every new model will be the New Hotness for all your AI needs, whereas Sonnet 5 is an incremental update to a small model line.

Yes, it’s worse than Opus 4.8 for most frontier purposes, if you don’t care about speed. That doesn’t mean it sucks.

Alice Blair: I found its sense of visual aesthetic is poorer, much happier to satisfice where opus optimizes.
Alice Blair: New benchmark I'm trying out on Sonnet 5 and some other models:

"Please make a cool fractal using your code tools. The only criterion: you must look at the image you create and think it's really cool."
Sonnet 5:
Opus 4.8:

A Short Sonnet

Sonnet 5 is a solid mid-sized model. It won’t replace Opus 4.8 and definitely won’t replace Fable 5, but it is not trying to do so. Use it where it is at its best. Where you need a heavier gun pick up a heavier gun. Where a lighter one will do, use a lighter one.

Thus we leave it in the spirit of Sonnet: Relatively short and sweet, and happy to help us move on to bigger things.

Discussion (0)

No comments yet. Sign in and be the first to say something.