Anthropic today released Opus 4.5, its flagship frontier model. It brings improvements in coding performance, along with user experience changes that make it more competitive overall with OpenAI's latest frontier models.

Perhaps the most prominent change for most users is that in the consumer app experiences (web, mobile, and desktop), Claude will be less prone to abruptly hard-stopping conversations because they have run too long. This improvement to memory within a single conversation applies not just to Opus 4.5 but to all current Claude models in the apps.

Users who experienced abrupt endings (despite having room left in their session and weekly usage budgets) were hitting a hard context window of 200,000 tokens. Whereas some large language model implementations simply trim earlier messages from the context once a conversation exceeds the window, Claude ended the conversation outright rather than let users sit through an increasingly incoherent exchange in which the model forgets things based on how old they are. Now, Claude instead runs a behind-the-scenes process that summarizes the key points from earlier parts of the conversation, attempting to discard what it deems extraneous while keeping what's important. Developers who call Anthropic's API can apply the same principles through context management and context compaction (a rough client-side sketch appears at the end of this article).

Opus 4.5 performance

Opus 4.5 is the first model to surpass an accuracy score of 80 percent (specifically, 80.9 percent) on the SWE-bench Verified benchmark, narrowly beating OpenAI's recently released GPT-5.1-Codex-Max (77.9 percent) and Google's Gemini 3 Pro (76.2 percent). The model performs particularly well on agentic coding and agentic tool use benchmarks but still lags behind GPT-5.1 in visual reasoning (MMMU).

Anthropic also claims that Opus 4.5 is far less susceptible to prompt injection attacks than prior Claude models or competing models like GPT-5.1 and Gemini 3 Pro. Still, none of these models has perfect performance on that front.

While the benchmark improvements are worth noting, the most meaningful improvement in Opus 4.5 is arguably that it is significantly more efficient with tokens. Anthropic's blog post offers examples:

"Set to a medium effort level, Opus 4.5 matches Sonnet 4.5's best score on SWE-bench Verified, but uses 76% fewer output tokens. At its highest effort level, Opus 4.5 exceeds Sonnet 4.5 performance by 4.3 percentage points, while using 48% fewer tokens."

Other updates

The Opus 4.5 launch is accompanied by other new features for developers and users. The developer platform now includes a new "effort" parameter, allowing developers to more precisely tune the balance they want between efficacy and token usage.

Claude Code is also now available in the desktop Claude apps; the desktop interface is now tabbed between the traditional chat experience and the Claude Code experience.

And lastly (and for some, most importantly), there's a big pricing change for the Opus 4.5 API: $5 per million input tokens and $25 per million output tokens, down from $15/$75.
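To make the context-compaction idea above concrete, here is a minimal client-side sketch using Anthropic's Python SDK. It is an illustration of the principle, not Anthropic's managed feature: the model ID, the token threshold, and the "keep the last ten turns" policy are all assumptions, and it presumes plain-string message contents. The Messages API and its count_tokens endpoint are real; check Anthropic's docs for the supported server-side context-management options.

```python
import anthropic

client = anthropic.Anthropic()   # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-opus-4-5"        # assumed model ID; verify against Anthropic's model list
COMPACT_THRESHOLD = 150_000      # assumed: compact well before the 200,000-token window


def compact_if_needed(messages: list[dict]) -> list[dict]:
    """Summarize older turns once a conversation nears the context limit."""
    count = client.messages.count_tokens(model=MODEL, messages=messages)
    if count.input_tokens < COMPACT_THRESHOLD:
        return messages  # plenty of room left; keep the full history

    # Keep roughly the last ten turns verbatim, but resume at a user turn
    # so the compacted history still alternates user/assistant correctly.
    split = max(len(messages) - 10, 1)
    while split > 0 and messages[split]["role"] != "user":
        split -= 1
    if split <= 0:
        return messages
    old, recent = messages[:split], messages[split:]

    # Ask the model to summarize the older turns, passed as a flat transcript
    # (assumes each message's content is a plain string).
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    summary = client.messages.create(
        model=MODEL,
        max_tokens=2_000,
        messages=[{
            "role": "user",
            "content": "Summarize the key facts, decisions, and open questions "
                       "in this conversation transcript:\n\n" + transcript,
        }],
    )

    # Replace the older turns with the summary; recent turns follow unchanged.
    return [
        {"role": "user", "content": "Summary of the conversation so far:\n"
                                    + summary.content[0].text},
        {"role": "assistant", "content": "Understood. I'll treat that summary "
                                         "as our earlier context."},
    ] + recent
```

Packing the summary into a user turn followed by a short assistant acknowledgment keeps the strict user/assistant alternation the Messages API requires while preserving the most recent turns verbatim.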
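The new effort parameter might be used along these lines. The exact request shape is an assumption here (Anthropic has only described an "effort" control publicly in broad terms), so the field name and the "low"/"medium"/"high" values below are placeholders; the sketch passes the field via the SDK's extra_body escape hatch, which forwards arbitrary JSON fields the typed client may not yet expose.

```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-5",          # assumed model ID
    max_tokens=4_096,
    # extra_body forwards fields the SDK may not yet expose as typed kwargs;
    # the "effort" field name and its values are assumptions, not confirmed API.
    extra_body={"effort": "medium"},  # trade some accuracy for fewer output tokens
    messages=[{"role": "user", "content": "Refactor this function for clarity..."}],
)
print(response.content[0].text)
print(response.usage.output_tokens)  # compare token spend across effort levels
```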
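And for a sense of what the price cut means in practice, the arithmetic is simple; the token counts below are illustrative.

```python
# Back-of-the-envelope cost math for the Opus 4.5 API price change
# ($5 input / $25 output per million tokens, down from $15/$75).
OLD_IN, OLD_OUT = 15.00, 75.00   # dollars per million tokens
NEW_IN, NEW_OUT = 5.00, 25.00


def cost(input_tokens: int, output_tokens: int, in_rate: float, out_rate: float) -> float:
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Example: a 50,000-token prompt producing a 5,000-token answer.
before = cost(50_000, 5_000, OLD_IN, OLD_OUT)  # $1.125
after = cost(50_000, 5_000, NEW_IN, NEW_OUT)   # $0.375 -- one-third the old price
print(f"before: ${before:.3f}, after: ${after:.3f}")
```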
https://arstechnica.com/ai/2025/11/anthropic-introduces-opus-4-5-cuts-api-pricing-and-enables-much-longer-claude-chats/
Anthropic introduces Opus 4.5, cuts API pricing, and enables much longer Claude chats