Anthropic has launched Claude Opus 4.6, its most advanced model yet, focused on agentic coding and knowledge-intensive work. The model succeeds Claude Opus 4.5 and is available on claude.ai and through the Claude API (as `claude-opus-4-6`), as well as on major cloud providers.
Agentic Work, Not Single Answers
Opus 4.6 is designed for tasks that require the model to plan and act over time. According to Anthropic, it now powers Claude Code, where the team reports that the model concentrates on the more difficult parts of a job, handles ambiguous situations with greater judgment, and stays productive over longer sessions.
Before answering, the model tends to think more deeply and review its own reasoning. That improves performance on complex problems, but it also increases cost and latency on easy ones. Anthropic therefore exposes an effort parameter with four levels (low, medium, high as the default, and max) so developers can explicitly trade reasoning depth against speed and cost per endpoint or use case.
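As a concrete illustration, a minimal sketch of routing different workloads to different effort levels. The four level names come from the article; the exact request field name and wire format for effort are assumptions here, not confirmed API details.

```python
# Sketch: pick an effort level per route/use case, as described above.
# The field name "effort" and the payload shape are assumptions.

EFFORT_BY_ROUTE = {
    "autocomplete": "low",    # latency-sensitive, easy tasks
    "code_review": "high",    # the default depth
    "incident_rca": "max",    # hardest problems, where cost is acceptable
}

def build_request(route: str, prompt: str) -> dict:
    """Build a Messages-API-style payload with a per-route effort level."""
    return {
        "model": "claude-opus-4-6",
        "max_tokens": 4096,
        "effort": EFFORT_BY_ROUTE.get(route, "medium"),  # assumed field name
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("incident_rca", "Why did the deploy fail?")
```

The point of the table is that effort becomes a routing decision rather than a global setting: cheap routes stay fast, hard routes get maximum reasoning depth.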
Beyond coding, Opus 4.6 targets knowledge-intensive tasks such as:
- Running financial analyses
- Researching via search and information retrieval
- Creating and working with spreadsheets and presentations
Inside Cowork, the model can execute multi-step workflows across these artifact types without constant human input.
Long-context functionality and developer controls
Opus 4.6 is the first Opus-class model with a 1M-token context window, currently in beta. Pricing increases for prompts with more than 200K input tokens, rising to $37.50 per million tokens in the 1M-context tier. The model supports up to 128,000 output tokens, enough for very long reports or code reviews.
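The limits above can be sketched as a simple tier check: which requests fall under standard pricing, which hit the long-context premium, and which exceed the documented caps. The 200K threshold, 1M window, and 128K output limit come from the article; the function itself is only an illustration.

```python
# Sketch: classify a request against the documented context limits.
# Thresholds are taken from the article; token counts are illustrative.

LONG_CONTEXT_THRESHOLD = 200_000   # premium pricing above this input size
MAX_CONTEXT_TOKENS = 1_000_000     # 1M-token context window (beta)
MAX_OUTPUT_TOKENS = 128_000        # maximum output tokens

def classify_request(input_tokens: int, output_tokens: int) -> str:
    """Return the pricing tier (or rejection reason) for a request."""
    if input_tokens > MAX_CONTEXT_TOKENS:
        return "rejected: exceeds 1M-token context window"
    if output_tokens > MAX_OUTPUT_TOKENS:
        return "rejected: exceeds 128K output limit"
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        return "long-context tier (premium pricing)"
    return "standard tier"
```

A check like this is useful client-side, since crossing the 200K boundary changes the per-token cost of the whole request.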
Anthropic offers several platform features to help manage long-running agents.
- Adaptive thinking: the model determines when extended thinking is needed based on the difficulty of the task and its context, instead of always running at maximum depth.
- Effort controls: four discrete levels (low, medium, high, and max) provide a clear control surface for trading latency against reasoning quality.
- Context compaction (beta): the platform summarizes older conversation turns and replaces them as a configurable context threshold is neared, reducing the need for custom context-management logic.
- US-only inference: workloads that must stay in US regions can run at 1.1× token pricing.
These controls target a real-world pattern: agentic workflows that accumulate thousands of tokens over many steps while working across documents, code, and tools.
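The compaction idea is easy to see in a client-side sketch: once the running token estimate nears a threshold, older turns are replaced by a summary while recent turns are kept verbatim. Everything here is illustrative; `summarize` is a hypothetical stand-in for a real summarization call, and the platform's beta feature does this server-side.

```python
# Client-side sketch of context compaction: replace older turns with a
# summary once a configurable token threshold is neared.
# `summarize` is a hypothetical placeholder, not a real API.

def estimate_tokens(messages: list) -> int:
    # Rough heuristic: ~4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def summarize(messages: list) -> str:
    # Placeholder: a real implementation would call a model here.
    return f"[summary of {len(messages)} earlier turns]"

def compact(messages: list, threshold: int, keep_recent: int = 4) -> list:
    """Summarize older turns once the token estimate nears `threshold`."""
    if estimate_tokens(messages) < threshold or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [{"role": "user", "content": summarize(old)}] + recent
```

The server-side version removes the need for this custom logic, but the trade-off is the same: a fixed summary budget in exchange for an effectively unbounded conversation length.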
Product Integration: Claude Code with Excel and PowerPoint
Anthropic shipped product upgrades alongside Opus 4.6 so that engineers and analysts can use the model in realistic workflows.
In Claude Code, a new ‘agent teams’ mode (research preview) lets users create multiple agents that work in parallel and coordinate autonomously. The mode is intended for read-heavy activities such as codebase inspections. Sub-agents can be steered interactively, for example via tmux, a terminal-centric workflow that fits existing engineering practice.
Claude for Excel can plan ahead, ingest and structure unstructured data, and apply multi-step transformations in a single pass. With Claude in PowerPoint, users can go from Excel data to structured, on-brand slides: the model reads layouts, slide masters, and fonts, then generates decks consistent with the existing template. Both integrations are available in research preview for Max, Team, and Enterprise plans.
Benchmark profile: coding, search, long-context retrieval
Anthropic positions Opus 4.6 as state-of-the-art on several external benchmarks that matter for coding and search agents.

Key results include:
- GDPval-AA: Opus 4.6 outperforms OpenAI’s GPT-5.2 by around 144 Elo and Gemini 3 Pro by about 190 Elo. In head-to-head comparisons, that means Opus beats GPT-5.2 about 70% of the time.
- Terminal-Bench 2.0: Opus 4.6 leads this benchmark of agentic coding and system tasks.
- Humanity’s Last Exam: on this multidisciplinary reasoning test with tools, Opus 4.6 is ahead of other frontier models, including GPT-5.2 and Gemini 3 Pro configurations, when using the documented harness.
- BrowseComp: Opus 4.6 performs best on this agentic-search benchmark, with scores rising to 86.8% when Claude models are combined in a multi-agent harness.
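The Elo margins above map directly to head-to-head win rates via the standard logistic Elo formula, which is how a 144-point gap corresponds to the roughly 70% figure cited. A quick check:

```python
# Standard logistic Elo model: expected win rate for a model that leads
# its opponent by `elo_gap` rating points.

def win_probability(elo_gap: float) -> float:
    """Expected head-to-head win rate given an Elo advantage."""
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / 400.0))

p_gpt = win_probability(144)      # ~0.70, matching the cited 70% figure
p_gemini = win_probability(190)   # ~0.75
```

This is also why Elo gaps are a more informative way to report pairwise evaluations than raw score deltas: they translate into concrete win rates.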

A major improvement is long-context retrieval. On the 8-needle 1M variant of MRCR v2, a ‘needle-in-a-haystack’ benchmark where facts are buried inside 1M tokens of text, Opus 4.6 scores 76%, compared with 18.5% for Claude Sonnet 4.5. Anthropic describes the change as qualitative: models can make use of much more context without losing information.
Anthropic also reports performance gains in:
- Root-cause analysis of complex software failures
- Multilingual coding
- Long-horizon coherence and planning
- Cybersecurity tasks
- Life sciences, where Opus 4.6 performs almost 2× better than Opus 4.5 on computational biology, structural biology, organic chemistry, and phylogenetics evaluations
On Vending-Bench 2, a benchmark of economic performance over longer time horizons, Opus 4.6 earns an extra $3,050.53 under the reported setup.
Key Takeaways
- Claude Opus 4.6 is Anthropic’s top-of-the-line model, with a 1M-token context (beta): it supports 1M input tokens and up to 128,000 output tokens, with premium pricing for tokens above 200K.
- Cost and reasoning depth are controllable via effort levels and adaptive thinking: developers can tune effort (low, medium, high, max) and let adaptive thinking decide when extended reasoning is needed, exposing a clear latency vs. accuracy vs. cost trade-off for different routes and tasks.
- Strong benchmark results on coding and search tasks: Opus 4.6 also shows a significant lead over Claude Opus 4.5 in long-context retrieval.
- Claude Code, Excel, and PowerPoint integrations target real-world workloads: Opus 4.6 powers agent teams in Claude Code, template-aware PowerPoint decks, and structured Excel transformations.


