
Anthropic’s New Model Excels at Reasoning and Planning—and Has the Pokémon Skills to Prove It

AI | By admin | 27/05/2025 | 3 Mins Read

The game was a challenge for Claude 3.7 Sonnet: it was stuck in one city for “dozens of hours” and had difficulty identifying non-player characters, which severely hindered its progress in the game. With Claude 4 Opus, Hershey noticed an improvement in Claude’s long-term memory and planning capabilities when he watched it navigate a complex Pokémon quest. When the AI realized it would need a specific power to continue, it spent two full days honing its skills before moving on. Hershey says that this kind of multi-step reasoning, carried out without immediate feedback, shows a higher level of coherence, meaning the model is better able to stay on course.

“This is one of my favorite ways to get to know a model. Like, this is how I understand what its strengths are, what its weaknesses are,” Hershey says. “It’s my way of just coming to grips with this new model that we’re about to put out, and how to work with it.”

All Agents Wanted

Anthropic’s Pokémon research is a novel approach to tackling a preexisting problem—how do we understand what decisions an AI is making when approaching complex tasks, and nudge it in the right direction?

The answer to that question is integral to advancing the industry’s much-hyped AI agents, meaning AI that can tackle complex tasks with relative independence. In Pokémon, it’s important that the model doesn’t lose context or “forget” the task at hand. That also applies to AI agents asked to automate a workflow, even one that takes hundreds of hours.

“As a task goes from being a five-minute task to a 30-minute task, you can see the model’s ability to keep coherent, to remember all of the things it needs to accomplish [the task] successfully get worse over time,” Hershey says.

Anthropic, like many other AI labs, hopes to develop powerful agents that can be sold to consumers as products. Krieger says that Anthropic’s “top objective” this year is Claude “doing hours of work for you.”

“This model is now delivering on it: we saw one of our early-access customers have the model go off for seven hours and do a big refactor,” Krieger says, referring to the process of restructuring large amounts of code to make it better organized and more efficient.

Google and OpenAI have been working toward this kind of future. Earlier this week, Google released Mariner, an AI agent built into Chrome that can do simple tasks such as buying groceries (for $249.99 per month). OpenAI recently released a coding agent, and a couple of months ago it launched Operator, an agent that can search the internet on behalf of a user.

Compared with its competitors, Anthropic is often seen as the more cautious mover, going fast on research but slower on deployment. With AI this powerful, that is likely a plus: there are many potential problems with agents that have access to sensitive user information such as inboxes or banking logins. In a blog post published Thursday, Anthropic says, “We’ve significantly reduced behavior where the models use shortcuts or loopholes to complete tasks.” The company also says that both Claude 4 Opus and Claude Sonnet 4 are 65 percent less likely to engage in this behavior, known as reward hacking, than prior models, at least on certain coding tasks.
