AI-trends.today

OpenAI Releases GPT-5.5, a Fully Retrained Agentic Model That Scores 82.7% on Terminal-Bench 2.0 and 84.9% on GDPval

Tech | By Gavin Wallace | 24/04/2026 | 6 Mins Read

OpenAI has launched GPT-5.5, its most capable model to date and the first fully retrained base model since GPT-4.5. GPT-5.5 is designed to complete complex, multi-step computer tasks with minimal human direction. Think of it as the difference between an assistant who needs a checklist and one who understands the underlying goal and figures out the steps themselves. The release is rolling out today to Plus, Pro, Business, and Enterprise subscribers across ChatGPT and Codex.

What ‘Agentic’ Actually Means Here

An agentic model doesn’t just respond to a single prompt. It takes a sequence of actions, uses tools (such as browsing the web, writing code, running scripts, or operating software), checks its own work, and keeps going until the task is done. Prior models often stalled at handoff points, requiring the user to re-prompt or correct course. GPT-5.5 is built to reduce these interruptions.
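The act, check, and continue cycle described above can be illustrated with a toy loop. This is only a sketch under stated assumptions, not OpenAI's implementation: `TaskState`, `write_code`, `run_tests`, `choose_action`, and `run_agent` are all hypothetical names, the "tools" are stubs that never touch a real system, and the decision rule stands in for what a model would decide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class TaskState:
    """Hypothetical task state shared by the stub tools."""
    attempts: int = 0
    code_ok: bool = False
    tests_passed: bool = False
    last_action: Optional[str] = None


def write_code(state: TaskState) -> None:
    # Stub tool: pretend the first draft is buggy and the second one is correct.
    state.attempts += 1
    state.code_ok = state.attempts >= 2


def run_tests(state: TaskState) -> None:
    # Stub tool: the self-check step; passes only if the current draft is correct.
    state.tests_passed = state.code_ok


TOOLS: Dict[str, Callable[[TaskState], None]] = {
    "write_code": write_code,
    "run_tests": run_tests,
}


def choose_action(state: TaskState) -> str:
    # Stand-in for the model's decision: test after every edit, edit after every failure.
    return "run_tests" if state.last_action == "write_code" else "write_code"


def run_agent(max_steps: int = 10) -> Tuple[List[str], TaskState]:
    """Keep acting until the check passes or the step budget is exhausted."""
    state = TaskState()
    trace: List[str] = []
    for _ in range(max_steps):
        if state.tests_passed:
            break  # task done, no re-prompt from the user was needed
        action = choose_action(state)
        TOOLS[action](state)
        state.last_action = action
        trace.append(action)
    return trace, state


if __name__ == "__main__":
    trace, state = run_agent()
    print(trace, state.tests_passed)
```

In this toy run the loop recovers from its own failed test by editing and testing again, which is the kind of handoff point where, per the article, earlier models tended to stall and wait for the user.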

OpenAI introduced GPT-5.5 as a model targeted at agentic computer use: it writes and debugs code, browses the web, fills out spreadsheets, and keeps working through multi-step tasks without requiring a human to supervise every move.

The Four Domains Where Gains Are Concentrated

The gains are concentrated in four areas: agentic coding, computer use, knowledge work, and early scientific research. OpenAI describes these as domains ‘where progress depends on reasoning across context and taking action over time.’

For software engineers, the most immediately relevant benchmark is SWE-Bench Pro, which evaluates real-world GitHub issue resolution across four programming languages. GPT-5.5 resolves 58.6% of tasks end-to-end in a single pass. Worth noting: Claude Opus 4.7 scores higher at 64.3% on this same benchmark, although OpenAI has noted that Anthropic reported signs of memorization on a subset of those problems, which may affect the comparison.

For long-horizon coding specifically, OpenAI also reports results on Expert-SWE, an internal benchmark measuring tasks with a median estimated human completion time of 20 hours. GPT-5.5 outperforms GPT-5.4 on Expert-SWE. This benchmark is significant because it reflects the kind of extended, multi-session engineering work (large refactors, feature builds, debugging deep in a codebase) that agentic tools are increasingly being asked to handle autonomously.

Developers who tested the system early said GPT-5.5 has a better understanding of the “shape” of a software system, and can better understand why something is failing, where the fix is needed, and what else in the codebase could be affected.

https://openai.com/index/introducing-gpt-5-5/

For ML engineers and data scientists who spend significant time in terminal environments orchestrating pipelines and debugging scripts, the Terminal-Bench 2.0 results are the most compelling signal. GPT-5.5 scores 82.7% on Terminal-Bench 2.0, which tests complex command-line workflows requiring planning, iteration, and tool coordination, beating Claude Opus 4.7 at 69.4% and Gemini 3.1 Pro at 68.5%. That is a lead of more than 13 percentage points over either competitor.

For broader knowledge work, GPT-5.5 scores 84.9% on GDPval, which tests agents across 44 knowledge-work occupations. On OSWorld-Verified, a benchmark measuring whether a model can autonomously operate real computer environments, it reaches 78.7%.

GPT-5.5 also ships with a Pro variant built for harder tasks that demand higher accuracy. On BrowseComp, which tests a model’s ability to track down hard-to-find information across the web, GPT-5.5 Pro scores 90.1%, ahead of Gemini 3.1 Pro at 85.9%. The model is also the top-ranked system on the Artificial Analysis Intelligence Index.

https://openai.com/index/introducing-gpt-5-5/

Speed and Token Efficiency

One concern with more capable models is that they tend to be slower or more expensive to run. OpenAI addressed this directly. GPT-5.5 matches GPT-5.4’s per-token latency in real-world serving while performing better across nearly every evaluation measured. It also uses significantly fewer tokens to complete the same Codex tasks, which means shorter, more efficient runs even on complex agentic workflows.

On pricing, the standard GPT-5.5 API will be charged at $5 per million input tokens and $30 per million output tokens. For context, GPT-5.4 was priced at $2.50 per million input tokens and $15 per million output tokens, so the per-token price has doubled. The OpenAI team argued that token efficiency gains offset the cost: since GPT-5.5 completes the same Codex tasks with fewer tokens, runs can be cheaper overall even at the higher per-token rate. GPT-5.5 Pro, the higher-accuracy variant, is priced at $30 per million input tokens and $180 per million output tokens in the API.

For teams running Codex at scale, the net math is what matters: if GPT-5.5 completes a task in materially fewer tokens than GPT-5.4, the effective cost per completed workflow can still come out lower despite the higher rate. Because both rates exactly doubled, the break-even point at the same input/output mix is half the tokens.
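To make that net math concrete, here is a minimal sketch of the cost-per-workflow comparison using the per-million-token prices quoted above. The token counts are hypothetical, chosen only for illustration; the article does not report how many tokens a Codex task uses on either model. `Pricing`, `run_cost`, and `break_even_token_ratio` are illustrative names, not part of any OpenAI SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Prices as quoted in the article.
GPT_5_4 = Pricing(input_per_million=2.50, output_per_million=15.00)
GPT_5_5 = Pricing(input_per_million=5.00, output_per_million=30.00)


def run_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one run with the given token counts."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


def break_even_token_ratio(
    old: Pricing, new: Pricing, input_tokens: int, output_tokens: int
) -> float:
    """Fraction of the old run's tokens the new model may use, at the same
    input/output mix, before its run costs as much as the old one."""
    return run_cost(old, input_tokens, output_tokens) / run_cost(
        new, input_tokens, output_tokens
    )


if __name__ == "__main__":
    # Hypothetical GPT-5.4 workflow: 200k input tokens, 20k output tokens.
    baseline = run_cost(GPT_5_4, 200_000, 20_000)
    print(f"GPT-5.4 baseline:        ${baseline:.2f}")
    # Hypothetical GPT-5.5 runs using 40% and 60% fewer tokens.
    print(f"GPT-5.5, 40% fewer:      ${run_cost(GPT_5_5, 120_000, 12_000):.2f}")
    print(f"GPT-5.5, 60% fewer:      ${run_cost(GPT_5_5, 80_000, 8_000):.2f}")
    print(f"Break-even token ratio:  {break_even_token_ratio(GPT_5_4, GPT_5_5, 200_000, 20_000):.2f}")
```

Under these assumed token counts, a 40% reduction still costs more than the GPT-5.4 baseline, while a 60% reduction costs less. Whether real workloads clear the 50% threshold depends on measured token usage, which teams would need to check on their own tasks.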

Scale and Adoption

OpenAI has seen a surge in Codex usage, with about 4 million developers using the tool weekly. That scale matters for understanding the deployment context: GPT-5.5 is not a research preview but a production model being pushed to a large, active developer base immediately on launch.

Key Takeaways

  • GPT-5.5 is OpenAI’s first fully retrained base model since GPT-4.5, designed specifically for agentic workflows: it can understand complex goals, use tools, check its own work, and carry multi-step tasks through to completion with minimal human direction.
  • The biggest performance gains are in agentic coding, computer use, knowledge work, and early scientific research. GPT-5.5 scores 82.7% on Terminal-Bench 2.0, 84.9% on GDPval, and 78.7% on OSWorld-Verified, outperforming both Claude Opus 4.7 and Gemini 3.1 Pro on several key benchmarks.
  • GPT-5.5 matches GPT-5.4’s per-token latency while being more capable across nearly every benchmark. It also uses significantly fewer tokens to complete the same Codex tasks, meaning better results without a proportional increase in latency or cost per completed workflow.
  • API pricing increases to $5/M input tokens and $30/M output tokens (up from $2.50 and $15 for GPT-5.4), with GPT-5.5 Pro priced at $30/M input and $180/M output. The OpenAI team argues token efficiency gains offset the higher per-token rate for most workloads.
  • GPT-5.5 is rolling out today to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex, with roughly 4 million developers already using Codex weekly.


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.
