OpenAI has launched GPT-5.5, its most capable model so far and the first fully retrained base model since GPT-4.5. GPT-5.5 is designed to complete complex, multi-step computer tasks with minimal human direction. Think of it as the difference between an assistant who needs a checklist and one who understands the underlying goal and figures out the steps themselves. The release is rolling out today to Plus, Pro, Business, and Enterprise subscribers across ChatGPT and Codex.
What ‘Agentic’ Actually Means Here
An agentic model doesn’t just respond to a single prompt. It takes a sequence of actions, uses tools (like browsing the web, writing code, running scripts, or operating software), checks its own work, and keeps going until the task is done. Prior models often stalled at handoff points, requiring the user to re-prompt or correct course. GPT-5.5 is built to reduce these interruptions.
OpenAI introduced GPT-5.5 as a model targeted at agentic computer use: it writes and debugs code, browses the web, fills out spreadsheets, and keeps working through multi-step tasks without requiring a human to supervise every move.
The Four Domains Where Gains Are Concentrated
The gains are concentrated in four areas: agentic coding, computer use, knowledge work, and early scientific research. OpenAI describes these as domains ‘where progress depends on reasoning across context and taking action over time.’
For software engineers, the most immediately relevant benchmark is SWE-Bench Pro, which evaluates real-world GitHub issue resolution across four programming languages. GPT-5.5 resolves 58.6% of tasks end-to-end in a single pass. Worth noting: Claude Opus 4.7 scores higher at 64.3% on this same benchmark, though OpenAI has noted that Anthropic reported signs of memorization on a subset of these problems, which may affect the comparison.
For long-horizon coding specifically, OpenAI also reports results on Expert-SWE, an internal benchmark measuring tasks with a median estimated human completion time of 20 hours. GPT-5.5 outperforms GPT-5.4 on Expert-SWE. This benchmark is significant because it reflects the kind of extended, multi-session engineering work (large refactors, feature builds, debugging deep in a codebase) that agentic tools are increasingly being asked to handle autonomously.
Developers who tested the system early said GPT-5.5 has a better understanding of the "shape" of a software system, and can better understand why something is failing, where the fix is needed, and what else in the codebase might be affected.

For ML engineers and data scientists who spend significant time in terminal environments orchestrating pipelines and debugging scripts, the Terminal-Bench 2.0 results are the most compelling signal. GPT-5.5 scores 82.7% on Terminal-Bench 2.0, which tests complex command-line workflows requiring planning, iteration, and tool coordination, beating Claude Opus 4.7 at 69.4% and Gemini 3.1 Pro at 68.5%. That is not a marginal lead: the gap to the nearest listed competitor is more than 13 percentage points.
For broader knowledge work, GPT-5.5 scores 84.9% on GDPval, which tests agents across 44 knowledge-work occupations. On OSWorld-Verified, a benchmark measuring whether a model can autonomously operate real computer environments, it reaches 78.7%.
GPT-5.5 also ships with a Pro variant built for harder tasks that demand higher accuracy. On BrowseComp, which tests a model’s ability to track down hard-to-find information across the web, GPT-5.5 Pro scores 90.1%, ahead of Gemini 3.1 Pro at 85.9%. The model is also the top-ranked system on the Artificial Analysis Intelligence Index.

Speed and Token Efficiency
One concern with more capable models is that they tend to be slower or more expensive to run. OpenAI addressed this directly. GPT-5.5 matches GPT-5.4’s per-token latency in real-world serving while performing better across nearly every evaluation measured. It also uses significantly fewer tokens to complete the same Codex tasks, meaning shorter, more efficient runs even on complex agentic workflows.
On pricing, the standard GPT-5.5 API will be charged at $5 per million input tokens and $30 per million output tokens. For context, GPT-5.4 was priced at $2.50 per million input tokens and $15 per million output tokens, so the per-token price has doubled. The OpenAI team argued that token efficiency gains offset the cost: because GPT-5.5 completes the same Codex tasks with fewer tokens, runs can be cheaper overall even at the higher per-token rate. GPT-5.5 Pro, the higher-accuracy variant, is priced at $30 per million input tokens and $180 per million output tokens in the API.
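For teams running Codex at scale, the net math is what matters: if GPT-5.5 completes a task in materially fewer tokens than GPT-5.4, the effective cost per completed workflow can still come out lower despite the higher rate. Because both the input and output rates exactly doubled, the break-even point at an unchanged input/output mix is half the tokens: a task has to use fewer than 50% of the tokens GPT-5.4 needed before it becomes cheaper.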
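The arithmetic can be checked with a short script. This is a rough sketch rather than an official cost calculator: the per-million-token rates are the ones quoted in this article, while the token counts for the example task (200,000 input and 40,000 output tokens on GPT-5.4) are purely hypothetical figures chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """API price in US dollars per one million tokens."""
    input_per_million: float
    output_per_million: float


# Rates as quoted in this article.
GPT_5_4 = Pricing(input_per_million=2.50, output_per_million=15.00)
GPT_5_5 = Pricing(input_per_million=5.00, output_per_million=30.00)


def run_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one run with the given token counts."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


def scaled_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                token_fraction: float) -> float:
    """Cost when the run uses only `token_fraction` of the baseline tokens."""
    return run_cost(
        pricing,
        round(input_tokens * token_fraction),
        round(output_tokens * token_fraction),
    )


# Hypothetical baseline task on GPT-5.4 (assumed numbers, not measured).
BASE_INPUT, BASE_OUTPUT = 200_000, 40_000

baseline = run_cost(GPT_5_4, BASE_INPUT, BASE_OUTPUT)
same_tokens = run_cost(GPT_5_5, BASE_INPUT, BASE_OUTPUT)
forty_pct_fewer = scaled_cost(GPT_5_5, BASE_INPUT, BASE_OUTPUT, 0.60)
sixty_pct_fewer = scaled_cost(GPT_5_5, BASE_INPUT, BASE_OUTPUT, 0.40)

# Break-even fraction of tokens, assuming the input/output mix is unchanged.
break_even_fraction = baseline / same_tokens

print(f"GPT-5.4 baseline:            ${baseline:.2f}")
print(f"GPT-5.5, same tokens:        ${same_tokens:.2f}")
print(f"GPT-5.5, 40% fewer tokens:   ${forty_pct_fewer:.2f}")
print(f"GPT-5.5, 60% fewer tokens:   ${sixty_pct_fewer:.2f}")
print(f"Break-even token fraction:   {break_even_fraction:.2f}")
```

Under these assumed numbers, a 40% token reduction still leaves the GPT-5.5 run more expensive than the GPT-5.4 baseline, while a 60% reduction makes it cheaper. Real workloads will differ in their input/output mix and in how much the token count actually drops, so teams would need to measure their own tasks.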
Scale and Adoption
OpenAI has seen a surge in Codex usage, with about 4 million developers using the tool weekly. That scale matters for understanding the deployment context: GPT-5.5 is not a research preview but a production model being pushed to a large, active developer base immediately on launch.
Key Takeaways
- GPT-5.5 is OpenAI’s first fully retrained base model since GPT-4.5, designed specifically for agentic workflows: it can understand complex goals, use tools, check its own work, and carry multi-step tasks through to completion with minimal human direction.
- The biggest performance gains are in agentic coding, computer use, knowledge work, and early scientific research. GPT-5.5 scores 82.7% on Terminal-Bench 2.0, 84.9% on GDPval, and 78.7% on OSWorld-Verified, outperforming both Claude Opus 4.7 and Gemini 3.1 Pro on several key benchmarks.
- GPT-5.5 matches GPT-5.4’s per-token latency while being more capable across nearly every benchmark. It also uses significantly fewer tokens to complete the same Codex tasks, meaning better results without a proportional increase in latency or cost per completed workflow.
- API pricing increases to $5/M input tokens and $30/M output tokens (up from $2.50 and $15 for GPT-5.4), with GPT-5.5 Pro priced at $30/M input and $180/M output. The OpenAI team argues that token efficiency gains offset the higher per-token rate for most workloads.
- GPT-5.5 is rolling out today to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex, with roughly 4 million developers already using Codex weekly.


