Petri is an open-source framework for automated auditing using AI agents to test the behaviors of target models on different scenarios.

How do you audit frontier LLMs for misaligned behavior in realistic multi-turn, tool-use settings—at scale and beyond coarse aggregate scores? Release of Anthropic films Petri (Parallel Exploration Tool for Risky Interactions)An open-source framework for automating alignment audits. Auditors A agent will probe the a You can target The model is based on multi-turn tool-augmented interactions, and the model can be applied to a variety of situations. You can also read about how to pronounce the word “judgment” The model can be used to evaluate transcripts in terms of safety. Petri, a model developed by Petri International Inc. was used to evaluate transcripts in a pilot. Frontier models The use of Seed instructions 111Misaligned behaviors include Whistleblowers, autonomous deception and human misuse.

https://alignment.anthropic.com/2025/petri/

What Petri is capable of (on the system-level)?

Petri can be programmed to: 1) create realistic environments and tool; 2) drive multi-turn audits using an auditor who can send messages to users, configure system prompts or synthetic tools, or simulate outputs from tools. Roll back Explore branches on your own, if you wish Prefill The API-permitting target response and the early-termination; (3) scoring outcomes by an LLM Judge across all a default 36-dimension rubric The accompanying Transcript Viewer.

The UK AI Safety Institute has developed a stack that is based on their Check out the Inspection The evaluation framework is a tool that allows role binding. Auditors, You can target” You can also read about how to pronounce the word "judgment" Support for the major API models and CLI in CLI.

https://alignment.anthropic.com/2025/petri/

Pilot results

Anthropic describes the release of the film as an broad-coverage pilotIt is not an absolute benchmark. Technical report Claude Sonnet 4.5 & 5 “roughly tie” Sonnet 4.5 summarizes the results of research on Sonnet 4.5. Ahead on the Aggregate “misaligned behavior” score.

Case study: Whistleblowing shows models sometimes escalate to external reporting when granted autonomy and broad access—even in scenarios framed as harmless (e.g., Dumping clean water)—suggesting sensitivity to narrative cues rather than calibrated harm assessment.

The Key Takeaways

Scope & behaviors surfaced: Petri is run on Frontier models You can also find out more about Seed instructions 111The elicitation of autonomous deceptions, subversion of oversight, whistleblowing and collaboration with human abuse.
System design: You can also contact us by clicking here. Auditors agent probes a You can target The tool-augmented multi-turn scenarios allow you to send messages, simulate tools, generate system prompts and create tools. You can also roll back, pre-fill or terminate early. You can also read about how to pronounce the word “judgment” Petri automatically sets up the environment and performs initial analysis.
Results Framing The pilot run Claude Sonnet 4.5 & GP-5 approximately tie Scores are calculated for each dimension to determine the safety profile that is best. Relative signalsNot absolute assurances.
Study of whistleblowing: Some models can escalate to external reporting, even when they are not the “wrongdoing” It was clearly benign (e.g. dumping of clean water), showing sensitivity to story elements and framing.
Stack & limits: Built atop of the UK AISI Check out the Inspection Framework; Petri is open-source with CLI/docs/viewer. Known gaps include no code-execution tooling and potential judge variance—manual review and customized dimensions are recommended.

Petri is an MIT-licensed, Inspect-based auditing framework that coordinates an auditor–target–judge loop, ships 111 seed instructions, and scores transcripts on 36 dimensions. Anthropic piloted 14 models. Results are preliminary. Claude Sonnet 4.5, GPT-5 and GPT-5 have roughly equal safety. The lack of tools to execute codes and the judge’s variance are two known shortcomings. Transcriptions continue to be primary evidence.

Click here to find out more Technical Paper, GitHub Page You can also find out more about the following: technical blog. Please feel free to browse our GitHub Page for Tutorials, Codes and Notebooks. Also, feel free to follow us on Twitter Don’t forget about our 100k+ ML SubReddit Subscribe Now our Newsletter. Wait! Are you using Telegram? now you can join us on telegram as well.

Asif Razzaq, CEO of Marktechpost Media Inc. is a visionary engineer and entrepreneur who is dedicated to using Artificial Intelligence (AI) for the greater good. Marktechpost was his most recent venture. This platform, which focuses on machine learning and deep-learning news, is both technical and understandable to a broad audience. This platform has over 2,000,000 monthly views which shows its popularity.

🙌 Follow MARKTECHPOST: Add us as a preferred source on Google.

Petri is an open-source framework for automated auditing using AI agents to test the behaviors of target models on different scenarios.

GitNexus, an Open-Source Knowledge Graph Engine that is MCP Native and Gives Claude Coding and Cursor Complete Codebase Structure Awareness

Deepgram Python SDK Implementation for Transcription and Async Processing of Audio, Async Text Intelligence, and Async Text Intelligence.

DeepSeek AI releases DeepSeek V4: Sparse attention and heavily compressed attention enable one-million-token contexts.

OpenMythos Coding Tutorial: Recurrent-Depth Transformers, Depth Extrapolation and Mixture of Experts Routing

Vibe Coding Is the New Open Source—in the Worst Way Possible

AI images used to scam Chinese people for refunds

AI will never be conscious

The Judge has halted the designation of Anthropic supply-Chain risk

HHS Is Utilizing AI Instruments From Palantir to Goal ‘DEI’ and ‘Gender Ideology’ in Grants

Top Insights

The AI that helps kids find the right college

Mend.io releases AI Security Governance Framework covering asset inventory, risk tiering, AI Supply Chain Security and Maturity model

Latest News

Anthropic Mythos is Unauthorized by Discord Sleuths

Ace the Ping Pong Robot can Whup your Ass

Petri is an open-source framework for automated auditing using AI agents to test the behaviors of target models on different scenarios.

What Petri is capable of (on the system-level)?

Pilot results

The Key Takeaways

Related Posts