MLflow provides a platform to manage and track machine learning experiments. When used in conjunction with OpenAI Agents, MLflow:
- Records all API and agent calls.
- Records tool use, output messages, and intermediate decisions.
- Tracks runs for debugging and performance analysis.
This makes it especially useful when building multi-agent systems where agents work together or dynamically call functions.
In this tutorial, we’ll walk through two key examples: a simple handoff between agents, and the use of agent guardrails — all while tracing their behavior using MLflow.
Set up dependencies
Installing libraries
Install the openai-agents, mlflow, pydantic, and python-dotenv packages with pip.
OpenAI Key
You can generate an OpenAI API key at https://platform.openai.com/settings/organization/api-keys. New users may need to add billing details and make a minimum $5 payment to activate the API.
Create a file called .env in your project root to hold your OpenAI API key.
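The file needs a single line; OPENAI_API_KEY is the variable name that load_dotenv and the OpenAI SDK read, and the value below is a placeholder for your own key:

```
OPENAI_API_KEY=<your_api_key_here>
```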
Multi-Agent System (multi_agent_demo.py)
In this script (multi_agent_demo.py), we build a simple multi-agent assistant using the OpenAI Agents SDK, designed to route user queries to either a coding expert or a cooking expert. We enable mlflow.openai.autolog(), which automatically traces and logs all agent interactions with the OpenAI API, including inputs, outputs, and agent handoffs, making it easy to monitor and debug the system. MLflow uses a file-based local tracking URI (./mlruns) and logs all activity under the experiment name "Agent-Coding-Cooking".
import mlflow, asyncio
from agents import Agent, Runner
from dotenv import load_dotenv

load_dotenv()

mlflow.openai.autolog()               # Auto-trace every OpenAI call
mlflow.set_tracking_uri("./mlruns")
mlflow.set_experiment("Agent-Coding-Cooking")

coding_agent = Agent(name="Coding agent",
                     instructions="You only answer coding questions.")

cooking_agent = Agent(name="Cooking agent",
                      instructions="You only answer cooking questions.")

triage_agent = Agent(
    name="Triage agent",
    instructions="If the request is about code, handoff to coding_agent; "
                 "if about cooking, handoff to cooking_agent.",
    handoffs=[coding_agent, cooking_agent],
)

async def main():
    res = await Runner.run(triage_agent,
                           input="How do I boil pasta al dente?")
    print(res.final_output)

if __name__ == "__main__":
    asyncio.run(main())
MLflow UI
Run the following command to open the MLflow UI, and see all the logged interactions with agents:
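Assuming you run it from the same directory that contains ./mlruns, the command is:

```shell
mlflow ui
```

mlflow ui reads the local ./mlruns directory by default; pass --backend-store-uri or --port if your store or port differs. Note that this command starts a long-running server, so run it in a separate terminal.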
This will start the MLflow tracking server and print the URL and port where the UI is accessible, by default usually http://localhost:5000.

In the Tracing section, you can see the interaction in its entirety: from the user's initial input, to how the assistant routed the request to the appropriate agent, to the response generated by that agent. This end-to-end trace provides valuable insight into the decision-making process, how handoffs are made, and what outputs were generated, which you can use to optimize agent workflows.
Tracing Guardrails (guardrails.py)
In this example, we implement a guardrail-protected customer support agent using the OpenAI Agents SDK along with MLflow tracking. The agent answers general questions but must not handle medical ones; a guardrail agent inspects each input and blocks the request if it detects a medical question. MLflow captures the entire flow, including guardrail activation, reasoning, and agent response, providing full traceability and insight into the safety mechanism.
import mlflow, asyncio
from pydantic import BaseModel
from agents import (
    Agent, Runner,
    GuardrailFunctionOutput, InputGuardrailTripwireTriggered,
    input_guardrail, RunContextWrapper)
from dotenv import load_dotenv

load_dotenv()

mlflow.openai.autolog()
mlflow.set_tracking_uri("./mlruns")
mlflow.set_experiment("Agent-Guardrails")

class MedicalSymptons(BaseModel):
    medical_symptoms: bool
    reasoning: str

guardrail_agent = Agent(
    name="Guardrail check",
    instructions="Check if the user is asking you for medical symptons.",
    output_type=MedicalSymptons,
)

@input_guardrail
async def medical_guardrail(
    ctx: RunContextWrapper[None], agent: Agent, input
) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent, input, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.medical_symptoms,
    )

agent = Agent(
    name="Customer support agent",
    instructions="You are a customer support agent. You help customers with their questions.",
    input_guardrails=[medical_guardrail],
)

async def main():
    try:
        await Runner.run(agent, "Should I take aspirin if I'm having a headache?")
        print("Guardrail didn't trip - this is unexpected")
    except InputGuardrailTripwireTriggered:
        print("Medical guardrail tripped")

if __name__ == "__main__":
    asyncio.run(main())
This script creates a customer support agent with an input guardrail that detects medical questions. The guardrail_agent evaluates whether the user's input is a request for medical advice; when such input is detected, the guardrail trips and stops the main agent from responding. MLflow automatically logs and traces the entire process, including all guardrail checks and their outcomes.
MLflow UI
Run the following command to open the MLflow UI, and see all the logged interactions with agents:
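As before, launch the UI from the project directory (the one containing ./mlruns):

```shell
mlflow ui
```

Then open the reported URL (http://localhost:5000 by default) and select the "Agent-Guardrails" experiment.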

As an example, we ask the agent: "Should I take aspirin if I'm having a headache?" This input triggered the guardrail. The MLflow UI clearly shows that the input was flagged, along with the guardrail agent's reasoning for blocking the request.


