MLflow provides a platform to manage and track machine learning experiments. When used in conjunction with OpenAI Agents, MLflow:
- Records all API and agent calls.
- Records tool use, output messages, and intermediate decisions.
- Tracks runs for debugging and performance analysis.
This makes it especially useful when building multi-agent systems where agents work together or dynamically call functions.
In this tutorial, we’ll walk through two key examples: a simple handoff between agents, and the use of agent guardrails — all while tracing their behavior using MLflow.
Set up dependencies
Installing libraries
Install the openai-agents, mlflow, pydantic, and python-dotenv packages with pip.
OpenAI Key
You can generate an OpenAI API key at https://platform.openai.com/settings/organization/api-keys. New users may need to add billing details and make a minimum $5 payment to activate the API.
Create a file called .env in your project root to hold your OpenAI API key.
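The file needs a single line; OPENAI_API_KEY is the variable name that load_dotenv and the OpenAI SDK read, and the value below is a placeholder for your own key:

```
OPENAI_API_KEY=<your_api_key_here>
```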
Multi-Agent System (multi_agent_demo.py)
In this script (multi_agent_demo.py), we build a simple multi-agent assistant using the OpenAI Agents SDK, designed to route user queries to either a coding expert or a cooking expert. We enable mlflow.openai.autolog(), which automatically traces and logs all agent interactions with the OpenAI API, including inputs, outputs, and agent handoffs, making it easy to monitor and debug the system. MLflow uses a file-based local tracking URI (./mlruns) and logs all activity under the experiment name "Agent-Coding-Cooking".
import mlflow, asyncio
from agents import Agent, Runner
from dotenv import load_dotenv

load_dotenv()

mlflow.openai.autolog()               # Auto-trace every OpenAI call
mlflow.set_tracking_uri("./mlruns")
mlflow.set_experiment("Agent-Coding-Cooking")

coding_agent = Agent(name="Coding agent",
                     instructions="You only answer coding questions.")

cooking_agent = Agent(name="Cooking agent",
                      instructions="You only answer cooking questions.")

triage_agent = Agent(
    name="Triage agent",
    instructions="If the request is about code, handoff to coding_agent; "
                 "if about cooking, handoff to cooking_agent.",
    handoffs=[coding_agent, cooking_agent],
)

async def main():
    res = await Runner.run(triage_agent,
                           input="How do I boil pasta al dente?")
    print(res.final_output)

if __name__ == "__main__":
    asyncio.run(main())
MLflow UI
Run the following command to open the MLflow UI, and see all the logged interactions with agents:
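Assuming you run it from the same directory that contains ./mlruns, the command is:

```shell
mlflow ui
```

mlflow ui reads the local ./mlruns directory by default; pass --backend-store-uri or --port if your store or port differs. Note that this command starts a long-running server, so run it in a separate terminal.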
This will start the MLflow tracking server and print the URL and port where the UI is accessible, by default usually http://localhost:5000.

In the Tracing section, you can see the interaction in its entirety: from the user's initial input, to how the assistant routed the request to the appropriate agent, to the response generated by that agent. This end-to-end trace provides valuable insight into the decision-making process, how handoffs are made, and what outputs were generated, which you can use to optimize agent workflows.
Tracing Guardrails (guardrails.py)
In this example, we implement a guardrail-protected customer support agent using the OpenAI Agents SDK along with MLflow tracking. The agent answers general questions but must not handle medical ones; a guardrail agent inspects each input and blocks the request if it detects a medical question. MLflow captures the entire flow, including guardrail activation, reasoning, and agent response, providing full traceability and insight into the safety mechanism.
import mlflow, asyncio
from pydantic import BaseModel
from agents import (
    Agent, Runner,
    GuardrailFunctionOutput, InputGuardrailTripwireTriggered,
    input_guardrail, RunContextWrapper)
from dotenv import load_dotenv

load_dotenv()

mlflow.openai.autolog()
mlflow.set_tracking_uri("./mlruns")
mlflow.set_experiment("Agent-Guardrails")

class MedicalSymptons(BaseModel):
    medical_symptoms: bool
    reasoning: str

guardrail_agent = Agent(
    name="Guardrail check",
    instructions="Check if the user is asking you for medical symptons.",
    output_type=MedicalSymptons,
)

@input_guardrail
async def medical_guardrail(
    ctx: RunContextWrapper[None], agent: Agent, input
) -> GuardrailFunctionOutput:
    result = await Runner.run(guardrail_agent, input, context=ctx.context)
    return GuardrailFunctionOutput(
        output_info=result.final_output,
        tripwire_triggered=result.final_output.medical_symptoms,
    )

agent = Agent(
    name="Customer support agent",
    instructions="You are a customer support agent. You help customers with their questions.",
    input_guardrails=[medical_guardrail],
)

async def main():
    try:
        await Runner.run(agent, "Should I take aspirin if I'm having a headache?")
        print("Guardrail didn't trip - this is unexpected")
    except InputGuardrailTripwireTriggered:
        print("Medical guardrail tripped")

if __name__ == "__main__":
    asyncio.run(main())
This script creates a customer support agent with an input guardrail that detects medical questions. The guardrail_agent evaluates whether the user's input is a request for medical advice; when such input is detected, the guardrail trips and stops the main agent from responding. MLflow automatically logs and traces the entire process, including all guardrail checks and their outcomes.
MLflow UI
Run the following command to open the MLflow UI, and see all the logged interactions with agents:
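As before, launch the UI from the project directory (the one containing ./mlruns):

```shell
mlflow ui
```

Then open the reported URL (http://localhost:5000 by default) and select the "Agent-Guardrails" experiment.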

As an example, we ask the agent: "Should I take aspirin if I'm having a headache?" This input triggered the guardrail. The MLflow UI clearly shows that the input was flagged, along with the guardrail agent's reasoning for blocking the request.


