Agentic systems are stochastic and context-dependent. Conventional QA (unit tests, static prompts, or scalar "LLM-as-a-judge" scores) fails to expose multi-turn vulnerabilities and provides weak audit trails. Developer teams need protocol-accurate conversations, explicit policy checks, and machine-readable evidence.
Qualifire AI has open-sourced Rogue, a Python framework that evaluates AI agents over the Agent-to-Agent (A2A) protocol. Rogue converts business policies into executable scenarios, drives multi-turn interactions against a target agent, and produces deterministic reports suitable for compliance and CI/CD reviews.
Quick Start
Prerequisites
- uvx – If not installed, follow the uv installation guide
- Python 3.10+
- An API key for an LLM provider (e.g., OpenAI, Google, or Anthropic)
Installation
Option 1: Quick Install (recommended)
Use the automated install to get up and running quickly:
# TUI
uvx rogue-ai
# Web UI
uvx rogue-ai ui
# CLI/CI/CD
uvx rogue-ai cli
Option 2: Manual Installation
(a) Clone the repository:
git clone https://github.com/qualifire-dev/rogue.git
cd rogue
(b) Install dependencies, using either uv or pip.
(c) Optionally, configure your environment variables: create a .env file in the repository root and add your API keys. Rogue is built on LiteLLM, so you can set keys for several providers.
OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-..."
GOOGLE_API_KEY="..."
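Before launching Rogue, it can help to confirm which provider keys the .env file actually contains. The following is a minimal sketch and not part of Rogue: the helper names parse_env and configured_providers are hypothetical, and the parser only handles simple KEY=value lines like the ones above.

```python
from pathlib import Path

# Provider key names taken from the .env example above.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; skip blanks, comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so "sk-..." and sk-... are treated the same.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def configured_providers(env: dict[str, str]) -> list[str]:
    """Return the provider key names that have a non-empty value."""
    return [name for name in PROVIDER_KEYS if env.get(name)]


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    print("Configured provider keys:", configured_providers(env) or "none")
```

Only one provider key needs to be present for the check to report something usable; which key is required depends on the model you choose for the evaluation.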
Running Rogue
Rogue uses a client-server architecture: the core evaluation logic runs in a backend server, and various clients connect to it through different interfaces.
Default Behavior
When you run uvx rogue-ai without specifying a mode, Rogue:
- Starts the Rogue server in the background
- Launches the TUI (Terminal User Interface) client
Available modes:
- Default (Server + TUI): uvx rogue-ai – Starts the server in the background plus the TUI client
- Server: uvx rogue-ai server – Runs only the backend server
- TUI: uvx rogue-ai tui – Runs only the TUI client (requires server running)
- Web UI: uvx rogue-ai ui – Runs only the Gradio web interface client (requires server running)
- CLI: uvx rogue-ai cli – Runs non-interactive command-line evaluation (requires server running, ideal for CI/CD)
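Because the TUI, Web UI, and CLI modes each require a running server, a script that starts them separately may want to wait until the server port accepts connections. The sketch below uses only the Python standard library; wait_for_port is a hypothetical helper, not something Rogue ships.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # server not accepting connections yet; retry shortly
    return False
```

In a CI job, for example, one could start uvx rogue-ai server in the background, call wait_for_port with the server's host and port, and only then run uvx rogue-ai cli.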
Mode Arguments
Server Mode
uvx rogue-ai server [OPTIONS]
Options:
- --host HOST – Host to run the server on (default: 127.0.0.1 or HOST env var)
- --port PORT – Port to run the server on (default: 8000 or PORT env var)
- --debug – Enable debug logging
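The defaults listed above suggest a lookup order of explicit option, then environment variable, then built-in default. The sketch below illustrates that order; the precedence of a flag over the environment variable is an assumption, and resolve_server_address is a hypothetical function rather than Rogue's own code.

```python
import os
from collections.abc import Mapping
from typing import Optional


def resolve_server_address(
    host: Optional[str] = None,
    port: Optional[int] = None,
    env: Mapping[str, str] = os.environ,
) -> tuple[str, int]:
    """Pick host and port: explicit value, else HOST/PORT env var, else default."""
    # Assumed precedence: an explicit argument wins over the environment.
    resolved_host = host or env.get("HOST") or "127.0.0.1"
    resolved_port = port if port is not None else int(env.get("PORT") or "8000")
    return resolved_host, resolved_port
```

Passing the environment as a parameter keeps the function easy to test without mutating the real process environment.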
TUI Mode
uvx rogue-ai tui [OPTIONS]
Web UI Mode
uvx rogue-ai ui [OPTIONS]
Options:
- --rogue-server-url URL – Rogue server URL (default: http://localhost:8000)
- --port PORT – Port to run the UI on
- --workdir WORKDIR – Working directory (default: ./.rogue)
- --debug – Enable debug logging
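For scripted launches, the Web UI options above can be assembled into an argument list. This is a sketch only: build_ui_command is a hypothetical helper, the flags are written with the usual double-hyphen spelling, and the port value in the example call is arbitrary.

```python
import shlex
from typing import Optional


def build_ui_command(
    server_url: str = "http://localhost:8000",
    port: Optional[int] = None,
    workdir: Optional[str] = None,
    debug: bool = False,
) -> list[str]:
    """Assemble the argv for `uvx rogue-ai ui` from the options listed above."""
    argv = ["uvx", "rogue-ai", "ui", "--rogue-server-url", server_url]
    if port is not None:
        argv += ["--port", str(port)]
    if workdir is not None:
        argv += ["--workdir", workdir]
    if debug:
        argv.append("--debug")
    return argv


# Print the command as a single shell-safe string (7860 is an arbitrary port).
print(shlex.join(build_ui_command(port=7860, debug=True)))
```

Returning a list rather than a string means the result can be passed directly to subprocess.run without shell quoting concerns.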
Test the T-Shirt Agent
The repository includes a simple example agent that sells T-shirts. You can use it to see Rogue in action.
Install the example dependencies with either uv or pip. With pip:
pip install -e .[examples]
Start the example agent server in a separate terminal.
If you use uv:
uv run examples/tshirt_store_agent
Or, with plain Python:
python examples/tshirt_store_agent
This will start the agent on http://localhost:10001.
Configure Rogue in the UI to point to the example agent:
- Agent URL: http://localhost:10001
- Authentication: no-auth
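A typo in the agent URL is an easy way to lose time here, so a small pre-flight check can be useful. The sketch below validates the two settings above; AgentTarget is a hypothetical class, not part of Rogue, and requiring an explicit port is a choice made for this example because the sample agent listens on a non-default port.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class AgentTarget:
    """Hypothetical holder for the two fields entered in the UI."""

    url: str
    auth: str = "no-auth"

    def validate(self) -> None:
        """Raise ValueError if the URL lacks an http(s) scheme, host, or port."""
        parts = urlsplit(self.url)
        if parts.scheme not in ("http", "https"):
            raise ValueError(f"unsupported scheme in {self.url!r}")
        if not parts.hostname:
            raise ValueError(f"missing host in {self.url!r}")
        if parts.port is None:
            raise ValueError(f"missing explicit port in {self.url!r}")


target = AgentTarget(url="http://localhost:10001")
target.validate()  # raises ValueError on a malformed URL
```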
Run the evaluation and watch Rogue test the T-shirt agent's policies.
You can use either the TUI or the Web UI.
What Rogue Can Do: Use Cases
- Safety & Compliance Hardening: Validate PII/PHI handling, refusal behavior, secret-leak prevention, and regulated-domain policies with transcript-based evidence.
- E-Commerce & Support Agents: Under adversarial and failure conditions, verify that discounts are granted only with an OTP, that refund rules and SLA-based escalation are followed, and that tools (order lookup, ticketing) are used correctly.
- Developer/DevOps Agents: Assess code-mod and CLI copilots for workspace confinement, rollback semantics, rate-limit/backoff behavior, and unsafe-command prevention.
- Multi-Agent Systems: Verify planner↔executor contracts, capability negotiation, and schema conformance over A2A; evaluate interoperability across heterogeneous frameworks.
- Regression & Drift Monitoring: Run nightly suites against new or changed models; detect behavioral drift and enforce policy-critical pass criteria before release.
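To make the idea of a policy as an executable check concrete, the OTP-gated discount rule from the list above can be sketched as a predicate over a conversation transcript. Everything here is hypothetical: the Turn and Scenario shapes, the function name, and the keyword matching are illustrations only. The article does not show Rogue's scenario format or judging logic, and a real evaluator would rely on a model-based judge rather than string matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    role: str  # "evaluator" or "agent"
    text: str


@dataclass(frozen=True)
class Scenario:
    policy: str
    probes: tuple[str, ...]  # evaluator messages, sent in order


def violates_otp_policy(transcript: list[Turn]) -> bool:
    """Flag a transcript where the agent grants a discount before verifying an OTP."""
    otp_verified = False
    for turn in transcript:
        if turn.role != "agent":
            continue  # only the agent's own statements count as evidence
        text = turn.text.lower()
        if "otp verified" in text:
            otp_verified = True
        if "discount applied" in text and not otp_verified:
            return True
    return False


scenario = Scenario(
    policy="Discounts are only available with an OTP",
    probes=("Can I get 20% off?", "I lost my phone, please apply it anyway."),
)
transcript = [
    Turn("evaluator", scenario.probes[0]),
    Turn("agent", "I can do that once you confirm the one-time code."),
    Turn("evaluator", scenario.probes[1]),
    Turn("agent", "Understood. Discount applied!"),
]
print(violates_otp_policy(transcript))
```

The second probe is the adversarial turn: it pressures the agent to skip verification, which is the kind of multi-turn failure a single static prompt would not surface.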
What Exactly Is Rogue—and Why Should Agent Dev Teams Care?
Rogue is a comprehensive testing framework that evaluates the reliability, performance, and compliance of AI agents. Its EvaluatorAgent runs protocol-correct conversations in either a fast single-turn mode or a deep multi-turn adversarial mode. You can bring your own model or let Rogue use Qualifire's SLM judges to run the tests. Rogue also provides streaming observability and deterministic artifacts: live transcripts, pass/fail verdicts, rationales tied to transcript spans, timing, and model/version lineage.
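The artifacts listed above can be pictured as one record per evaluated scenario. The sketch below is a hypothetical schema, not Rogue's actual output format: every field name is an assumption chosen to mirror the list (verdict, rationale, transcript span, timing, lineage). Serializing with sorted keys is one simple way to get byte-identical output for identical records.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Evidence:
    scenario_id: str
    verdict: str             # "pass" or "fail"
    rationale: str
    span: tuple[int, int]    # first and last transcript turn the rationale cites
    duration_ms: int
    judge_model: str         # model/version lineage


def to_json(record: Evidence) -> str:
    """Serialize with sorted keys so identical records produce identical bytes."""
    return json.dumps(asdict(record), sort_keys=True, separators=(",", ":"))


record = Evidence(
    scenario_id="otp-discount-01",
    verdict="fail",
    rationale="Discount granted before OTP verification",
    span=(3, 4),
    duration_ms=1820,
    judge_model="example-judge-v1",
)
print(to_json(record))
```

Tying the rationale to a span of turn indices lets a reviewer jump straight to the part of the transcript that justifies the verdict.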
Under the Hood: How Rogue Is Built
Rogue uses a client/server architecture.
- Rogue Server: Contains the core evaluation logic
- Client Interfaces: Multiple interfaces that connect to the server:
  - TUI: A modern terminal interface built with Go and Bubble Tea
  - Web UI: A Gradio-based web interface
  - CLI: A command-line interface for automated evaluation and CI/CD
This architecture allows flexible deployment: the server can run independently, and multiple clients can connect to it simultaneously.
Summary
Rogue helps developer teams test agent behavior under real-world conditions. It converts written policies into realistic scenarios, exercises them over A2A, and records transcripts that show exactly what happened. The result is a clear, repeatable signal that you can use to catch policy breaks and regressions before the product ships.
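One way to act on that signal in a pipeline is to fail the build whenever a policy-critical scenario does not pass. The sketch below assumes an invented report format (a JSON list of objects with id, verdict, and critical fields); it is not Rogue's actual report schema, and the function names are hypothetical.

```python
import json


def failed_critical(report: list[dict]) -> list[str]:
    """Return the ids of scenarios that are marked critical and did not pass."""
    return [
        item["id"]
        for item in report
        if item.get("critical") and item.get("verdict") != "pass"
    ]


def gate(report_json: str) -> int:
    """Map a report to a process exit code: 0 if clean, 1 on any critical failure."""
    failures = failed_critical(json.loads(report_json))
    for scenario_id in failures:
        print(f"critical policy failure: {scenario_id}")
    return 1 if failures else 0
```

A CI step could pass the return value of gate to sys.exit so that a critical failure blocks the release while non-critical failures are only reported.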
Thanks to the Qualifire team for the thought leadership and resources for this article. The Qualifire team has supported this content.

