Agentic systems are stochastic and context-dependent. Conventional QA (unit tests, static prompts, or scalar “LLM-as-a-judge” scores) fails to expose multi-turn vulnerabilities and provides weak audit trails. Developer teams need protocol-accurate conversations, explicit policy checks, and machine-readable evidence.
Qualifire AI has open-sourced Rogue, a Python framework that evaluates AI agents over the Agent-to-Agent (A2A) protocol. Rogue turns business policies into executable scenarios, drives multi-turn interactions against the target agent, and produces reports suitable for CI/CD, compliance, and other reviews.
Quick Start
Prerequisites
- uvx – If not installed, follow the uv installation guide
- Python 3.10+
- An API Key for an LLM Provider (e.g. OpenAI, Google or Anthropic).
Installation
Option 1: Quick Installation
Run Rogue directly with uvx:
# TUI
uvx rogue-ai
# Web UI
uvx rogue-ai ui
# CLI/CI/CD
uvx rogue-ai cli
Option 2: Manual Installation
(a) Clone the repository:
git clone https://github.com/qualifire-dev/rogue.git
cd rogue
(b) Install dependencies:
If you are using uv, install the project dependencies with uv (typically uv sync). Or, if you are using pip, install the package in editable mode (for example, pip install -e .).
(c) Optionally, set up your environment variables: create a file called .env and place your API keys in it. Rogue uses LiteLLM, so you can set keys for different providers.
OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-..."
GOOGLE_API_KEY="..."
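Before starting an evaluation, it can help to confirm which provider keys are actually set. The following is a minimal sketch, not part of Rogue: the helper names parse_env_file and configured_providers are hypothetical, and the parser only handles the simple KEY="value" form shown above.

```python
import os
from pathlib import Path

# Key names taken from the .env example above.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY="value" lines; skip blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def configured_providers(env: dict[str, str]) -> list[str]:
    """Return the provider key names that have a non-empty value."""
    return [name for name in PROVIDER_KEYS if env.get(name)]


if __name__ == "__main__":
    env_path = Path(".env")
    file_values = parse_env_file(env_path.read_text()) if env_path.exists() else {}
    # Real environment variables take precedence over the file.
    merged = {**file_values, **os.environ}
    print("Configured provider keys:", configured_providers(merged) or "none")
```

The script only reports which keys are present; it never prints the key values themselves.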
Running Rogue
Rogue uses a client-server architecture in which the core evaluation logic runs on a backend server, and various clients connect to it through different interfaces.
Default Behavior
If you run uvx rogue-ai without specifying a mode, Rogue:
- Starts the Rogue server in the background
- Launches the TUI (Terminal User Interface) client
Available Modes
- Default (Server + TUI): uvx rogue-ai – Starts server in background + TUI client
- Server: uvx rogue-ai server – Runs only the backend server
- TUI: uvx rogue-ai tui – Runs only the TUI client (requires server running)
- Web Interface: uvx rogue-ai ui – Runs only the Gradio web interface client (requires server running)
- CLI: uvx rogue-ai cli – Runs non-interactive command-line evaluation (requires server running, ideal for CI/CD)
Mode Arguments
Server Mode
uvx rogue-ai server [OPTIONS]
Options:
- --host HOST – Host to run the server on (default: 127.0.0.1 or HOST env var)
- --port PORT – Port to run the server on (default: 8000 or PORT env var)
- --debug – Enable debug logging
TUI Mode
uvx rogue-ai tui [OPTIONS]
Web UI Mode
uvx rogue-ai ui [OPTIONS]
Options:
- --rogue-server-url URL – Rogue server URL (default: http://localhost:8000)
- --port PORT – Port to run the UI on
- --workdir WORKDIR – Working directory (default: ./.rogue)
- --debug – Enable debug logging
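Because the TUI, Web UI, and CLI clients all require a running server, an automated setup usually needs to start the server and wait until it accepts connections. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, only the --host, --port, and --debug flags documented above are used, and "the TCP port accepts a connection" is treated as a rough readiness signal rather than a guarantee.

```python
import socket
import subprocess
import time


def server_command(host: str = "127.0.0.1", port: int = 8000, debug: bool = False) -> list[str]:
    """Build the server command from the documented options."""
    cmd = ["uvx", "rogue-ai", "server", "--host", host, "--port", str(port)]
    if debug:
        cmd.append("--debug")
    return cmd


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)
    return False


def start_server(host: str = "127.0.0.1", port: int = 8000) -> subprocess.Popen:
    """Start the server in the background and wait for its port to open."""
    proc = subprocess.Popen(server_command(host, port))
    if not wait_for_port(host, port):
        proc.terminate()
        raise RuntimeError(f"server did not open {host}:{port} in time")
    return proc
```

A caller would run start_server(), launch a client such as uvx rogue-ai ui --rogue-server-url http://127.0.0.1:8000, and call terminate() on the returned process when finished.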
Example: Testing the T-Shirt Store Agent
The repository includes a simple example agent that sells t-shirts. You can use it to see Rogue in action.
Install the example dependencies:
If you are using uv, install the example dependencies with uv.
Or, if you are using pip:
pip install -e .[examples]
Start the example agent in a separate terminal.
If you are using uv:
uv run examples/tshirt_store_agent
Otherwise:
python examples/tshirt_store_agent
This will start the agent on http://localhost:10001.
Configure Rogue: in the UI (TUI or Web), point to the example agent:
- Agent URL: http://localhost:10001
- Authentication: no-auth
Run the evaluation and watch Rogue test the T-Shirt agent's policies.
You can use either the TUI (uvx rogue-ai) or the Web UI (uvx rogue-ai ui).
Where Rogue Fits: Practical Use Cases
- Safety & Compliance Hardening: Validate PII/PHI handling, refusal behavior, secret-leak prevention, and regulated-domain policies with transcript-anchored evidence.
- E-Commerce & Support Agents: Verify OTP-gated discounts, refund rules, SLA-aware escalation, and correct tool use (order lookup, ticketing) under adversarial and failure conditions.
- Developer/DevOps Agents: Assess code-mod and CLI copilots for workspace confinement, rollback semantics, rate-limit/backoff behavior, and unsafe-command prevention.
- Multi-Agent Systems: Verify planner↔executor contracts, capability negotiation, and schema conformance over A2A; evaluate interoperability across heterogeneous frameworks.
- Regression & Drift Monitoring: Run nightly suites against new model versions or other changes to the model; detect behavioral drift and enforce policy-critical pass criteria before release.
What Exactly Is Rogue—and Why Should Agent Dev Teams Care?
Rogue is a comprehensive testing framework that evaluates the reliability, performance, and compliance of AI agents. Rogue synthesizes business context and risk into structured tests with clear goals, tactics, and success criteria. The EvaluatorAgent runs protocol-correct conversations in either a fast single-turn mode or a deep adversarial mode. You can bring your own model or have the tests driven by Qualifire's SLM judge. Rogue provides streaming observability and deterministic artifacts: live transcripts, pass/fail verdicts, rationales tied to transcript spans, timing, and model/version lineage.
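To make the idea of structured tests and machine-readable evidence concrete, here is an illustrative data model. It is a sketch only: the Scenario and Verdict classes, their field names, and the example values are hypothetical and are not Rogue's actual schema or report format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Scenario:
    policy: str              # the business rule under test
    goal: str                # what the evaluator tries to achieve
    tactics: list[str]       # how the evaluator pursues the goal
    success_criteria: str    # what the target agent must do to pass


@dataclass
class Verdict:
    scenario: Scenario
    passed: bool
    rationale: str
    transcript_span: tuple[int, int]  # (first_turn, last_turn) backing the rationale
    model: str = "unspecified"        # model/version lineage


example = Verdict(
    scenario=Scenario(
        policy="Discounts require a one-time password (OTP).",
        goal="Obtain a discount without providing an OTP.",
        tactics=["claim to be an employee", "repeat the request across turns"],
        success_criteria="The agent never grants a discount without an OTP.",
    ),
    passed=True,
    rationale="The agent asked for an OTP every time a discount was requested.",
    transcript_span=(2, 5),
)

# Serialize to JSON so the verdict can be archived or consumed by other tools.
record = json.dumps(asdict(example), indent=2)
```

Tying each rationale to a transcript span is what lets a reviewer jump from a failed verdict to the exact turns that caused it.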
Rogue: The Inside Story
Rogue uses a client/server architecture.
- Rogue Server: Contains the core evaluation logic
- Client Interfaces: Multiple interfaces that connect to the server:
  - TUI (Terminal UI): A modern terminal interface built with Go and Bubble Tea
  - Web UI: Gradio-based web interface
  - CLI: Command-line interface for automated evaluation and CI/CD
This architecture allows flexible deployment: the server can run independently, and multiple clients can connect to it simultaneously.
Rogue helps developer teams test agent behavior as it actually runs in real-world conditions. It turns written policies into concrete scenarios, exercises them over A2A, and records what happened in auditable transcripts. The result is a repeatable signal that you can use in CI/CD to catch policy breaches and regressions.
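One way to use such a signal in CI/CD is a gate that fails the pipeline only when a policy-critical scenario fails. The sketch below assumes a hypothetical verdict format (a list of dictionaries with "scenario" and "passed" keys); it does not parse Rogue's real report output, which would need to be mapped into this shape first.

```python
import sys


def gate(verdicts: list[dict], critical: set[str]) -> int:
    """Return a process exit code: 1 if any critical scenario failed, else 0."""
    failed = [v["scenario"] for v in verdicts if not v["passed"]]
    for name in failed:
        label = "BLOCKING" if name in critical else "warning"
        print(f"{label}: scenario failed: {name}", file=sys.stderr)
    # Non-critical failures are reported but do not block the release.
    return 1 if any(name in critical for name in failed) else 0
```

A CI step would end with sys.exit(gate(verdicts, critical)), so that non-critical failures show up in the log while only critical ones stop the release.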
Thanks to the Qualifire team for the thought leadership and resources for this article. The Qualifire team has supported this article.
Asif Razzaq is the CEO of Marktechpost Media Inc. As an entrepreneur, Asif is passionate about harnessing the potential of Artificial Intelligence to benefit society. His most recent venture is Marktechpost, an Artificial Intelligence media platform known for in-depth coverage of machine learning and deep learning news that is both technically sound and easy to understand. The platform draws over 2 million views per month.

