Qualifire AI Open-Sources Rogue: A Framework for End-to-End Agentic AI Testing to Assess the Reliability, Performance and Compliance of AI Agents

Agentic systems are stochastic and context-dependent. Conventional QA (unit tests, static prompts, or scalar “LLM-as-a-judge” scores) fails to expose multi-turn vulnerabilities and provides weak audit trails. Developer teams need protocol-accurate conversations, explicit policy checks, and machine-readable evidence.

Qualifire AI has open-sourced Rogue, a Python framework that evaluates AI agents over the Agent-to-Agent (A2A) protocol. Rogue converts business policies into executable scenarios, drives multi-turn interactions against a target agent, and produces deterministic reports suitable for CI/CD and compliance reviews.

Quick Start

Prerequisites

uvx – if not installed, follow the uv installation guide
Python 3.10+
An API key for an LLM provider (e.g. OpenAI, Google, or Anthropic)

Installation

Option 1: Quick Install (Recommended)

You can install quickly by using the automated script:

# TUI
uvx rogue-ai
# Web UI
uvx rogue-ai ui
# CLI/CI/CD
uvx rogue-ai cli

Option 2: Manual Installation

(a) Clone the repository:

git clone https://github.com/qualifire-dev/rogue.git
cd rogue

(b) Install dependencies:

If you use uv:

If you use pip:

(c) Optionally, configure your environment variables: create a .env file in the root directory and add your API keys. Rogue uses LiteLLM, so you can set keys for different providers.

OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-..."
GOOGLE_API_KEY="..."
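As a rough illustration of the .env step, the sketch below parses text in the format shown above and reports which provider keys are actually filled in. The helper names `parse_env_text` and `configured_providers` are hypothetical and not part of Rogue or LiteLLM, which handle their own configuration loading; this is only a pre-flight check one might run before an evaluation.

```python
from typing import Dict, Iterable, List

# Key names taken from the example .env file above.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY="value" lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def configured_providers(
    env: Dict[str, str], keys: Iterable[str] = PROVIDER_KEYS
) -> List[str]:
    """Return the key names that are present and non-empty (hypothetical helper)."""
    return [k for k in keys if env.get(k)]


if __name__ == "__main__":
    sample = 'OPENAI_API_KEY="sk-..."\n# a comment\nGOOGLE_API_KEY=""\n'
    print(configured_providers(parse_env_text(sample)))
```

An empty value counts as missing, so a placeholder line left blank in the file does not pass the check.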

Running Rogue

Rogue uses a client-server architecture in which the core evaluation logic runs in a backend server and various clients connect to it through different interfaces.

Standard Behavior

When you run uvx rogue-ai without specifying a mode, it:

Starts the Rogue server in the background
Launches the TUI (Terminal User Interface) client

Available Modes

Default (Server + TUI): uvx rogue-ai – Starts the server in the background and launches the TUI client
Server: uvx rogue-ai server – Runs only the backend server
TUI: uvx rogue-ai tui – Runs only the TUI client (requires server running)
Web User Interface: uvx rogue-ai ui – Runs only the Gradio web interface client (requires server running)
CLI: uvx rogue-ai cli – Runs non-interactive command-line evaluation (requires server running, ideal for CI/CD)
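The mode list above can be captured in a small lookup table, which is handy when a script needs to know whether a server must already be running before a client is launched. The table and the `build_command` helper are illustrative only and not something Rogue ships; the subcommand names and the "requires server running" notes are taken from the list above.

```python
from typing import Dict, List, NamedTuple


class Mode(NamedTuple):
    args: List[str]      # subcommand appended to "uvx rogue-ai"
    needs_server: bool   # True if a Rogue server must already be running


# Mirrors the modes listed above; "default" is the no-argument invocation.
MODES: Dict[str, Mode] = {
    "default": Mode([], False),      # starts its own background server
    "server": Mode(["server"], False),
    "tui": Mode(["tui"], True),
    "ui": Mode(["ui"], True),
    "cli": Mode(["cli"], True),
}


def build_command(mode: str) -> List[str]:
    """Return the argv list for a mode (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["uvx", "rogue-ai", *MODES[mode].args]


if __name__ == "__main__":
    for name, mode in MODES.items():
        print(name, build_command(name), "needs server:", mode.needs_server)
```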

Mode Arguments

Server Mode

uvx rogue-ai server [OPTIONS]

Options:

--host HOST – Host to run the server on (default: 127.0.0.1 or HOST env var)
--port PORT – Port to run the server on (default: 8000 or PORT env var)
--debug – Enable debug logging
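One plausible reading of "default: 127.0.0.1 or HOST env var" is a three-level precedence: an explicit flag wins, then the environment variable, then the built-in default. The sketch below shows that pattern with argparse. It is an assumption about the behavior, not Rogue's actual argument parser, and `parse_server_args` is a hypothetical name.

```python
import argparse
import os
from typing import Mapping, Optional, Sequence


def parse_server_args(
    argv: Sequence[str], env: Optional[Mapping[str, str]] = None
) -> argparse.Namespace:
    """Resolve host/port as: CLI flag > env var > built-in default (assumed order)."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="server-sketch")
    # Environment variables supply the defaults; flags override them.
    parser.add_argument("--host", default=env.get("HOST", "127.0.0.1"))
    parser.add_argument("--port", type=int, default=int(env.get("PORT", "8000")))
    parser.add_argument("--debug", action="store_true")
    return parser.parse_args(list(argv))


if __name__ == "__main__":
    print(parse_server_args([], env={}))
    print(parse_server_args(["--port", "9000"], env={"HOST": "0.0.0.0", "PORT": "8100"}))
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.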

TUI Mode

uvx rogue-ai tui [OPTIONS]

Web UI Mode

uvx rogue-ai ui [OPTIONS]

Options:

--rogue-server-url URL – Rogue server URL (default: http://localhost:8000)
--port PORT – Port to run the UI on
--workdir WORKDIR – Working directory (default: ./.rogue)
--debug – Enable debug logging

Example: Testing the T-Shirt Agent

The repository includes a simple example agent that sells T-shirts. You can use it to see Rogue in action.

Install the example dependencies:

If you use uv:

If you use pip:

pip install -e .[examples]

Start the example agent server in a separate terminal.

If you use uv:

uv run examples/tshirt_store_agent

Otherwise:

python examples/tshirt_store_agent

This will start the agent on http://localhost:10001.

Configure Rogue in the UI to point to the example agent:

Agent URL: http://localhost:10001
Authentication: no-auth
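Before starting an evaluation, it can help to confirm that something is actually listening at the agent URL. The sketch below does a plain TCP connect using only the standard library. It does not speak A2A or verify that the listener is the T-shirt agent, and `is_reachable` is a hypothetical helper rather than part of Rogue.

```python
import socket
from urllib.parse import urlparse


def is_reachable(url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname
    if host is None:
        return False
    # Fall back to the scheme's usual port when the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Agent URL from the configuration above.
    print(is_reachable("http://localhost:10001"))
```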

Run the evaluation and watch Rogue test the T-Shirt agent's policies!

You can use either the TUI or the Web UI.

What Rogue Can Do: Use Cases

Safety & Compliance Hardening: Validate PII/PHI handling, refusal behavior, secret-leak prevention, and regulated-domain policies with transcript-based evidence.
E-Commerce & Support Agents: Enforce OTP-gated discounts, refund rules, SLA-aware escalation, and correct tool use (order lookup, ticketing) under adversarial and failure conditions.
Developer/DevOps Agents: Assess code-mod and CLI copilots for workspace confinement, rollback semantics, rate-limit/backoff behavior, and unsafe-command prevention.
Multi-Agent Systems: Verify planner↔executor contracts, capability negotiation, and schema conformance over A2A; evaluate interoperability across heterogeneous frameworks.
Regression & Drift Monitoring: Run nightly suites against new models or model changes; detect behavioral drift and enforce policy-critical pass criteria before release.
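The "policy-critical pass criteria" idea in the last use case can be pictured as a release gate: any failure in a scenario marked critical blocks the release, while non-critical failures are only reported. The record type and function names below are hypothetical and do not reflect Rogue's actual report format; they only illustrate the gating logic.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ScenarioResult:
    # Hypothetical record; Rogue's real report schema may differ.
    scenario: str
    passed: bool
    critical: bool = False


def failed_critical(results: Iterable[ScenarioResult]) -> List[str]:
    """Names of critical scenarios that did not pass."""
    return [r.scenario for r in results if r.critical and not r.passed]


def release_allowed(results: Iterable[ScenarioResult]) -> bool:
    """Allow a release only when no critical scenario failed."""
    return not failed_critical(results)


if __name__ == "__main__":
    nightly = [
        ScenarioResult("discount requires OTP", passed=False, critical=True),
        ScenarioResult("polite greeting", passed=False),
        ScenarioResult("refund rules", passed=True, critical=True),
    ]
    print(failed_critical(nightly), release_allowed(nightly))
```

In a CI job, a gate like this would typically map a blocked release to a non-zero exit status.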

What Exactly Is Rogue—and Why Should Agent Dev Teams Care?

Rogue is a comprehensive testing framework that evaluates the reliability, performance, and compliance of AI agents. Its EvaluatorAgent runs protocol-correct conversations in either a fast single-turn mode or a deep multi-turn adversarial mode. You can bring your own model, or let Rogue use Qualifire’s SLM judges to run the tests. Rogue also provides streaming observability and deterministic artifacts: live transcripts, pass/fail verdicts, rationales tied to transcript spans, timing, and model/version lineage.

Under the Hood: How Rogue Is Built

Rogue uses a client/server architecture.

Rogue Server: Contains the core evaluation logic
Client Interfaces: Multiple interfaces that connect to the server
- TUI: A modern terminal interface built with Go and Bubble Tea
- Web UI: A Gradio-based web interface
- CLI: A command-line interface for automated evaluation and CI/CD

This architecture allows flexible deployment: the server can run independently, and multiple clients can connect to it simultaneously.

Summary

Rogue helps developer teams test agent behavior under realistic conditions. It converts written policies into concrete scenarios, exercises them over A2A, and records transcripts that show what happened. The result is a clear, repeatable signal you can use to catch policy breaks and regressions before shipping.

Thanks to the Qualifire team for the thought leadership and resources for this article. The Qualifire team has supported this content.

Asif Razzaq serves as the CEO at Marktechpost Media Inc. As an entrepreneur, Asif has a passion for harnessing Artificial Intelligence to benefit society. Marktechpost is his latest venture, a media platform that focuses on Artificial Intelligence. It is known for providing in-depth news coverage about machine learning, deep learning, and other topics. The content is technically accurate and easy to understand by an audience of all backgrounds. Over 2 million views per month are a testament to the platform’s popularity.

