Close Menu
  • AI
  • Content Creation
  • Tech
  • Robotics
AI-trends.todayAI-trends.today
  • AI
  • Content Creation
  • Tech
  • Robotics
Trending
  • Jeffrey Epstein Had a ‘Personal Hacker,’ Informant Claims
  • PyKEEN: Coding for Training, Optimizing and Evaluating Knowledge Graph Embeddings
  • Robbyant LingBot World – a Real Time World Model of Interactive Simulations and Embodied AI
  • SERA is a Soft Verified Coding agent, built with only Supervised training for practical Repository level Automation Workflows.
  • I Let Google’s ‘Auto Browse’ AI Agent Take Over Chrome. It didn’t quite click
  • DeepSeek AI releases DeepSeek OCR 2 with Causal visual flow encoder for layout-aware document understanding
  • Microsoft unveils Maia 200: An AI Inference Accelerator Optimized for FP4 and F8 Datacenters
  • Code Deep Dive: Differentiable computer vision with Kornia using Geometry optimization, LoFTR matches, and GPU augmentation
AI-trends.todayAI-trends.today
Home»Tech»Open Sources Qualifire AI Rogue: A Framework for End-to end Agentic AI Testing to Assess the Reliability, Performance and Compliance of AI Agents

Open Sources Qualifire AI Rogue: A Framework for End-to end Agentic AI Testing to Assess the Reliability, Performance and Compliance of AI Agents

Tech By Gavin Wallace16/10/20256 Mins Read
Facebook Twitter LinkedIn Email
A Coding Implementation to Build an AI Agent with Live
A Coding Implementation to Build an AI Agent with Live
Share
Facebook Twitter LinkedIn Email

The agentic system is stochastic and context-dependent. Conventional QA—unit tests, static prompts, or scalar “LLM-as-a-judge” scores—fails to expose multi-turn vulnerabilities and provides weak audit trails. The developer teams must have protocol-accurate discussions, policy checks that are explicit, and evidence which can be read by machines.

Open-sourced software from Qualifire AI RogueA Python Framework that Evaluates AI Agents over Agent-to Agent (A2A). protocol. Rogue transforms business policies to executable scenarios. It drives multiple-turn interactions with a target agent and produces deterministic reports that are suitable for compliance and CI/CD reviews.

Quick Start

Prerequisites

  • uvx – If not installed, follow uv installation guide
  • Python 3.10+
  • An API Key for a LLM Provider (e.g. OpenAI, Google or Anthropic).

Installation

Options 1: Quick install (recommended)

You can install quickly by using the automated script:

TUI
uvx rogue-ai
# Web UI
uvx rogue-ai ui
# CLI/CI/CD
uvx rogue-ai cli

Option 2: Manual Installation

“a” Clone the repository

git clone https://github.com/qualifire-dev/rogue.git
cd rogue

(b) Install dependencies:

Use uv:

If you use pip

(c) Optionally: Configure your environment variables. Create a file called.env and place it in the root of your directory. Then add your API key. Rogue is based on LiteLLM so that you can use keys from different providers.

OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-..."
GOOGLE_API_KEY="..."

Running Rogue

Rogue works on behalf of a customer.server Architecture where core evaluation logic is run on a server in the backend, with various clients connecting to it via different interfaces.

Standard Behavior

You can run uvx Rogue-AI without specifying a mode.

  1. Rogue starts in the background
  2. Launches TUI (Terminal User Interface), client

There are several modes of operation.

  • Standard (Server plus TUI): uvx rogue-ai – Starts server in background + TUI client
  • You can also find out more about Server: uvx rogue-ai server – Runs only the backend server
  • TUI: uvx rogue-ai tui – Runs only the TUI client (requires server running)
  • Web User Interface: uvx rogue-ai ui – Runs only the Gradio web interface client (requires server running)
  • CLI: uvx rogue-ai cli – Runs non-interactive command-line evaluation (requires server running, ideal for CI/CD)

Modal Arguments

Server Mode
uvx rogue-ai server [OPTIONS]

Options:

  • –host HOST – Host to run the server on (default: 127.0.0.1 or HOST env var)
  • –port PORT – Port to run the server on (default: 8000 or PORT env var)
  • –debug – Enable debug logging

TUI Mode

uvx rogue-ai tui [OPTIONS]
Web UI mode
uvx rogue-ai ui [OPTIONS]

Options:

  • –rogue-server-url URL – Rogue server URL (default: http://localhost:8000)
  • –port PORT – Port to run the UI on
  • –workdir WORKDIR – Working directory (default: ./.rogue)
  • –debug – Enable debug logging

Test the T-Shirt Agent

This repository has a simple T-shirt selling agent. It allows you to watch Rogue at work.

Install the following example dependency:

Use uv when you use a uv

If you use pip

Install pip -e[examples]

Start an example server on a different terminal.

Use uv if you use a uv

uv run examples/tshirt_store_agent

Then:

python examples/tshirt_store_agent

This will start the agent on http://localhost:10001.

(b) Configuration Rogue In the user interface, you can point out the agent as an example:

  • Agent URL: http://localhost:10001
  • Authentication: no-auth

Run the assessment and monitor Rogue Check the policy of your T-Shirt Agent!

Use either TUI or Web UI.

What Rogue Can Do: Use Cases

  • Safety & Compliance HardeningTranscript-based evidence can be used to validate PII/PHI management, behavior of refusal, prevention of secret leaks, and policies governing regulated domains.
  • E-Commerce & Support AgentsUnder adverse and failure circumstances, ensure that discounts are only available with an OTP, rules for refunds, escalation based on SLA, and tools (order check, tickets) have the correct use.
  • Developer/DevOps agentsAssessment of code-mods and CLI pilots in terms of workspace restriction, rollback semantics (rollback behavior), rate-limit/backoff behaviour, and safe command prevention.
  • Multi-Agent Systems: Verify planner↔executor contracts, capability negotiation, and schema conformance over A2A; evaluate interoperability across heterogeneous frameworks.
  • Regression & Drift MonitoringNightly Suites to detect new models or changes in the model; detect behavior drift and enforce critical pass criteria for policy before release.

What Exactly Is Rogue—and Why Should Agent Dev Teams Care?

Rogue This is a comprehensive testing framework that evaluates the reliability, performance and compliance of AI agents. Rogue The EvaluatorAgent runs protocol correct conversations in fast single turn or deep multi-turn adversarial modes. EvaluatorAgent performs protocol-correct conversation in either a single turn adversarial or multiturn adversarial speed mode. You can bring your own model or have it made. Rogue Use Qualifire’s SLM Judges to run the test. Streaming observability and deterministic artifacts: live transcripts,pass/fail verdicts, rationales tied to transcript spans, timing and model/version lineage.

Rogue’s Under-the-hood: A Look at the Construction of Rogue

Rogue uses a client/server architecture.

  • Rogue Server: Includes the core evaluation logic
  • Interfaces for ClientsMultiple interfaces to connect with the server
    • TUI The modern terminal interface is built using Go and Bubble tea
    • Web UIGradio based web interface
    • CLICommand-line Interface for Automated Evaluation and CI/CD

The architecture is flexible and allows multiple users to connect simultaneously and run the server independently.

The following is a summary of the information that you will find on this page.

Rogue Helps developer teams to test agent behavior in the real-world environment. It converts written policies into realistic scenarios and then exercises them over A2A. Transcripts are recorded to show you what occurred. This produces a repeatable, clear signal that you can use to detect policy regressions and breaks before the product is shipped.


Thanks to the Qualifire team for the thought leadership/ Resources for this article. This article/content has been supported by the Qualifire Team.


Asif Razzaq serves as the CEO at Marktechpost Media Inc. As an entrepreneur, Asif has a passion for harnessing Artificial Intelligence to benefit society. Marktechpost is his latest venture, a media platform that focuses on Artificial Intelligence. It is known for providing in-depth news coverage about machine learning, deep learning, and other topics. The content is technically accurate and easy to understand by an audience of all backgrounds. Over 2 million views per month are a testament to the platform’s popularity.

🙌 Follow MARKTECHPOST: Add us as a preferred source on Google.

AI fire open source work
Share. Facebook Twitter LinkedIn Email
Avatar
Gavin Wallace

Related Posts

PyKEEN: Coding for Training, Optimizing and Evaluating Knowledge Graph Embeddings

31/01/2026

Robbyant LingBot World – a Real Time World Model of Interactive Simulations and Embodied AI

31/01/2026

SERA is a Soft Verified Coding agent, built with only Supervised training for practical Repository level Automation Workflows.

30/01/2026

DeepSeek AI releases DeepSeek OCR 2 with Causal visual flow encoder for layout-aware document understanding

30/01/2026
Top News

AI Agents are too cheap for our own good

Meta’s AI Recruiting Campaign Discovers a new Target

Marissa Mayer dissolves her Sunshine Startup Lab

OpenAI Re-acquires Two Thinking Machines Lab cofounders

Google Wants to Get Better at Spotting Wildfires From Space

Load More
AI-Trends.Today

Your daily source of AI news and trends. Stay up to date with everything AI and automation!

X (Twitter) Instagram
Top Insights

Building high-performance financial analytics pipelines with Polars : Lazy evaluation, advanced expressions, and SQL integration

18/06/2025

Zhipu AI releases GLM-4.6 to achieve enhancements in real-world coding, long-context processing, reasoning, searching, and agentic AI

01/10/2025
Latest News

Jeffrey Epstein Had a ‘Personal Hacker,’ Informant Claims

31/01/2026

PyKEEN: Coding for Training, Optimizing and Evaluating Knowledge Graph Embeddings

31/01/2026
X (Twitter) Instagram
  • Privacy Policy
  • Contact Us
  • Terms and Conditions
© 2026 AI-Trends.Today

Type above and press Enter to search. Press Esc to cancel.