Researchers from Tsinghua and Ant Group Unveil Five Layers of Lifecycle Oriented Frameworks to Mitigate Autonomous LLM agent Vulnerabilities on OpenClaw

OpenClaw and other autonomous LLM agents are changing the paradigm of passive assistants into proactive entities that can execute complex long-horizon tasks with high privilege system access. A research study from the Security Analysis Research Group has shown that a report on security analyses by Tsinghua University and Ant Group reveals that OpenClaw’s ‘kernel-plugin’ architecture—anchored by a pi-coding-agent serving as the Minimal Trusted Computing Base (TCB)—is vulnerable to multi-stage systemic risks that bypass traditional, isolated defenses. Researchers demonstrate how threats, such as skill supply-chain contamination or memory poisoning, can affect an agent’s whole operational trajectory by introducing a 5-layer framework that includes initialization and input.

OpenClaw Architecture : The TCB and pi-coding agents

OpenClaw utilizes a ‘kernel-plugin’ architecture that separates core logic from extensible functionality. The system’s Trusted Computing Base (TCB) Definition of a pi-coding-agentThis minimal core is responsible for managing memory, planning tasks, and orchestrating execution. This TCB manages an extensible ecosystem of third-party plugins—or ‘skills’—that enable the agent to perform high-privilege operations such as automated software engineering and system administration. The research team identified a critical vulnerability in the architecture of the plugins that dynamically loads them without integrity checks. This creates a trust boundary with ambiguous boundaries and increases the attack surface.

Table 1: Threats across the lifecycle of OpenClaw and their corresponding protections “Lobster”
✓ The protection layer is effective in mitigating risk.
× Denotes uncovered risks by the protection layer

A Threat Taxonomy Based on Lifecycle

The threat landscape is organized into five stages, which correspond to the operational pipeline for the agents.

Stage I (Initialization): By loading security configurations and plugins as well as system prompts and settings, the agent creates its operating environment.
Stage II (Inputs) Ingesting multi-modal data requires the agent to distinguish between user instructions that are trusted and external data sources which cannot be trusted.
Stage III: This process uses techniques that include Chain-of-Thought (CoT) The prompting is done while the context is being maintained and external knowledge can be retrieved via retrieval-augmented creation.
This is the final stage (decision). The agent chooses the appropriate tools, and creates execution parameters using planning frameworks. React.
Stage V: To manage the operations, high-level plans must be converted into system actions that are privileged.

This approach demonstrates that the risks posed by autonomous agents are multi-stage and systemic, extending beyond single prompt injection attacks.

The Case for Agent Compromise: A Technical Study

1. The Initial Stage of Skill Poisoning

Before a mission begins, the skill poisoning is targeted at an agent. The capability routing interface can be abused by adversaries to introduce malicious skills.

The attack: This was demonstrated by the research team, who forced OpenClaw into creating a skill called hacked weather.
Mechanism: The attacker, by manipulating metadata of the skill, artificially raised its priority above the legitmate weather tool.
Impact: As soon as a user asked for weather information, the agent switched to the fake service, resulting in output controlled by the attacker.
Prevalence: The research report cites an empirical audit that found 26% of community-contributed tools There are security flaws.

Figure 2: Poisoning Command Inducing the Compromised “Lobster” to Generate a Malicious Weather Skill and Elevate Its Priority

Figure 3. Malicious skill generated by compromised “Lobster” — Structurally Valid Yet Semantically Subverts Legitimate Weather Functionality

Figure 4: Normal Weather Request Hijacked by Malicious Skill — Compromised “Lobster” Generates Attacker-Controlled Output

2. Indirect Prompt Injection (Input Stage)

Ingesting untrusted data makes autonomous agents vulnerable to zero click exploits.

The attack: The attackers embed malignant directives in external content such as web pages.
Mechanism: The embedded payload is overridden by the initial objective when the agent fetches the page in order to satisfy a request from the user.
Result: In one test, the agent ignored the user’s task to output a fixed ‘Hello World’ string mandated by the malicious site.

Figure 5: Malicious commands disguised as benign content on a website designed by an attacker

Figure 6: Compromised “Lobster” Executes Embedded Commands When Accessing Webpage — Generates Attacker-Controlled Content Instead of Fulfilling User Requests

3. Memory Poisoning – Inference Stage

OpenClaw is susceptible to behavioral manipulation over a long period of time because it maintains an persistent state.

Mechanism: The attacker modifies the file MEMORY.md by using a transient injector.
The attack: A fabricated rule was added instructing the agent to refuse any query containing the term ‘C++’.
Impact: This ‘poison’ persisted across sessions; subsequent benign requests for C++ programming were rejected by the agent, even after the initial attack interaction had ended.

Figure 7: Attacker Appends Forged Rules to Compromised “Lobster”‘s Persistent Memory — Converts Transient Attack Inputs into Long-Term Behavioral Contro

Figure 8: Compromised “Lobster” Rejects Benign C++ Programming Requests After Malicious Rule Storage — Adheres to Attacker-Defined Behaviors Overriding User Intent

4. The Intent Drift at the Decision Stage

Intent drift is when the sequence of tool requests that are locally justified leads to global destruction.

The Scenario A user issued a diagnostic request to eliminate a ‘suspicious crawler IP’.
This is called the Escalation. It attempted to change the firewall system via iptables after identifying IP addresses.
The system fails: The agent tried to manually restart the process after several unsuccessful attempts at modifying configuration files located outside of its workspace. The WebUI was rendered inaccessible, resulting in an outage of the entire system.

Figure 9: Compromised “Lobster” Deviates from Crawler IP Resolution Task Upon User Command — Executes Self-Termination Protocol Overriding Operational Objectives

5. High-Risk Order Execution Stage (Execution)

The attack is the culmination of earlier compromises that have been translated into a concrete impact on system.

The attack: The attacker decomposed the body Fork Bomb Attack divided into four separate benign file-writing steps.
Mechanism: The attacker created a latent chain of execution in trigger.sh using Base64 encoding, sed for removing junk characters and the sed command.
Impact: The script, once activated, caused the CPU to surge up near 100%, launching an effective denial-of service attack on the infrastructure.

Figure 10: Attacker Initiates Sequential Command Injection Through File Write Operations — Establishes Covert Execution Foothold in System Scheduler

Figure 11: Attacker Triggers Compromised “Lobster” to Execute Malicious Payload — Induces System Paralysis Leading to Critical Infrastructure Implosion

Figure 12: Compromised “Lobster” Triggers Host Server Resource Exhaustion Surge — Implements Stealthy Denial-of-Service Siege Against Critical Computing Backbone

Five-Layer Defense Architecture

Researchers evaluated existing defenses ‘fragmented’ point solutions and proposed a holistic, lifecycle-aware architecture.

Foundational Base Layer:

Establishes verifiable trust roots during the initial phase. It makes use of Analysis of Static and Dynamic Systems (ASTs) to detect unauthorized code and SBOMs are cryptographic signatures To verify the provenance of a skill.

2) Input Perception Layer

As a portal, it prevents external data hijacking of the control flow. This enforces an Order Hierarchy By using cryptographic tokens, you can prioritize the developer’s prompts above untrusted external content.

Three-layer Cognitive State:

It protects the internal memory from being corrupted. The use of a specialized technique. Merkle-tree Structures State snapshotting and Rollbacks is a good example. Cross-encoders To measure semantic distance, and to detect context drift.

The (4) decision Alignment layer:

Before taking any action, ensure that the synthesized plan aligns with the user’s objectives. The plan includes Formal Verification Use symbolic solvers in order to verify that the proposed sequences don’t violate safety Invariants.

The 5) Execution Control layer:

Serves as the final enforcement boundary using an ‘assume breach’ paradigm. This provides isolation by using Sandboxing at the Kernel Level utilizing eBPF You can also find out more about the following: seccomp To intercept unauthorised system calls on the OS level

The Key Takeaways

The attack surface of autonomous agents is expanded by high-privilege execution, persistent memory and the use of persistent memory. OpenClaw, unlike stateless LLM apps, relies on long-term memories and cross-system integration to perform complex tasks with a long-term horizon. The proactive nature of OpenClaw introduces multi-stage, systemic risks across the whole operational lifecycle.
The supply chain of Skill Ecosystems is at risk. About 26% of community-contributed tools Agent skill ecologies have security flaws. Attackers can use ‘skill poisoning’ to inject malicious tools that appear legitimate but contain hidden priority overrides, allowing them to silently hijack user requests and produce attacker-controlled outputs.
Memory can be a dangerous and persistent attack vector. Through persistent memory, adversarial inputs can be turned into long-term behavior control. Memory poisoning allows an attacker to implant fake policy rules in an agent’s memories (e.g. MEMORY.md) causing it to reject innocent requests long after the attack has finished.
Ambiguous instructions lead to destructive ‘Intent Drift.’ Agents can still experience intention drift even without malicious manipulation. A series of locally justified tool calls may lead to global destructive results. Basic diagnostic security requests have escalated to unauthorized firewall changes and service terminations, rendering the system unusable.
For effective protection, you need a defense-indepth architecture which is lifecycle aware. Existing point-based defenses—such as simple input filters—are insufficient against cross-temporal, multi-stage attacks. The agent’s lifecycle must include all five layers for a robust defense. Basis for Foundational Knowledge (plugin vetting), The Perception of Input (instruction hierarchy), Cognitive State (memory integrity), Aligning Decisions Verification of the plan Execution control (kernel-level sandboxing via eBPF).

Check out Paper. Also, feel free to follow us on Twitter Don’t forget about our 120k+ ML SubReddit Subscribe now our Newsletter. Wait! Are you using Telegram? now you can join us on telegram as well.

_{Ant Research has provided this article as a support and resource.}

Researchers from Tsinghua and Ant Group Unveil Five Layers of Lifecycle Oriented Frameworks to Mitigate Autonomous LLM agent Vulnerabilities on OpenClaw

xAI Releases Standalone Grok Speech to text and Text to speech APIs, Aimed at Enterprise Voice Developers

Anthropic releases Claude Opus 4.7, a major upgrade for agentic coding, high-resolution vision, and long-horizon autonomous tasks

The Coding Guide to Property Based Testing with Hypothesis and Stateful, Differential and Metamorphic Test Designs

Google AI Releases Google Auto-Diagnosis: A Large Language Model LLM Based System to Diagnose Integrity Test Failures At Scale

ChatGPT will soon have ads. Advertisements are Coming to ChatGPT.

Mira Murati’s Stealth AI Lab launches its first product

“Create a replica of this image. Don’t change anything” AI Trend Takes Off

What is the mysterious metallic device that US Chief Design officer Joe Gebbia is using?

Elon Musk Reveals Grok 4, Amid Controversy over Chatbot Antisemitic Comments

Top Insights

Politico’s Newsroom Is Starting a Legal Battle With Management Over AI

This AI Paper introduces MMSearch R1: A Reinforcement-Learning Framework for On-Demand, Multimodal Search with LMMs.

Latest News

xAI Releases Standalone Grok Speech to text and Text to speech APIs, Aimed at Enterprise Voice Developers

Anthropic releases Claude Opus 4.7, a major upgrade for agentic coding, high-resolution vision, and long-horizon autonomous tasks

Researchers from Tsinghua and Ant Group Unveil Five Layers of Lifecycle Oriented Frameworks to Mitigate Autonomous LLM agent Vulnerabilities on OpenClaw

OpenClaw Architecture : The TCB and pi-coding agents

A Threat Taxonomy Based on Lifecycle

The Case for Agent Compromise: A Technical Study

1. The Initial Stage of Skill Poisoning

2. Indirect Prompt Injection (Input Stage)

3. Memory Poisoning – Inference Stage

4. The Intent Drift at the Decision Stage

5. High-Risk Order Execution Stage (Execution)

Five-Layer Defense Architecture

Foundational Base Layer:

2) Input Perception Layer

Three-layer Cognitive State:

The (4) decision Alignment layer:

The 5) Execution Control layer:

The Key Takeaways

Related Posts