OpenClaw and other autonomous LLM agents are changing the paradigm of passive assistants into proactive entities that can execute complex long-horizon tasks with high privilege system access. A research study from the Security Analysis Research Group has shown that a report on security analyses by Tsinghua University and Ant Group reveals that OpenClaw’s ‘kernel-plugin’ architecture—anchored by a pi-coding-agent serving as the Minimal Trusted Computing Base (TCB)—is vulnerable to multi-stage systemic risks that bypass traditional, isolated defenses. Researchers demonstrate how threats, such as skill supply-chain contamination or memory poisoning, can affect an agent’s whole operational trajectory by introducing a 5-layer framework that includes initialization and input.
OpenClaw Architecture : The TCB and pi-coding agents
OpenClaw utilizes a ‘kernel-plugin’ architecture that separates core logic from extensible functionality. The system’s Trusted Computing Base (TCB) Definition of a pi-coding-agentThis minimal core is responsible for managing memory, planning tasks, and orchestrating execution. This TCB manages an extensible ecosystem of third-party plugins—or ‘skills’—that enable the agent to perform high-privilege operations such as automated software engineering and system administration. The research team identified a critical vulnerability in the architecture of the plugins that dynamically loads them without integrity checks. This creates a trust boundary with ambiguous boundaries and increases the attack surface.
✓ The protection layer is effective in mitigating risk.
× Denotes uncovered risks by the protection layer
A Threat Taxonomy Based on Lifecycle
The threat landscape is organized into five stages, which correspond to the operational pipeline for the agents.
- Stage I (Initialization): By loading security configurations and plugins as well as system prompts and settings, the agent creates its operating environment.
- Stage II (Inputs) Ingesting multi-modal data requires the agent to distinguish between user instructions that are trusted and external data sources which cannot be trusted.
- Stage III: This process uses techniques that include Chain-of-Thought (CoT) The prompting is done while the context is being maintained and external knowledge can be retrieved via retrieval-augmented creation.
- This is the final stage (decision). The agent chooses the appropriate tools, and creates execution parameters using planning frameworks. React.
- Stage V: To manage the operations, high-level plans must be converted into system actions that are privileged.
This approach demonstrates that the risks posed by autonomous agents are multi-stage and systemic, extending beyond single prompt injection attacks.
The Case for Agent Compromise: A Technical Study
1. The Initial Stage of Skill Poisoning
Before a mission begins, the skill poisoning is targeted at an agent. The capability routing interface can be abused by adversaries to introduce malicious skills.
- The attack: This was demonstrated by the research team, who forced OpenClaw into creating a skill called hacked weather.
- Mechanism: The attacker, by manipulating metadata of the skill, artificially raised its priority above the legitmate weather tool.
- Impact: As soon as a user asked for weather information, the agent switched to the fake service, resulting in output controlled by the attacker.
- Prevalence: The research report cites an empirical audit that found 26% of community-contributed tools There are security flaws.



2. Indirect Prompt Injection (Input Stage)
Ingesting untrusted data makes autonomous agents vulnerable to zero click exploits.
- The attack: The attackers embed malignant directives in external content such as web pages.
- Mechanism: The embedded payload is overridden by the initial objective when the agent fetches the page in order to satisfy a request from the user.
- Result: In one test, the agent ignored the user’s task to output a fixed ‘Hello World’ string mandated by the malicious site.


3. Memory Poisoning – Inference Stage
OpenClaw is susceptible to behavioral manipulation over a long period of time because it maintains an persistent state.
- Mechanism: The attacker modifies the file MEMORY.md by using a transient injector.
- The attack: A fabricated rule was added instructing the agent to refuse any query containing the term ‘C++’.
- Impact: This ‘poison’ persisted across sessions; subsequent benign requests for C++ programming were rejected by the agent, even after the initial attack interaction had ended.


4. The Intent Drift at the Decision Stage
Intent drift is when the sequence of tool requests that are locally justified leads to global destruction.
- The Scenario A user issued a diagnostic request to eliminate a ‘suspicious crawler IP’.
- This is called the Escalation. It attempted to change the firewall system via iptables after identifying IP addresses.
- The system fails: The agent tried to manually restart the process after several unsuccessful attempts at modifying configuration files located outside of its workspace. The WebUI was rendered inaccessible, resulting in an outage of the entire system.

5. High-Risk Order Execution Stage (Execution)
The attack is the culmination of earlier compromises that have been translated into a concrete impact on system.
- The attack: The attacker decomposed the body Fork Bomb Attack divided into four separate benign file-writing steps.
- Mechanism: The attacker created a latent chain of execution in trigger.sh using Base64 encoding, sed for removing junk characters and the sed command.
- Impact: The script, once activated, caused the CPU to surge up near 100%, launching an effective denial-of service attack on the infrastructure.



Five-Layer Defense Architecture
Researchers evaluated existing defenses ‘fragmented’ point solutions and proposed a holistic, lifecycle-aware architecture.

Foundational Base Layer:
Establishes verifiable trust roots during the initial phase. It makes use of Analysis of Static and Dynamic Systems (ASTs) to detect unauthorized code and SBOMs are cryptographic signatures To verify the provenance of a skill.
2) Input Perception Layer
As a portal, it prevents external data hijacking of the control flow. This enforces an Order Hierarchy By using cryptographic tokens, you can prioritize the developer’s prompts above untrusted external content.
Three-layer Cognitive State:
It protects the internal memory from being corrupted. The use of a specialized technique. Merkle-tree Structures State snapshotting and Rollbacks is a good example. Cross-encoders To measure semantic distance, and to detect context drift.
The (4) decision Alignment layer:
Before taking any action, ensure that the synthesized plan aligns with the user’s objectives. The plan includes Formal Verification Use symbolic solvers in order to verify that the proposed sequences don’t violate safety Invariants.
The 5) Execution Control layer:
Serves as the final enforcement boundary using an ‘assume breach’ paradigm. This provides isolation by using Sandboxing at the Kernel Level utilizing eBPF You can also find out more about the following: seccomp To intercept unauthorised system calls on the OS level
The Key Takeaways
- The attack surface of autonomous agents is expanded by high-privilege execution, persistent memory and the use of persistent memory. OpenClaw, unlike stateless LLM apps, relies on long-term memories and cross-system integration to perform complex tasks with a long-term horizon. The proactive nature of OpenClaw introduces multi-stage, systemic risks across the whole operational lifecycle.
- The supply chain of Skill Ecosystems is at risk. About 26% of community-contributed tools Agent skill ecologies have security flaws. Attackers can use ‘skill poisoning’ to inject malicious tools that appear legitimate but contain hidden priority overrides, allowing them to silently hijack user requests and produce attacker-controlled outputs.
- Memory can be a dangerous and persistent attack vector. Through persistent memory, adversarial inputs can be turned into long-term behavior control. Memory poisoning allows an attacker to implant fake policy rules in an agent’s memories (e.g. MEMORY.md) causing it to reject innocent requests long after the attack has finished.
- Ambiguous instructions lead to destructive ‘Intent Drift.’ Agents can still experience intention drift even without malicious manipulation. A series of locally justified tool calls may lead to global destructive results. Basic diagnostic security requests have escalated to unauthorized firewall changes and service terminations, rendering the system unusable.
- For effective protection, you need a defense-indepth architecture which is lifecycle aware. Existing point-based defenses—such as simple input filters—are insufficient against cross-temporal, multi-stage attacks. The agent’s lifecycle must include all five layers for a robust defense. Basis for Foundational Knowledge (plugin vetting), The Perception of Input (instruction hierarchy), Cognitive State (memory integrity), Aligning Decisions Verification of the plan Execution control (kernel-level sandboxing via eBPF).
Check out Paper. Also, feel free to follow us on Twitter Don’t forget about our 120k+ ML SubReddit Subscribe now our Newsletter. Wait! Are you using Telegram? now you can join us on telegram as well.
Ant Research has provided this article as a support and resource.

