LLM agents are now capable enough to handle complex tasks, from data analysis and web search to report generation, including multi-step workflows that involve software tools. Their procedural memory, however, remains a weak point: it is typically rigid, hand-crafted, or locked inside model weights. This makes agents brittle, since unexpected events such as network failures or UI changes can force a full restart. Unlike humans, who learn by reusing past experience, LLM agents lack a systematic way to build, refine, and reuse procedural skills. Existing frameworks provide abstractions, but optimizing the memory life cycle is largely left unresolved.
Retaining past interactions, both episodic and long-term, is a key capability of language agents. Current systems store and retrieve information using methods such as vector embeddings and hierarchical structures, yet managing that memory effectively remains difficult. Procedural memory lets agents internalize and automate recurring tasks, but strategies for constructing, updating, and reusing it are not well understood. Agents can also learn through imitation or reinforcement, but these approaches suffer from low efficiency, forgetting, and poor generalization.
Researchers from Zhejiang University and Alibaba Group developed Memp, a framework designed to give agents a flexible, lifelong procedural memory. Memp converts past trajectories into both fine-grained step-level instructions and higher-level scripts, and offers strategies for retrieving and updating that memory. Unlike static approaches, it continuously refines stored knowledge through validation, addition, reflection, and erasure, keeping it relevant and efficient. Tested on ALFWorld and TravelPlanner, Memp consistently increased accuracy, reduced unnecessary exploration, and lowered token usage. Notably, procedural memory built by stronger models transferred effectively to weaker ones and improved their performance. Together, these results indicate that Memp helps agents learn, generalize, and adapt across tasks.
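To make the build and update cycle concrete, here is a minimal sketch of a procedural memory store. It is an illustration under stated assumptions, not Memp's actual implementation: the names `MemoryEntry`, `ProceduralMemory`, and the `max_failures` threshold are hypothetical, and the "script" is produced by a trivial string rule where a real system would likely use an LLM to write the abstraction.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    """One stored procedure (hypothetical structure, not from the paper)."""
    task: str          # description of the task the trajectory solved
    steps: list        # fine-grained, step-level instructions
    script: str        # higher-level abstraction of the same trajectory
    successes: int = 0
    failures: int = 0


@dataclass
class ProceduralMemory:
    entries: dict = field(default_factory=dict)

    def add(self, task, actions):
        """Addition: turn a successful trajectory into steps plus a script."""
        steps = [f"{i}. {a}" for i, a in enumerate(actions, start=1)]
        # Stand-in for an LLM-written summary: chain the leading verb of each action.
        script = " -> ".join(a.split()[0] for a in actions)
        self.entries[task] = MemoryEntry(task, steps, script)
        return self.entries[task]

    def validate(self, task, succeeded):
        """Validation: record whether reusing the entry worked."""
        entry = self.entries[task]
        if succeeded:
            entry.successes += 1
        else:
            entry.failures += 1

    def reflect(self, task, corrected_actions):
        """Reflection: overwrite a failed procedure with a corrected one.

        Rebuilding the entry also resets its success and failure counts.
        """
        return self.add(task, corrected_actions)

    def erase(self, max_failures=2):
        """Erasure: drop entries that fail often and more than they succeed."""
        stale = [
            task for task, e in self.entries.items()
            if e.failures >= max_failures and e.failures > e.successes
        ]
        for task in stale:
            del self.entries[task]
        return stale


memory = ProceduralMemory()
entry = memory.add(
    "heat egg",
    ["take egg from fridge", "heat egg in microwave", "put egg on table"],
)
memory.validate("heat egg", succeeded=False)
memory.validate("heat egg", succeeded=False)
removed = memory.erase()  # the entry failed twice and never succeeded
```

Storing both the numbered steps and the compact script reflects the two granularities described above; which one to retrieve for a new task is a design choice this sketch leaves open.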
The agent's interaction with its environment is modeled as a Markov Decision Process: the agent uses tools and actions to act on the environment and adjusts its behavior over multiple steps. Each step produces a state, an action, and feedback; together these form a trajectory, which earns a reward depending on task success. Without memory, solving tasks in a new environment wastes tokens, because the agent repeats exploratory steps it has already performed. Inspired by human procedural memory, the proposed framework equips agents to store, retrieve, and update procedural knowledge, so they can reuse past experience, avoid redundant trials, and work more efficiently.
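One way to write this down is given below. The notation is assumed for illustration and may differ from the paper's: s_t is the state, a_t the action, o_t the feedback (observation) at step t, T the final step, and m a procedure retrieved from memory.

```latex
\[
\tau = (s_0, a_0, o_1, s_1, a_1, o_2, \dots, s_T),
\qquad
r = R(\tau)
\]
\[
a_t \sim \pi(\,\cdot \mid s_t\,) \quad \text{(no memory)}
\qquad\text{vs.}\qquad
a_t \sim \pi(\,\cdot \mid s_t, m\,) \quad \text{(with retrieved procedure } m\text{)}
\]
```

Read this way, the efficiency gain comes from conditioning the policy on m: a good retrieved procedure should shorten T for the same reward, which is where the saved tokens and steps come from.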
Experiments on TravelPlanner and ALFWorld show that storing trajectories as detailed step-level instructions or as abstract scripts improves accuracy and reduces the time spent exploring. Retrieval strategies based on semantic similarity further improve memory use. Dynamic update mechanisms such as validation, adjustment, and reflection allow agents to continuously refine skills, correct mistakes, and discard outdated knowledge. The results show that procedural memory not only raises task completion and efficiency but also transfers effectively from stronger models to weaker ones, giving smaller systems substantial performance gains. Scaling up retrieval helps only up to a point; beyond it, too much retrieved memory can overwhelm the context and reduce effectiveness. Overall, the results point to procedural memory as an effective way to make agents more adaptive, efficient, and human-like in how they learn.
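Similarity-based retrieval with a cap on how much is returned can be sketched as follows. This is a simplified illustration, not the method used in the paper: `embed` is a bag-of-words stand-in for a real embedding model, and the function names, `k`, and `min_score` are all hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would use a learned encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, memory, k=2, min_score=0.2):
    """Return up to k (task, procedure) pairs most similar to the query.

    Capping at k and dropping weak matches keeps the retrieved memory
    from crowding the agent's context window.
    """
    q = embed(query)
    scored = [(cosine(q, embed(task)), task) for task in memory]
    scored = [pair for pair in scored if pair[0] >= min_score]
    scored.sort(reverse=True)  # highest similarity first
    return [(task, memory[task]) for _, task in scored[:k]]


# Hypothetical memory: task description -> stored script.
memory = {
    "heat an egg and put it on the table": "take -> heat -> put",
    "cool a mug and put it in the cabinet": "take -> cool -> put",
    "book a flight to Paris": "search -> compare -> book",
}

hits = retrieve("heat a potato and put it on the counter", memory, k=2)
for task, script in hits:
    print(task, "=>", script)
```

The `k` limit mirrors the scaling observation above: returning more procedures helps only while they still fit comfortably in the context.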
In conclusion, Memp is a task-agnostic framework that uses procedural memory to improve LLM-based agents. Its systematically designed strategies for memory retrieval and updating let agents refine and reuse previous experience, improving efficiency and accuracy on long-horizon tasks such as TravelPlanner and ALFWorld. Memp's memory is dynamic: it is updated continuously, and outdated entries are revised or discarded. The results demonstrate steady gains in performance and learning, along with transferable benefits when memory is moved from stronger models to weaker ones. Looking ahead, richer retrieval and self-assessment methods could help agents adapt better across scenarios.

