How can an LLM agent decide for itself what goes into long-term storage, what stays in context for the moment, and what gets thrown out, without hand-tuned heuristics or additional controllers? Is it possible to learn a single policy that manages both kinds of memory in the same space used for text generation?
Researchers from Alibaba Group and Wuhan University introduce AgeMem (Agentic Memory), a framework that lets large language models learn to manage long-term and short-term memory within a single policy. Instead of depending on external controllers or hand-written rules, the agent decides when to retrieve, store, summarize, and forget using tools built into the model's action space.
What do current LLM agents struggle with?
Most agent frameworks treat memory as two separate systems.
Long-term memory is an external store of facts and experiences that persist across sessions. Short-term memory is the context window, which holds the active dialogue and retrieved documents.
Existing systems manage these two components separately. Long-term memory lives in external stores, such as vector databases with simple add-and-retrieve triggers. Short-term memory is managed with retrieval-augmented generation, sliding windows, or summarization schedules.
Separating the two families creates several issues:
- Short-term and long-term memory are optimized separately, so the interaction between them is never learned end to end.
- Heuristics decide when to memorize and when to summarize; they are fragile and can miss rare but important events.
- Additional controllers and expert models increase cost.
AgeMem eliminates the external memory controller and folds memory management into the agent's own policy.
Memory and the Agent Action Space
AgeMem exposes memory operations as tools. The model may emit normal text or tool calls at each step. The framework has six tools.
Long-term memory:

- `ADD`: stores a new memory item with content and metadata.
- `UPDATE`: modifies an existing memory item.
- `DELETE`: removes items that are obsolete or of low value.

Short-term memory:

- `RETRIEVE`: performs a semantic search over long-term memory and injects the retrieved items into the context.
- `SUMMARY`: compresses the dialogue into a shorter summary.
- `FILTER`: removes context segments that are no longer useful for future reasoning.
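To make the tool set concrete, here is a minimal in-memory sketch of the six operations. Everything below is illustrative: the class name, the keyword-overlap "retrieval," and the truncation-based "summary" are stand-ins, not the paper's implementation (which would use a real store and semantic search).

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class MemoryStore:
    """Toy long-term memory store; a real system would back this with a vector DB."""
    items: dict = field(default_factory=dict)
    _next_id: int = 0

    # --- Long-term memory tools ---
    def add(self, content: str, metadata: Optional[dict] = None) -> int:
        """ADD: store a new memory item with content and metadata."""
        self._next_id += 1
        self.items[self._next_id] = {"content": content, "metadata": metadata or {}}
        return self._next_id

    def update(self, item_id: int, content: str) -> None:
        """UPDATE: modify an existing memory item."""
        self.items[item_id]["content"] = content

    def delete(self, item_id: int) -> None:
        """DELETE: drop an obsolete or low-value item."""
        self.items.pop(item_id, None)

    def retrieve(self, query: str, k: int = 3) -> list:
        """RETRIEVE: stand-in for semantic search, scored by keyword overlap."""
        scored = sorted(
            self.items.values(),
            key=lambda it: len(set(query.lower().split())
                               & set(it["content"].lower().split())),
            reverse=True,
        )
        return [it["content"] for it in scored[:k]]

# --- Short-term memory tools ---
def summarize(context: list, max_turns: int = 2) -> list:
    """SUMMARY stand-in: compress older turns into a single line."""
    if len(context) <= max_turns:
        return context
    summary = "Summary: " + " | ".join(t[:40] for t in context[:-max_turns])
    return [summary] + context[-max_turns:]

def filter_context(context: list, keep: Callable[[str], bool]) -> list:
    """FILTER stand-in: drop segments judged no longer useful."""
    return [seg for seg in context if keep(seg)]
```

In AgeMem these operations are not library calls made by a controller; they are actions the policy itself chooses to emit during generation.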
The output is structured. Every step begins with a `think` block, in which the model reasons privately. The model then emits either a `tool` block containing a JSON list of tool invocations, or an `answer` block containing the response visible to the user. Memory actions are therefore first-class decisions, not side effects.
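Based on that description, a single step might be parsed as follows. The tag names (`think`, `tool`, `answer`) and the JSON schema of the tool call are assumptions for illustration, not the paper's exact format.

```python
import json
import re

# Hypothetical example of one agent step: private reasoning, then a tool call.
step_output = """<think>The user asked about an earlier preference; search long-term memory.</think>
<tool>[{"name": "RETRIEVE", "arguments": {"query": "user coffee preference", "top_k": 3}}]</tool>"""

def parse_step(text: str) -> dict:
    """Split a step into private reasoning plus either tool calls or a visible answer."""
    think = re.search(r"<think>(.*?)</think>", text, re.S)
    tool = re.search(r"<tool>(.*?)</tool>", text, re.S)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.S)
    return {
        "think": think.group(1).strip() if think else None,
        "tool_calls": json.loads(tool.group(1)) if tool else None,
        "answer": answer.group(1).strip() if answer else None,
    }

parsed = parse_step(step_output)
```

Because the tool call is an ordinary token sequence, the same policy gradient that shapes the visible answer also shapes the memory decision.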
Three-Stage Reinforcement Learning for Memory
AgeMem uses reinforcement learning to combine long-term and short-term memory behaviors.
The state at step t includes the current conversational context, the long-term memory store, and the task specification. The policy's action is either a text token or a tool call. Each training trajectory is broken into three stages:
- Stage 1, long-term memory construction. The agent interacts in casual settings and gathers information that will be useful later, using `ADD`, `UPDATE`, and `DELETE` to build and maintain long-term memory.
- Stage 2, short-term memory control under distractors. The short-term context is reset while long-term memory persists. The agent now receives content that is related but not essential, and must use `SUMMARY` and `FILTER` to remove noise and keep only useful information.
- Stage 3, integrated reasoning. The agent retrieves from long-term memory with `RETRIEVE`, controls the short-term context, and produces the final answer.
A crucial point: long-term memory persists across stages, whereas the short-term context is cleared between Stage 1 and Stage 2. This design forces the model to rely on retrieval rather than residual context, exposing genuine long-term dependencies.
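The staging can be sketched as a rollout skeleton. The function and callback names here are hypothetical; the point is only that `long_term` survives all three stages while the short-term `context` is rebuilt from scratch in Stage 2.

```python
def run_three_stage_episode(interact, answer, stage1_input, distractors, question):
    """Illustrative three-stage rollout: long-term memory persists, short-term resets.
    `interact` and `answer` stand in for the policy acting with a given toolset."""
    long_term = []   # persists across all three stages
    context = []     # short-term context window

    # Stage 1: construct long-term memory (ADD / UPDATE / DELETE available).
    context = interact(stage1_input, context, long_term, ("ADD", "UPDATE", "DELETE"))

    # Stage 2: the short-term context is wiped, so nothing from Stage 1 leaks
    # through; distractor content stresses SUMMARY / FILTER.
    context = interact(distractors, [], long_term, ("SUMMARY", "FILTER"))

    # Stage 3: answer using RETRIEVE over long-term memory plus short-term control.
    return answer(question, context, long_term, ("RETRIEVE", "SUMMARY", "FILTER"))
```

With this structure, the only way Stage 3 can succeed is if Stage 1 stored the right items, which is exactly the dependency the reward signal must propagate through.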
Reward Design and Step-Wise GRPO
AgeMem employs a variant of Group Relative Policy Optimization (GRPO). The system samples a group of trajectories for each task, computes a terminal reward for each trajectory, and normalizes it within the group to obtain an advantage signal. This signal is broadcast across the entire trajectory, so all intermediate tool choices are trained against the final outcome.
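The group-relative advantage can be written in a few lines. This is a simplified view: a full GRPO update also includes a PPO-style clipped policy-ratio objective, which is omitted here.

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize terminal rewards within a group of trajectories sampled for the
    same task; each trajectory's advantage is then broadcast to all its steps."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero if all rewards match
    return [(r - mean) / std for r in rewards]

# Four trajectories for one task, scored by terminal reward:
advantages = group_relative_advantages([0.9, 0.5, 0.5, 0.1])
```

A trajectory that beat its group gets a positive advantage on every step, including its early `ADD` and `SUMMARY` calls, which is how memory decisions far from the final answer still receive credit.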
The reward has three major components:
- A task reward, which uses an LLM judge to score answer quality from 0 to 1.
- A context reward, which measures the effectiveness of short-term memory operations, including compression, summarizing early, and preserving query-relevant content.
- A memory reward, which measures the quality of long-term memory, including the number of high-quality stored items and their usefulness and relevance to queries.
These three components are weighted equally, so each contributes equally to the learning signal. A penalty term is added when the dialogue exceeds its maximum length or the context grows too large.
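Under the equal-weighting scheme described above, the terminal reward might be combined like this. The penalty value and the hard thresholds are assumptions; the paper only states that a penalty applies when the dialogue or context budget is exceeded.

```python
def terminal_reward(task_score, context_score, memory_score,
                    dialogue_len, max_dialogue, context_tokens, max_context,
                    penalty=0.5):
    """Equal-weight mix of the three reward components (each in [0, 1]),
    minus a penalty when a length budget is exceeded. Penalty form is illustrative."""
    reward = (task_score + context_score + memory_score) / 3.0
    if dialogue_len > max_dialogue or context_tokens > max_context:
        reward -= penalty
    return reward

# Within budget: the reward is just the mean of the three components.
r = terminal_reward(0.9, 0.6, 0.9,
                    dialogue_len=12, max_dialogue=20,
                    context_tokens=900, max_context=1000)
```

Equal weights mean the agent cannot trade away context hygiene or memory quality for a marginally better answer score.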
Experimental Setup and Main Results
The research team evaluates AgeMem on five benchmarks:
- ALFWorld for text-based household tasks.
- SciWorld for science-themed environments.
- BabyAI for instruction following.
- PDDL for planning.
- HotpotQA for multi-hop question answering.
Metrics include the LLM judge score for HotpotQA and the success rate for ALFWorld. A memory-quality metric is also defined using an LLM evaluator, which compares the stored memories against HotpotQA's supporting facts.

Baselines include LangMem, A-Mem, Mem0, and Mem0g, as well as a memory-free agent. The backbones are Qwen2.5-7B-Instruct and Qwen3-4B-Instruct.
On Qwen2.5-7B-Instruct, AgeMem scores an average of 41.96 across all five benchmarks; the best baseline, Mem0, scores 37.14. On Qwen3-4B-Instruct, AgeMem achieves 54.31, while the best baseline, A-Mem, reaches 45.74.
Memory quality also improves. On HotpotQA, AgeMem reaches 0.53 with Qwen2.5-7B and 0.605 with Qwen3-4B, better than any baseline.
Short-term memory tools reduce prompt length while maintaining performance. On HotpotQA, configurations with STM use 3 to 5 percent fewer tokens per prompt than variants that replace STM with retrieval pipelines.
Ablation studies show that each component matters. Adding only the long-term memory tools on top of a memory-free baseline already yields clear gains, and training those tools with reinforcement learning improves scores further. With both short-term and long-term tools plus RL, the full system improves scores by up to 21.7 percentage points over the memory-free baseline on SciWorld.
Implications for LLM agent design
AgeMem suggests a design pattern for future agentic systems: memory is best handled inside the learned policy, not in two separate subsystems. By combining language generation with storage, retrieval, and summarization, the agent learns to manage context effectively.
The Key Takeaways
- AgeMem makes memory operations explicit tools (`ADD`, `UPDATE`, `DELETE`, `RETRIEVE`, `SUMMARY`, and `FILTER`), so the policy that generates text also decides when to use them.
- Long-term and short-term memory are trained together through a three-stage RL setup in which long-term memory persists across all stages while the short-term context resets, forcing retrieval-based reasoning.
- The reward function balances task accuracy, context management, and long-term memory quality with uniform weights, plus penalties for excessive dialogue length and context overload.
- AgeMem consistently outperforms memory baselines such as LangMem, A-Mem, Mem0, and Mem0g on ALFWorld, SciWorld, BabyAI, PDDL, and HotpotQA.
- Short-term memory tools reduce prompt length by 3 to 5 percent compared with RAG-style baselines while maintaining or improving performance, showing that effective context filtering and summarization can be learned rather than hand-written.
Take a look at the full paper. Also, feel free to follow us on Twitter, join our 100k+ ML SubReddit, subscribe to our newsletter, and join us on Telegram.
Check out our latest release, ai2025.dev, an analytics platform focused on 2025 that converts model launches and benchmarks into a structured dataset you can export, compare, and filter.
Asif Razzaq is the CEO of Marktechpost Media Inc. As an entrepreneur, Asif is passionate about harnessing Artificial Intelligence to benefit society. His latest venture, Marktechpost, is a media platform focused on Artificial Intelligence, known for in-depth coverage of machine learning and deep learning news that is technically accurate yet accessible to audiences of all backgrounds. The platform draws over 2 million views per month.

