In this tutorial, we explore how a neural memory agent can learn continuously without forgetting past experiences. Using PyTorch, we demonstrate how content-based memory addressing and prioritized experience replay allow the model to maintain performance across different learning tasks and overcome catastrophic forgetting. Check out the FULL CODES here.
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from collections import deque
import matplotlib.pyplot as plt
from dataclasses import dataclass

@dataclass
class MemoryConfig:
    memory_size: int = 128
    memory_dim: int = 64
    num_read_heads: int = 4
    num_write_heads: int = 1
We start by importing the essential libraries and defining a configuration class for our neural memory. We set the parameters for the memory size, the dimensionality, and the number of read/write heads, which together determine how the memory is accessed. This setup forms the base of our memory-augmented learning architecture. Check out the FULL CODES here.
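Before the full module, here is a minimal standalone sketch of the content-based addressing the memory bank relies on: cosine similarity between a query key and every memory slot, sharpened by a strength `beta` and normalized with softmax. The helper name `content_weights` is ours, for illustration only.

```python
import torch
import torch.nn.functional as F

def content_weights(memory, key, beta):
    # Cosine similarity between the query key and every memory slot,
    # sharpened by beta and normalized into soft read weights.
    key_norm = F.normalize(key, dim=-1)
    mem_norm = F.normalize(memory, dim=-1)
    similarity = torch.matmul(mem_norm, key_norm)
    return F.softmax(beta * similarity, dim=-1)

torch.manual_seed(0)
memory = torch.randn(8, 4)   # 8 slots, 4 dimensions each
key = memory[3].clone()      # a query that matches slot 3 exactly
weights = content_weights(memory, key, beta=10.0)
```

Because the query is identical to slot 3, that slot's cosine similarity is 1.0 and the softmax concentrates most of the weight there; a larger `beta` makes the addressing sharper.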
class NeuralMemoryBank(nn.Module):
    def __init__(self, config: MemoryConfig):
        super().__init__()
        self.memory_size = config.memory_size
        self.memory_dim = config.memory_dim
        self.num_read_heads = config.num_read_heads
        self.register_buffer('memory', torch.zeros(config.memory_size, config.memory_dim))
        self.register_buffer('usage', torch.zeros(config.memory_size))

    def content_addressing(self, key, beta):
        key_norm = F.normalize(key, dim=-1)
        mem_norm = F.normalize(self.memory, dim=-1)
        similarity = torch.matmul(key_norm, mem_norm.t())
        return F.softmax(beta * similarity, dim=-1)

    def write(self, write_key, write_vector, erase_vector, write_strength):
        write_weights = self.content_addressing(write_key, write_strength)
        erase = torch.outer(write_weights.squeeze(), erase_vector.squeeze())
        self.memory = (self.memory * (1 - erase)).detach()
        add = torch.outer(write_weights.squeeze(), write_vector.squeeze())
        self.memory = (self.memory + add).detach()
        self.usage = (0.99 * self.usage + write_weights.squeeze()).detach()

    def read(self, read_keys, read_strengths):
        reads = []
        for i in range(self.num_read_heads):
            weights = self.content_addressing(read_keys[i], read_strengths[i])
            read_vector = torch.matmul(weights, self.memory)
            reads.append(read_vector)
        return torch.cat(reads, dim=-1)
class MemoryController(nn.Module):
    def __init__(self, input_dim, hidden_dim, memory_config: MemoryConfig):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.memory_config = memory_config
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        total_read_dim = memory_config.num_read_heads * memory_config.memory_dim
        self.read_keys = nn.Linear(hidden_dim, memory_config.num_read_heads * memory_config.memory_dim)
        self.read_strengths = nn.Linear(hidden_dim, memory_config.num_read_heads)
        self.write_key = nn.Linear(hidden_dim, memory_config.memory_dim)
        self.write_vector = nn.Linear(hidden_dim, memory_config.memory_dim)
        self.erase_vector = nn.Linear(hidden_dim, memory_config.memory_dim)
        self.write_strength = nn.Linear(hidden_dim, 1)
        self.output = nn.Linear(hidden_dim + total_read_dim, input_dim)

    def forward(self, x, memory_bank, hidden=None):
        lstm_out, hidden = self.lstm(x.unsqueeze(0), hidden)
        controller_state = lstm_out.squeeze(0)
        read_k = self.read_keys(controller_state).view(self.memory_config.num_read_heads, -1)
        read_s = F.softplus(self.read_strengths(controller_state))
        write_k = self.write_key(controller_state)
        write_v = torch.tanh(self.write_vector(controller_state))
        erase_v = torch.sigmoid(self.erase_vector(controller_state))
        write_s = F.softplus(self.write_strength(controller_state))
        read_vectors = memory_bank.read(read_k, read_s)
        memory_bank.write(write_k, write_v, erase_v, write_s)
        combined = torch.cat([controller_state, read_vectors], dim=-1)
        output = self.output(combined)
        return output, hidden
We implement the Neural Memory Bank and the Memory Controller, which together form the core of the agent's memory system. The controller dynamically accesses the memory through read and write operations, while the memory bank uses content-based addressing to store and retrieve information, allowing the agent to adapt and remember relevant inputs. Check out the FULL CODES here.
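The erase-then-add update inside `write` can also be isolated into a small sketch (the helper name `write_step` and the toy values are ours, for illustration): each slot forgets in proportion to its write weight times the erase vector, then receives the write vector scaled by the same weight.

```python
import torch

def write_step(memory, weights, erase_vec, write_vec):
    # memory <- memory * (1 - w ⊗ e) + w ⊗ v
    erase = torch.outer(weights, erase_vec)  # per-slot forgetting mask
    add = torch.outer(weights, write_vec)    # per-slot new content
    return memory * (1 - erase) + add

memory = torch.ones(4, 3)
weights = torch.tensor([0.0, 1.0, 0.0, 0.0])  # address only slot 1
new_mem = write_step(memory, weights,
                     erase_vec=torch.ones(3),
                     write_vec=torch.tensor([0.5, -0.5, 1.0]))
```

With a full erase vector and all the write weight on slot 1, that slot is overwritten with the write vector while every other slot is untouched; softer weights would blend old and new content instead.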
class ExperienceReplay:
    def __init__(self, capacity=10000, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.buffer = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)

    def push(self, experience, priority=1.0):
        self.buffer.append(experience)
        self.priorities.append(priority ** self.alpha)

    def sample(self, batch_size, beta=0.4):
        if len(self.buffer) == 0:
            return [], []
        probs = np.array(self.priorities)
        probs = probs / probs.sum()
        indices = np.random.choice(len(self.buffer), min(batch_size, len(self.buffer)), p=probs, replace=False)
        samples = [self.buffer[i] for i in indices]
        weights = (len(self.buffer) * probs[indices]) ** (-beta)
        weights = weights / weights.max()
        return samples, torch.FloatTensor(weights)

class MetaLearner(nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def adapt(self, support_x, support_y, num_steps=5, lr=0.01):
        adapted_params = {name: param.clone() for name, param in self.model.named_parameters()}
        for _ in range(num_steps):
            pred, _ = self.model(support_x, self.model.memory_bank)
            loss = F.mse_loss(pred, support_y)
            grads = torch.autograd.grad(loss, self.model.parameters(), create_graph=True)
            adapted_params = {name: param - lr * grad for (name, param), grad in zip(adapted_params.items(), grads)}
        return adapted_params
We design the Experience Replay component and the Meta-Learner to strengthen the agent's learning ability. The prioritized replay buffer lets the model revisit significant past experiences, reducing forgetting, while the Meta-Learner applies MAML-style adaptation for rapid acquisition of new tasks. Together, these modules give the agent both stability and flexibility during training. Check out the FULL CODES here.
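Prioritized sampling is easy to see in isolation. In this small NumPy sketch (illustrative values, not part of the agent), experiences with higher priority are drawn proportionally more often, which is what `ExperienceReplay.sample` does after normalizing priorities into probabilities:

```python
import numpy as np

priorities = np.array([1.0, 1.0, 8.0])  # third experience had a high loss
probs = priorities / priorities.sum()   # -> [0.1, 0.1, 0.8]
rng = np.random.default_rng(0)
draws = rng.choice(len(priorities), size=10_000, p=probs)
frac_high = (draws == 2).mean()         # fraction of draws hitting item 2
```

Roughly 80% of the draws land on the high-priority experience; the `alpha` exponent in the buffer tempers this bias, and the `beta` importance weights correct for it in the loss.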
class ContinualLearningAgent:
    def __init__(self, input_dim=64, hidden_dim=128):
        self.config = MemoryConfig()
        self.memory_bank = NeuralMemoryBank(self.config)
        self.controller = MemoryController(input_dim, hidden_dim, self.config)
        self.replay_buffer = ExperienceReplay(capacity=5000)
        self.meta_learner = MetaLearner(self.controller)
        self.optimizer = torch.optim.Adam(self.controller.parameters(), lr=0.001)
        self.task_history = []

    def train_step(self, x, y, use_replay=True):
        self.optimizer.zero_grad()
        pred, _ = self.controller(x, self.memory_bank)
        current_loss = F.mse_loss(pred, y)
        self.replay_buffer.push((x.detach().clone(), y.detach().clone()), priority=current_loss.item() + 1e-6)
        total_loss = current_loss
        if use_replay and len(self.replay_buffer.buffer) > 16:
            samples, weights = self.replay_buffer.sample(8)
            for (replay_x, replay_y), weight in zip(samples, weights):
                with torch.enable_grad():
                    replay_pred, _ = self.controller(replay_x, self.memory_bank)
                    replay_loss = F.mse_loss(replay_pred, replay_y)
                    total_loss = total_loss + 0.3 * replay_loss * weight
        total_loss.backward()
        torch.nn.utils.clip_grad_norm_(self.controller.parameters(), 1.0)
        self.optimizer.step()
        return total_loss.item()

    def evaluate(self, test_data):
        self.controller.eval()
        total_error = 0
        with torch.no_grad():
            for x, y in test_data:
                pred, _ = self.controller(x, self.memory_bank)
                total_error += F.mse_loss(pred, y).item()
        self.controller.train()
        return total_error / len(test_data)
We construct the Continual Learning Agent, which integrates the memory bank, controller, replay buffer, and meta-learner into a single adaptive framework. This step defines how the agent trains on each task, replays previous data, and evaluates its performance, ensuring it retains prior knowledge while learning new tasks. Check out the FULL CODES here.
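The replay portion of `train_step` reduces to a weighted sum, sketched below with hypothetical loss values (the 0.3 replay coefficient matches the agent above; the specific numbers are made up for illustration):

```python
import torch

current_loss = torch.tensor(1.0)           # loss on the current sample
replay_losses = torch.tensor([0.5, 0.2])   # hypothetical losses on replayed samples
is_weights = torch.tensor([1.0, 0.8])      # importance-sampling weights from the buffer
# Each replayed loss is scaled by its importance weight and a 0.3 coefficient,
# so replay regularizes training without dominating the current task.
total_loss = current_loss + 0.3 * (replay_losses * is_weights).sum()
```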
def create_task_data(task_id, num_samples=100):
    torch.manual_seed(task_id)
    x = torch.randn(num_samples, 64)
    if task_id == 0:
        y = torch.sin(x.mean(dim=1, keepdim=True).expand(-1, 64))
    elif task_id == 1:
        y = torch.cos(x.mean(dim=1, keepdim=True).expand(-1, 64)) * 0.5
    else:
        y = torch.tanh(x * 0.5 + task_id)
    return [(x[i], y[i]) for i in range(num_samples)]
def run_continual_learning_demo():
    print("🧠 Neural Memory Agent - Continual Learning Demo")
    print("=" * 60)
    agent = ContinualLearningAgent()
    num_tasks = 4
    results = {'tasks': [], 'without_memory': [], 'with_memory': []}
    for task_id in range(num_tasks):
        print(f"\n📚 Learning Task {task_id + 1}/{num_tasks}")
        train_data = create_task_data(task_id, num_samples=50)
        test_data = create_task_data(task_id, num_samples=20)
        for epoch in range(20):
            total_loss = 0
            for x, y in train_data:
                loss = agent.train_step(x, y, use_replay=(task_id > 0))
                total_loss += loss
            if epoch % 5 == 0:
                avg_loss = total_loss / len(train_data)
                print(f"   Epoch {epoch:2d}: Loss = {avg_loss:.4f}")
        print(f"\n   📊 Evaluation on all tasks:")
        for eval_task_id in range(task_id + 1):
            eval_data = create_task_data(eval_task_id, num_samples=20)
            error = agent.evaluate(eval_data)
            print(f"   Task {eval_task_id + 1}: Error = {error:.4f}")
            if eval_task_id == task_id:
                results['tasks'].append(eval_task_id + 1)
                results['with_memory'].append(error)
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    ax = axes[0]
    memory_matrix = agent.memory_bank.memory.detach().numpy()
    im = ax.imshow(memory_matrix, aspect="auto", cmap='viridis')
    ax.set_title('Neural Memory State', fontsize=14, fontweight="bold")
    ax.set_xlabel('Memory Dimension')
    ax.set_ylabel('Memory Slots')
    plt.colorbar(im, ax=ax)
    ax = axes[1]
    ax.plot(results['tasks'], results['with_memory'], marker="o", linewidth=2, markersize=8, label="With Memory Replay")
    ax.set_title('Continual Learning Performance', fontsize=14, fontweight="bold")
    ax.set_xlabel('Task Number')
    ax.set_ylabel('Test Error')
    ax.legend()
    ax.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.savefig('neural_memory_results.png', dpi=150, bbox_inches="tight")
    print("\n✅ Results saved to 'neural_memory_results.png'")
    plt.show()
    print("\n" + "=" * 60)
    print("🎯 Key Insights:")
    print("   • Memory bank stores compressed task representations")
    print("   • Experience replay mitigates catastrophic forgetting")
    print("   • Agent maintains performance on earlier tasks")
    print("   • Content-based addressing enables efficient retrieval")

if __name__ == "__main__":
    run_continual_learning_demo()
To demonstrate the continual learning process, we generate several synthetic tasks and train the agent on them sequentially. As we train and visualize the results, we observe how memory replay improves accuracy and stability across all tasks. The final plots highlight how differentiable memory enhances the agent's ability to learn over time.
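For reference, the per-task target functions can be checked in isolation. This sketch mirrors the intended branching of `create_task_data` (the helper name `make_targets` is ours, for illustration): each task maps the same inputs through a different function, which is what forces the agent to adapt without forgetting.

```python
import torch

def make_targets(task_id, x):
    # Task 0: sin of the per-row mean; task 1: scaled cos; others: shifted tanh.
    if task_id == 0:
        return torch.sin(x.mean(dim=1, keepdim=True).expand(-1, x.shape[1]))
    elif task_id == 1:
        return torch.cos(x.mean(dim=1, keepdim=True).expand(-1, x.shape[1])) * 0.5
    return torch.tanh(x * 0.5 + task_id)

torch.manual_seed(0)
x = torch.randn(5, 8)
y0, y1 = make_targets(0, x), make_targets(1, x)
```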
In conclusion, we have built and trained an agent that adapts to changing tasks. The differentiable memory enables efficient storage and retrieval of learned representations, while the replay mechanism reinforces stability and knowledge retention. Combining these components with meta-learning shows how such agents can lead to more self-adaptive, resilient neural systems that remember, reason, and evolve without forgetting what they have already learned.
Asif Razzaq is the CEO of Marktechpost Media Inc. A visionary engineer and entrepreneur, he is dedicated to harnessing the potential of Artificial Intelligence for social good. His most recent venture is Marktechpost, a platform focused on machine learning and deep learning news that is both technically sound and accessible to a broad audience. With over 2 million monthly views, the platform illustrates its popularity among readers.

