
Learn how to build an advanced AI agent with vector-based long-term memory and summarized short-term memory.

Tech · By Gavin Wallace · 03/09/2025 · 5 Mins Read

In this tutorial, we walk through building an AI agent that not only talks but also remembers. Starting from scratch, we combine a lightweight LLM, FAISS vector search, and a summarization mechanism to give the agent both long-term and short-term memory. With auto-distilled facts and embedded records working together, we end up with an agent that adapts to our instructions, retains important details across future conversations, and intelligently compresses context.

!pip -q install transformers accelerate bitsandbytes sentence-transformers faiss-cpu


import os, json, time, uuid, math, re
from datetime import datetime
import torch, faiss
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig
from sentence_transformers import SentenceTransformer
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

After installing the libraries, we import all required modules and set a DEVICE flag so the model runs on a GPU when one is available and falls back to the CPU otherwise.

def load_llm(model_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0"):
    try:
        if DEVICE == "cuda":
            bnb=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_quant_type="nf4")
            tok=AutoTokenizer.from_pretrained(model_name, use_fast=True)
            mdl=AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb, device_map="auto")
        else:
            tok=AutoTokenizer.from_pretrained(model_name, use_fast=True)
            mdl=AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32, low_cpu_mem_usage=True)
        return pipeline("text-generation", model=mdl, tokenizer=tok, device=0 if DEVICE=="cuda" else -1, do_sample=True)
    except Exception as e:
        raise RuntimeError(f"Failed to load LLM: {e}")

We define a function that loads our language model. To optimize performance, we enable 4-bit quantization when a GPU is available, ensuring we can generate text efficiently regardless of the hardware we're running on.
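To see why 4-bit quantization matters here, a rough back-of-the-envelope calculation for the 1.1B-parameter TinyLlama checkpoint (weights only; activations, KV cache, and quantization overhead are ignored):

```python
# Approximate weight memory for a 1.1B-parameter model at different precisions.
params = 1.1e9

def weight_gb(bits_per_param):
    # bits -> bytes -> decimal gigabytes
    return params * bits_per_param / 8 / 1e9

fp16_gb = weight_gb(16)   # half-precision load
nf4_gb  = weight_gb(4)    # 4-bit NF4 quantized load

print(f"fp16: ~{fp16_gb:.2f} GB, nf4: ~{nf4_gb:.2f} GB")  # ~2.20 GB vs ~0.55 GB
```

A 4x reduction in weight memory is what lets this model fit comfortably alongside the embedder on a free Colab GPU.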

class VectorMemory:
    def __init__(self, path="/content/agent_memory.json", dim=384):
        self.path=path; self.dim=dim; self.items=[]
        self.embedder=SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device=DEVICE)
        self.index=faiss.IndexFlatIP(dim)
        if os.path.exists(path):
            data=json.load(open(path))
            self.items=data.get("items",[])
            if self.items:
                X=torch.tensor([x["emb"] for x in self.items], dtype=torch.float32).numpy()
                self.index.add(X)
    def _emb(self, text):
        v=self.embedder.encode([text], normalize_embeddings=True)[0]
        return v.tolist()
    def add(self, text, meta=None):
        e=self._emb(text); self.index.add(torch.tensor([e]).numpy())
        rec={"id":str(uuid.uuid4()),"text":text,"meta":meta or {}, "emb":e}
        self.items.append(rec); self._save()
        return rec["id"]
    def search(self, query, k=5, thresh=0.25):
        if len(self.items)==0: return []
        q=self.embedder.encode([query], normalize_embeddings=True)
        D,I=self.index.search(q, min(k, len(self.items)))
        out=[]
        for d,i in zip(D[0],I[0]):
            if i==-1: continue
            if d>=thresh: out.append((d,self.items[i]))
        return out
    def _save(self):
        slim=[{k:v for k,v in it.items()} for it in self.items]
        json.dump({"items":slim}, open(self.path,"w"), indent=2)

We create a VectorMemory class to give our agent long-term memory. We embed past interactions with MiniLM, then index and search them with FAISS, which lets the agent retrieve relevant information later. Every memory is persisted to disk, so the agent keeps its memory between sessions.
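The class pairs `IndexFlatIP` (inner product) with `normalize_embeddings=True`; on unit-length vectors the inner product equals cosine similarity, which is why the scores returned by `search` can be compared against a fixed threshold like 0.25. A minimal NumPy sketch of that equivalence, with toy 2-d vectors standing in for the real 384-dim MiniLM embeddings:

```python
import numpy as np

# Two toy "embeddings"
a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])

# Normalize to unit length, as the encoder does with normalize_embeddings=True
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)

inner  = float(a_n @ b_n)                                           # what IndexFlatIP scores
cosine = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))     # cosine similarity

print(inner, cosine)  # both 24/25 = 0.96
```

Without the normalization step, inner-product scores would scale with vector magnitude and a fixed threshold would be meaningless.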

def now_iso(): return datetime.now().isoformat(timespec="seconds")
def clamp(txt, n=1600): return txt if len(txt)<=n else txt[:n]

class MemoryAgent:
    # SUMMARIZE_PROMPT and the _gen, _chat_prompt, and _distill_and_store helpers
    # are defined in the full code
    def _maybe_summarize(self):
        if len(self.turns)>self.max_turns:
            convo="\n".join([f"{r}: {t}" for r,t in self.turns])
            s=self._gen(SUMMARIZE_PROMPT(clamp(convo, 3500)), max_new_tokens=180, temp=0.2)
            self.summary=s; self.turns=self.turns[-4:]
    def recall(self, query, k=5):
        hits=self.mem.search(query, k=k)
        return "\n".join([f"- ({d:.2f}) {h['text']} [meta={h['meta']}]" for d,h in hits])
    def ask(self, user):
        self.turns.append(("user", user))
        saved, memline = self._distill_and_store(user)
        mem_ctx=self.recall(user, k=6)
        prompt=self._chat_prompt(user, mem_ctx)
        reply=self._gen(prompt)
        self.turns.append(("assistant", reply))
        self._maybe_summarize()
        status=f"💾 memory_saved: {saved}; " + (f"note: {memline}" if saved else "note: -")
        print(f"\nUSER: {user}\nASSISTANT: {reply}\n{status}")
        return reply

The MemoryAgent class brings it all together. On each turn it distills and stores important facts in long-term memory, retrieves relevant memories as context, generates a response, and summarizes the conversation to keep short-term context compact. This setup gives us an assistant that remembers, recalls, and adapts throughout our conversations with it.
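The short-term policy in `_maybe_summarize` can be isolated from the LLM: once the turn buffer exceeds `max_turns`, everything is folded into a summary and only the last four turns are kept verbatim. A toy version of that rolling window, with a stub summarizer standing in for the real prompt-driven LLM call:

```python
class ToyShortTermMemory:
    def __init__(self, max_turns=6, keep=4):
        self.turns, self.summary = [], ""
        self.max_turns, self.keep = max_turns, keep

    def add(self, role, text):
        self.turns.append((role, text))
        if len(self.turns) > self.max_turns:
            # In the real agent this string is produced by the LLM
            # from SUMMARIZE_PROMPT over the joined conversation.
            self.summary = f"summary of {len(self.turns)} turns"
            self.turns = self.turns[-self.keep:]  # keep only the recent tail

mem = ToyShortTermMemory()
for i in range(8):
    mem.add("user", f"msg {i}")
print(len(mem.turns), "|", mem.summary)  # 5 | summary of 7 turns
```

The effect is a bounded prompt: older turns survive only as a compressed summary, while the most recent exchange stays verbatim.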

agent=MemoryAgent()


print("✅ Agent ready. Try these:\n")
agent.ask("Hi! My name is Nicolaus, I prefer being called Nik. I'm preparing for UPSC in 2027.")
agent.ask("Also, I work at  Visa in analytics and love concise answers.")
agent.ask("What's my exam year and how should you address me next time?")
agent.ask("Reminder: I like agentic RAG tutorials with single-file Colab code.")
agent.ask("Given my prefs, suggest a study focus for this week in one paragraph.")

We instantiate MemoryAgent and immediately send it a few messages to test its recall and help it form long-term associations. The agent adapts to our preference for concise replies, and remembers what we prefer to be called and our exam year. We also confirm that earlier preferences (agentic RAG tutorials, single-file Colab code) are recalled.
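The reason those preferences survive a restart is that persistence is just JSON on disk: each record keeps its text, metadata, and raw embedding list, so a new session can rebuild the FAISS index from the file. A minimal round-trip sketch of that record format, using the standard library and a temp directory rather than the notebook's /content path:

```python
import json, uuid, os, tempfile

# One memory record in the same shape VectorMemory stores
rec = {"id": str(uuid.uuid4()),
       "text": "User prefers to be called Nik",
       "meta": {"ts": "2025-09-03T00:00:00"},
       "emb": [0.1, 0.2, 0.3]}  # toy embedding; real ones are 384-dim MiniLM vectors

path = os.path.join(tempfile.mkdtemp(), "agent_memory.json")
json.dump({"items": [rec]}, open(path, "w"), indent=2)

# "Next session": reload the file and recover the embeddings for re-indexing
items = json.load(open(path)).get("items", [])
embs = [it["emb"] for it in items]
print(len(items), embs[0])
```

This mirrors what `VectorMemory.__init__` does at startup: load `items`, stack the stored `emb` vectors, and add them back into the index.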

In conclusion, it's striking how much capability we gain simply by giving our AI agent the ability to remember. The agent now stores important information, retrieves it when relevant, and summarizes conversations to remain efficient. This keeps every interaction contextual and lets the agent evolve with each exchange. From this foundation, we can expand the memory schema, experiment with richer retrieval, or design more advanced memory-augmented agents.




