
How do you build an agentic decision-tree RAG system with intelligent query routing, self-checking and iterative refinement?

Tech | By Gavin Wallace | 27/10/2025 | 6 Mins Read
LifelongAgentBench: A Benchmark for Evaluating Continuous Learning in LLM-Based Agents

In this tutorial, we build a RAG system that goes far beyond answering simple questions. The system intelligently routes queries to the correct knowledge sources, performs self-checks to evaluate answer quality, and iteratively refines responses to improve accuracy. We implement the entire pipeline with open-source tools, FAISS, SentenceTransformers, and Flan-T5, and walk through routing, retrieval, and generation in a decision-tree RAG design built on agentic reasoning. Check out the FULL CODES here.

print("🔧 Setting up dependencies...")
import subprocess
import sys

def install_packages():
    packages = ['sentence-transformers', 'transformers', 'torch', 'faiss-cpu', 'numpy', 'accelerate']
    for package in packages:
        print(f"Installing {package}...")
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q', package])

try:
    import faiss
except ImportError:
    install_packages()
    print("✓ All dependencies installed! Importing modules...\n")

import torch
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import pipeline
import faiss
from typing import List, Dict, Tuple
import warnings
warnings.filterwarnings('ignore')
print("✓ All modules loaded successfully!\n")

We begin by installing all necessary dependencies, including Transformers, FAISS, and SentenceTransformers, to ensure smooth local execution. After verifying the installation, we import the essential modules, NumPy, PyTorch, and FAISS, used to embed, retrieve, and generate, and confirm that every library loads successfully before moving on to the pipeline. Check out the FULL CODES here.

class VectorStore:
    def __init__(self, embedding_model="all-MiniLM-L6-v2"):
        print(f"Loading embedding model: {embedding_model}...")
        self.embedder = SentenceTransformer(embedding_model)
        self.documents = []
        self.index = None

    def add_documents(self, docs: List[str], sources: List[str]):
        self.documents = [{"text": doc, "source": src} for doc, src in zip(docs, sources)]
        embeddings = self.embedder.encode(docs, show_progress_bar=False)
        dimension = embeddings.shape[1]
        self.index = faiss.IndexFlatL2(dimension)
        self.index.add(embeddings.astype('float32'))
        print(f"✓ Indexed {len(docs)} documents\n")

    def search(self, query: str, k: int = 3) -> List[Dict]:
        query_vec = self.embedder.encode([query]).astype('float32')
        distances, indices = self.index.search(query_vec, k)
        return [self.documents[i] for i in indices[0]]

The VectorStore class stores and retrieves documents efficiently using FAISS-based similarity search. Documents are embedded with a SentenceTransformer model, and a flat L2 index is built over the embeddings for rapid retrieval, letting us fetch the most relevant passages for any given query. Check out the FULL CODES here.
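Under the hood, faiss.IndexFlatL2 performs exhaustive nearest-neighbor search on L2 distance. As a minimal pure-Python sketch of that retrieval step (toy 2-D vectors standing in for real embeddings; no FAISS required):

```python
import math

def l2_search(index_vectors, query_vec, k=2):
    """Exhaustive L2 nearest-neighbor search, as a flat FAISS index does."""
    dists = [
        (math.dist(vec, query_vec), i)   # (distance, document id)
        for i, vec in enumerate(index_vectors)
    ]
    dists.sort()                         # smallest distance first
    return [i for _, i in dists[:k]]

# Toy 2-D "embeddings" standing in for SentenceTransformer output
docs = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
print(l2_search(docs, (0.9, 1.1), k=2))  # → [1, 0]
```

The real index returns the same ranking, just computed with vectorized kernels over 384-dimensional MiniLM embeddings instead of a Python loop.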

class QueryRouter:
    def __init__(self):
        self.categories = {
            'technical': ['how', 'implement', 'code', 'function', 'algorithm', 'debug'],
            'factual': ['what', 'who', 'when', 'where', 'define', 'explain'],
            'comparative': ['compare', 'difference', 'versus', 'vs', 'better', 'which'],
            'procedural': ['steps', 'process', 'guide', 'tutorial', 'how to']
        }

    def route(self, query: str) -> str:
        query_lower = query.lower()
        scores = {}
        for category, keywords in self.categories.items():
            score = sum(1 for kw in keywords if kw in query_lower)
            scores[category] = score
        best_category = max(scores, key=scores.get)
        return best_category if scores[best_category] > 0 else 'factual'

The QueryRouter class categorizes queries by intent: technical, factual, comparative, or procedural. Using simple keyword matching, it determines which category best fits the input, allowing retrieval to adapt dynamically to the query type. Check out the FULL CODES here.
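As a quick sanity check, the routing heuristic can be exercised on its own. This sketch duplicates the keyword table above so it runs standalone:

```python
# Standalone copy of the router's keyword table and scoring rule.
CATEGORIES = {
    'technical': ['how', 'implement', 'code', 'function', 'algorithm', 'debug'],
    'factual': ['what', 'who', 'when', 'where', 'define', 'explain'],
    'comparative': ['compare', 'difference', 'versus', 'vs', 'better', 'which'],
    'procedural': ['steps', 'process', 'guide', 'tutorial', 'how to'],
}

def route(query: str) -> str:
    q = query.lower()
    # Count keyword hits per category; fall back to 'factual' on no hits.
    scores = {cat: sum(kw in q for kw in kws) for cat, kws in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else 'factual'

print(route("Compare FAISS versus Annoy"))   # comparative
print(route("Tell me about embeddings"))     # factual (no keyword hit)
```

Note that substring matching is deliberately crude: 'vs' also fires inside 'versus', which here only reinforces the correct category, but a production router would match on tokens or use a classifier.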

class AnswerGenerator:
    def __init__(self, model_name="google/flan-t5-base"):
        print(f"Loading generation model: {model_name}...")
        self.generator = pipeline('text2text-generation', model=model_name,
                                  device=0 if torch.cuda.is_available() else -1,
                                  max_length=256)
        device_type = "GPU" if torch.cuda.is_available() else "CPU"
        print(f"✓ Generator ready (using {device_type})\n")

    def generate(self, query: str, context: List[Dict], query_type: str) -> str:
        context_text = "\n\n".join([f"[{doc['source']}]: {doc['text']}" for doc in context])
        prompt = f"""Answer the question using the context below.

Context:
{context_text}

Question: {query}

Answer:"""
        answer = self.generator(prompt, max_length=200, do_sample=False)[0]['generated_text']
        return answer.strip()

    def self_check(self, query: str, answer: str, context: List[Dict]) -> Tuple[bool, str]:
        # Heuristic quality gate (the original check is truncated in the source;
        # this minimal reconstruction rejects very short answers and answers
        # with little lexical overlap with the retrieved context).
        if len(answer) < 10:
            return False, "Answer too short"
        context_text = " ".join(doc['text'].lower() for doc in context)
        overlap = sum(1 for word in set(answer.lower().split()) if word in context_text)
        if overlap < 2:
            return False, "Answer not grounded in context"
        return True, "Answer accepted"

The AnswerGenerator both creates answers and evaluates them. It generates text answers from the retrieved documents using the Flan-T5 model, then runs a self-check to ensure the answer is meaningful and grounded in the context. Check out the FULL CODES here.
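A quality gate of this kind can be as simple as a length check plus a lexical-grounding check. Here is a minimal standalone sketch; the stopword list and thresholds are illustrative assumptions, not taken from the source:

```python
STOPWORDS = {'the', 'a', 'an', 'is', 'are', 'was', 'were', 'of', 'to', 'and'}

def self_check(answer: str, context_texts) -> tuple:
    """Heuristic gate: reject short answers and answers with little
    lexical overlap with the retrieved context."""
    if len(answer) < 10:                 # illustrative minimum length
        return False, "Answer too short"
    context = " ".join(t.lower() for t in context_texts)
    content_words = set(answer.lower().split()) - STOPWORDS
    overlap = sum(1 for w in content_words if w in context)
    if overlap < 2:                      # illustrative overlap threshold
        return False, "Answer not grounded in context"
    return True, "Answer accepted"

ok, reason = self_check(
    "RAG combines retrieval with text generation.",
    ["RAG (Retrieval-Augmented Generation) combines information retrieval with text generation."],
)
print(ok, reason)  # True Answer accepted
```

Lexical overlap is a weak proxy for faithfulness; a stronger variant would ask a second model to judge entailment, but the cheap check already catches empty and off-topic generations.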

class AgenticRAG:
    def __init__(self):
        self.vector_store = VectorStore()
        self.router = QueryRouter()
        self.generator = AnswerGenerator()
        self.max_iterations = 2

    def add_knowledge(self, documents: List[str], sources: List[str]):
        self.vector_store.add_documents(documents, sources)

    def query(self, question: str, verbose: bool = True) -> Dict:
        if verbose:
            print(f"\n{'='*60}")
            print(f"🤔 Query: {question}")
            print(f"{'='*60}")
        query_type = self.router.route(question)
        if verbose:
            print(f"📍 Route: {query_type.upper()} query detected")
        k_docs = {'technical': 2, 'comparative': 4, 'procedural': 3}.get(query_type, 3)
        iteration = 0
        answer_accepted = False
        answer = ""
        # Retrieve → generate → self-check loop (the loop body is truncated in
        # the source; this reconstruction widens retrieval after a rejection).
        while iteration < self.max_iterations and not answer_accepted:
            iteration += 1
            context = self.vector_store.search(question, k=k_docs)
            answer = self.generator.generate(question, context, query_type)
            answer_accepted, reason = self.generator.self_check(question, answer, context)
            if verbose:
                print(f"   Iteration {iteration}: {reason}")
            if not answer_accepted:
                k_docs += 1
        return {"answer": answer, "query_type": query_type,
                "iterations": iteration, "accepted": answer_accepted}

AgenticRAG ties all the components together, orchestrating routing, retrieval, generation, and quality checking. The system iteratively refines answers using feedback from its own evaluation, widening retrieval or adjusting context as needed, so the pipeline behaves like a decision tree that uses feedback to improve its output. Check out the FULL CODES here.
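Stripped of the model calls, the agentic control flow reduces to a small retrieve-generate-check loop that widens retrieval on rejection. A stubbed sketch of that skeleton (the lambda components below are placeholders, not the real classes):

```python
def agentic_query(question, retrieve, generate, check, k=3, max_iterations=2):
    """Generic retrieve → generate → self-check loop with widening retrieval."""
    answer, accepted, iteration = "", False, 0
    while iteration < max_iterations and not accepted:
        iteration += 1
        context = retrieve(question, k)
        answer = generate(question, context)
        accepted, _reason = check(answer, context)
        if not accepted:
            k += 1                       # widen retrieval before retrying
    return {"answer": answer, "iterations": iteration, "accepted": accepted}

# Stub components: the check fails once, then passes on the retry.
attempts = []
result = agentic_query(
    "demo question",
    retrieve=lambda q, k: [f"doc{i}" for i in range(k)],
    generate=lambda q, ctx: f"answer from {len(ctx)} docs",
    check=lambda a, c: (attempts.append(1) or len(attempts) > 1, "checked"),
)
print(result)  # second iteration succeeds with a widened context of 4 docs
```

Because the loop is parameterized over its components, the same skeleton works whether the checker is a lexical heuristic or a second LLM acting as a judge.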

def main():
    print("\n" + "="*60)
    print("🚀 AGENTIC RAG WITH ROUTING & SELF-CHECK")
    print("="*60 + "\n")
    documents = [
        "RAG (Retrieval-Augmented Generation) combines information retrieval with text generation. It retrieves relevant documents and uses them as context for generating accurate answers.",
        # The remaining knowledge-base entries are truncated in the source.
    ]
    sources = ["Python Documentation", "ML Textbook", "Neural Networks Guide", "Deep Learning Paper", "Transformer Architecture", "RAG Research Paper"]
    rag = AgenticRAG()
    rag.add_knowledge(documents, sources)
    test_queries = ["What is Python?", "How does machine learning work?", "Compare neural networks and deep learning"]
    for query in test_queries:
        result = rag.query(query, verbose=True)
        print(f"\n{'='*60}")
        print(f"📊 FINAL RESULT:")
        print(f"   Answer: {result['answer']}")
        print(f"   Query Type: {result['query_type']}")
        print(f"   Iterations: {result['iterations']}")
        print(f"   Accepted: {result['accepted']}")
        print(f"{'='*60}\n")

if __name__ == "__main__":
    main()

We finish the demonstration by loading a knowledge base and running test queries through the agentic RAG pipeline. We watch the model route and refine answers step by step, printing intermediate results for transparency, and confirm that the system delivers self-validated answers using only local computation.

We conclude with a fully functional agentic RAG framework that retrieves, reasons about, and validates its own answers. The system dynamically routes different query types, assesses its responses, and refines them through iterative feedback, all within a lightweight local environment. This exercise deepens our understanding of RAG architectures and shows how agentic loops can turn static retrieval systems into self-improving ones.

