This tutorial shows how to build an agentic RAG system that goes beyond answering simple questions. The system intelligently routes queries to the correct knowledge sources, performs self-checks to evaluate answer quality, and iteratively refines responses to improve accuracy. We implement the entire pipeline with open-source tools such as FAISS, SentenceTransformers, and Flan-T5, exploring how routing, retrieval, and generation combine into a RAG-style decision loop driven by agentic reasoning. Check out the FULL CODES here.
print("🔧 Setting up dependencies...")
import subprocess
import sys

def install_packages():
    packages = ['sentence-transformers', 'transformers', 'torch', 'faiss-cpu', 'numpy', 'accelerate']
    for package in packages:
        print(f"Installing {package}...")
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q', package])

try:
    import faiss
except ImportError:
    install_packages()
    print("✓ All dependencies installed! Importing modules...\n")

import torch
import numpy as np
from sentence_transformers import SentenceTransformer
from transformers import pipeline
import faiss
from typing import List, Dict, Tuple
import warnings

warnings.filterwarnings('ignore')
print("✓ All modules loaded successfully!\n")
We begin by installing all necessary dependencies, including transformers, FAISS, and sentence-transformers, to ensure smooth local execution. After verifying the installation, we import the essential modules, such as NumPy, PyTorch, and FAISS, that handle embedding, retrieval, and generation. We confirm that every library loads successfully before moving on to the pipeline. Check out the FULL CODES here.
class VectorStore:
    def __init__(self, embedding_model="all-MiniLM-L6-v2"):
        print(f"Loading embedding model: {embedding_model}...")
        self.embedder = SentenceTransformer(embedding_model)
        self.documents = []
        self.index = None

    def add_documents(self, docs: List[str], sources: List[str]):
        self.documents = [{"text": doc, "source": src} for doc, src in zip(docs, sources)]
        embeddings = self.embedder.encode(docs, show_progress_bar=False)
        dimension = embeddings.shape[1]
        self.index = faiss.IndexFlatL2(dimension)
        self.index.add(embeddings.astype('float32'))
        print(f"✓ Indexed {len(docs)} documents\n")

    def search(self, query: str, k: int = 3) -> List[Dict]:
        query_vec = self.embedder.encode([query]).astype('float32')
        distances, indices = self.index.search(query_vec, k)
        return [self.documents[i] for i in indices[0]]
The VectorStore class stores and retrieves documents efficiently using FAISS-based similarity search. Each document is embedded with a SentenceTransformer model, and the embeddings are added to an index that supports rapid retrieval. This lets us pull back the most relevant passages for any given query. Check out the FULL CODES here.
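To build intuition for what `faiss.IndexFlatL2` is doing, here is a rough NumPy-only equivalent of exact L2 nearest-neighbor search. This is a sketch for illustration, not a FAISS replacement, and the toy "embedding" vectors are made up:

```python
import numpy as np

def l2_search(index_vecs: np.ndarray, query_vec: np.ndarray, k: int = 3):
    # Squared L2 distance from the query to every indexed vector,
    # which is the quantity faiss.IndexFlatL2 ranks by exactly.
    dists = np.sum((index_vecs - query_vec) ** 2, axis=1)
    order = np.argsort(dists)[:k]
    return dists[order], order

# Toy 4-dimensional "embeddings" for three documents
docs = np.array([[1.0, 0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0],
                 [0.9, 0.1, 0.0, 0.0]])
query = np.array([1.0, 0.0, 0.0, 0.0])

dists, idx = l2_search(docs, query, k=2)
print(idx)  # nearest documents first: [0 2]
```

FAISS performs the same exact search over float32 vectors, just far faster and at scale, which is why the class converts embeddings with `.astype('float32')` before indexing.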
class QueryRouter:
    def __init__(self):
        self.categories = {
            'technical': ['how', 'implement', 'code', 'function', 'algorithm', 'debug'],
            'factual': ['what', 'who', 'when', 'where', 'define', 'explain'],
            'comparative': ['compare', 'difference', 'versus', 'vs', 'better', 'which'],
            'procedural': ['steps', 'process', 'guide', 'tutorial', 'how to']
        }

    def route(self, query: str) -> str:
        query_lower = query.lower()
        scores = {}
        for category, keywords in self.categories.items():
            score = sum(1 for kw in keywords if kw in query_lower)
            scores[category] = score
        best_category = max(scores, key=scores.get)
        return best_category if scores[best_category] > 0 else 'factual'
The QueryRouter class categorizes queries by intent: technical, factual, comparative, or procedural. Simple keyword matching determines which category best fits the input, allowing retrieval to adapt dynamically to the query type. Check out the FULL CODES here.
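As a quick sanity check of the keyword-scoring idea, a standalone version of the routing logic (same keyword lists as above) behaves like this:

```python
CATEGORIES = {
    'technical': ['how', 'implement', 'code', 'function', 'algorithm', 'debug'],
    'factual': ['what', 'who', 'when', 'where', 'define', 'explain'],
    'comparative': ['compare', 'difference', 'versus', 'vs', 'better', 'which'],
    'procedural': ['steps', 'process', 'guide', 'tutorial', 'how to'],
}

def route(query: str) -> str:
    query_lower = query.lower()
    # Count how many of each category's keywords appear (substring match)
    scores = {cat: sum(1 for kw in kws if kw in query_lower)
              for cat, kws in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    # Fall back to 'factual' when no keyword matches at all
    return best if scores[best] > 0 else 'factual'

print(route("Compare FAISS and Annoy"))  # comparative
print(route("Tell me about RAG"))        # factual (no keyword hit, fallback)
```

Note that matching is substring-based, so a keyword like `how` also fires inside words such as "showing"; that looseness is acceptable for a lightweight router like this one.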
class AnswerGenerator:
    def __init__(self, model_name="google/flan-t5-base"):
        print(f"Loading generation model: {model_name}...")
        self.generator = pipeline('text2text-generation', model=model_name,
                                  device=0 if torch.cuda.is_available() else -1,
                                  max_length=256)
        device_type = "GPU" if torch.cuda.is_available() else "CPU"
        print(f"✓ Generator ready (using {device_type})\n")

    def generate(self, query: str, context: List[Dict], query_type: str) -> str:
        context_text = "\n\n".join([f"[{doc['source']}]: {doc['text']}" for doc in context])
        prompt = f"""Context:
{context_text}

Question: {query}

Answer:"""
        answer = self.generator(prompt, max_length=200, do_sample=False)[0]['generated_text']
        return answer.strip()

    def self_check(self, query: str, answer: str, context: List[Dict]) -> Tuple[bool, str]:
        # Minimal acceptance check: reject answers that are too short to be useful
        if len(answer.strip()) < 10:
            return False, "Answer too short"
        return True, "Answer accepted"
The AnswerGenerator class both produces answers and evaluates them. It generates text with the Flan-T5 model, grounded in the retrieved documents, and then runs a self-check to confirm the answer is meaningful. Check out the FULL CODES here.
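The `self_check` shown here starts from a simple length test. A slightly stronger heuristic in the same spirit (our illustrative assumption, not necessarily the article's exact check) also verifies that the answer shares at least some vocabulary with the retrieved context:

```python
from typing import Dict, List, Tuple

def simple_self_check(answer: str, context: List[Dict]) -> Tuple[bool, str]:
    # Heuristic 1: very short answers are usually refusals or noise.
    if len(answer.strip()) < 10:
        return False, "Answer too short"
    # Heuristic 2: the answer should reuse at least one word
    # from the retrieved documents, as a cheap grounding signal.
    context_words = set(" ".join(doc["text"].lower() for doc in context).split())
    answer_words = set(answer.lower().split())
    if not (answer_words & context_words):
        return False, "Answer not grounded in retrieved context"
    return True, "Answer accepted"

ctx = [{"text": "RAG combines retrieval with generation.", "source": "notes"}]
ok, reason = simple_self_check("RAG combines retrieval and generation.", ctx)
print(ok, reason)  # True Answer accepted
```

Word-overlap is a crude proxy for groundedness, but it catches the common failure mode where the model ignores the context entirely, without requiring a second model call.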
class AgenticRAG:
    def __init__(self):
        self.vector_store = VectorStore()
        self.router = QueryRouter()
        self.generator = AnswerGenerator()
        self.max_iterations = 2

    def add_knowledge(self, documents: List[str], sources: List[str]):
        self.vector_store.add_documents(documents, sources)

    def query(self, question: str, verbose: bool = True) -> Dict:
        if verbose:
            print(f"\n{'='*60}")
            print(f"🤔 Query: {question}")
            print(f"{'='*60}")
        query_type = self.router.route(question)
        if verbose:
            print(f"📍 Route: {query_type.upper()} query detected")
        k_docs = {'technical': 2, 'comparative': 4, 'procedural': 3}.get(query_type, 3)
        iteration = 0
        answer_accepted = False
        while iteration < self.max_iterations and not answer_accepted:
            iteration += 1
            context = self.vector_store.search(question, k=k_docs)
            answer = self.generator.generate(question, context, query_type)
            answer_accepted, reason = self.generator.self_check(question, answer, context)
            if verbose:
                print(f"🔄 Iteration {iteration}: {reason}")
            if not answer_accepted:
                k_docs += 1  # widen retrieval before retrying
        return {"answer": answer, "query_type": query_type,
                "iterations": iteration, "accepted": answer_accepted}
The AgenticRAG class combines all the components, orchestrating routing, retrieval, generation, and quality checking. The system refines answers iteratively using feedback from its own evaluation, adjusting retrieval depth as needed. The result is a RAG-style decision loop that uses feedback to improve performance. Check out the FULL CODES here.
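The refine loop can be sketched in isolation with stubbed retrieval, generation, and checking. The stubs and the widen-the-search-on-failure policy below are illustrative assumptions that stand in for the real FAISS and Flan-T5 calls:

```python
def answer_with_retries(question: str, max_iterations: int = 2):
    k_docs = 3
    answer, accepted = "", False
    for iteration in range(1, max_iterations + 1):
        context = retrieve(question, k=k_docs)   # stub below
        answer = generate(question, context)     # stub below
        accepted, reason = self_check(answer)    # stub below
        if accepted:
            break
        k_docs += 1  # widen retrieval and try again
    return {"answer": answer, "iterations": iteration, "accepted": accepted}

# Minimal stubs so the loop is runnable on its own
def retrieve(q, k):
    return [f"doc{i}" for i in range(k)]

def generate(q, ctx):
    # Pretend generation only succeeds once enough context is retrieved
    return "a sufficiently long answer" if len(ctx) >= 4 else "??"

def self_check(answer):
    ok = len(answer) >= 10
    return ok, "ok" if ok else "too short"

result = answer_with_retries("What is RAG?")
print(result)  # second pass succeeds after widening retrieval
```

The key design choice is that the self-check result feeds back into the retrieval parameters, so a rejected answer changes what the next iteration sees rather than simply re-running the same generation.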
def main():
    print("\n" + "="*60)
    print("🚀 AGENTIC RAG WITH ROUTING & SELF-CHECK")
    print("="*60 + "\n")
    documents = [
        "RAG (Retrieval-Augmented Generation) combines information retrieval with text generation. It retrieves relevant documents and uses them as context for generating accurate answers."
    ]
    sources = ["Python Documentation", "ML Textbook", "Neural Networks Guide", "Deep Learning Paper", "Transformer Architecture", "RAG Research Paper"]
    rag = AgenticRAG()
    rag.add_knowledge(documents, sources)
    test_queries = ["What is Python?", "How does machine learning work?", "Compare neural networks and deep learning"]
    for query in test_queries:
        result = rag.query(query, verbose=True)
        print(f"\n{'='*60}")
        print(f"📊 FINAL RESULT:")
        print(f"   Answer: {result['answer']}")
        print(f"   Query Type: {result['query_type']}")
        print(f"   Iterations: {result['iterations']}")
        print(f"   Accepted: {result['accepted']}")
        print(f"{'='*60}\n")

if __name__ == "__main__":
    main()
We finish the demonstration by loading a knowledge base and running test queries through the agentic RAG pipeline. We watch the model route and refine answers step by step, printing intermediate results for transparency, and confirm at the end that the system delivers self-validated answers using only local computation.
In conclusion, we have built a fully functional agentic RAG framework that retrieves, reasons about, and validates its own answers. The system dynamically routes different query types, assesses its responses, and refines them through iterative feedback, all within a lightweight local environment. This exercise deepens our understanding of RAG architectures and shows how agentic reasoning can turn static retrieval systems into self-improving ones.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur, Asif is committed to harnessing Artificial Intelligence for social good. His most recent venture is Marktechpost, a machine learning and deep learning news platform that is both technically sound and easily understandable by a broad audience. The platform's popularity is reflected in its more than 2 million monthly views.

