import textwrap

DOCS = {
"transformer_architecture.md": textwrap.dedent("""
# Transformer Architecture
## Overview
"Attention Is All You Need" (Vaswani et al., 2017) introduced the Transformer,
a deep-learning architecture that replaces recurrence with self-attention,
enabling parallel training and better long-range dependency modelling.
## Key Components
- **Multi-Head Self-Attention**: Computes attention in h parallel heads, each
with its own learned projections, then concatenates the results (see the
sketch after this list).
- **Feed-Forward Network (FFN)**: Two linear layers with a ReLU activation,
applied position-wise.
- **Positional Encoding**: Sinusoidal or learned embeddings that inject
sequence-order information, since attention is permutation-invariant.
- **Layer Normalisation**: Applied before (Pre-LN) or after (Post-LN) each
sub-layer, stabilising gradients.
- **Residual Connections**: Added around each sub-layer to ease gradient flow.
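
A minimal NumPy sketch of multi-head self-attention, using random, untrained
weights and illustrative shapes only:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    return softmax(Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)) @ V

def multi_head_attention(X, W_q, W_k, W_v, W_o, h):
    # Project, split d_model into h heads, attend in parallel,
    # then concatenate the heads and apply the output projection.
    n, d_model = X.shape
    d_head = d_model // h
    Q = (X @ W_q).reshape(n, h, d_head).transpose(1, 0, 2)
    K = (X @ W_k).reshape(n, h, d_head).transpose(1, 0, 2)
    V = (X @ W_v).reshape(n, h, d_head).transpose(1, 0, 2)
    heads = attention(Q, K, V)                       # (h, n, d_head)
    return heads.transpose(1, 0, 2).reshape(n, d_model) @ W_o

rng = np.random.default_rng(0)
n, d_model, h = 4, 8, 2
X = rng.normal(size=(n, d_model))
W = [rng.normal(size=(d_model, d_model)) for _ in range(4)]
print(multi_head_attention(X, *W, h).shape)          # (4, 8)
```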
## Encoder vs Decoder
- The encoder stack attends bidirectionally over the input tokens (e.g. BERT).
- The decoder stack uses causal (masked) self-attention over previous outputs,
plus cross-attention over the encoder outputs (e.g. T5); decoder-only models
such as GPT drop the encoder and cross-attention.
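
A small NumPy sketch of the causal mask, for illustration: position i may
attend only to positions <= i, and masked logits receive zero weight after
softmax.

```python
import numpy as np

n = 4
scores = np.zeros((n, n))                       # raw attention logits
mask = np.triu(np.ones((n, n), dtype=bool), 1)  # True above the diagonal
scores[mask] = -np.inf                          # -inf -> 0 after softmax
print(scores)
```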
## Scaling Laws
Kaplan et al. (2020) showed that model loss decreases predictably as a power
law in compute, parameter count, and dataset size. These laws motivated GPT-3
(175B parameters) and subsequent large language models.
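
For instance, the parameter-count law has the form L(N) = (N_c / N)^alpha_N.
A toy evaluation, using constants close to the paper's fitted values (treat
them as illustrative, not authoritative):

```python
def kaplan_loss(n_params, n_c=8.8e13, alpha_n=0.076):
    # L(N) = (N_c / N) ** alpha_N: loss as a power law in parameter count.
    return (n_c / n_params) ** alpha_n

# Each 10x increase in parameters gives a steady multiplicative loss drop.
for n in (1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss {kaplan_loss(n):.3f}")
```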
## Limitations
- Quadratic O(n²) complexity in sequence length
- No inherent recurrence -> long-context challenges
- High memory footprint during training
## References
Vaswani et al. (2017). Attention is All You Need. NeurIPS.
Kaplan et al. (2020). Scaling Laws for Neural Language Models. arXiv:2001.08361.
"""),
"rag_systems.md": textwrap.dedent("""
# Retrieval-Augmented Generation (RAG)
## Definition
RAG augments a generative LLM with a retrieval step: given a query, relevant
documents are retrieved from a corpus and prefixed to the prompt, grounding
the model in context beyond its training data.
## Architecture
1. **Indexing Phase** — Documents are chunked, embedded via a bi-encoder
(e.g. text-embedding-3-large), and stored in a vector database (e.g.
Faiss, Pinecone, Weaviate).
2. **Retrieval Phase** — The user query is embedded; approximate nearest-
neighbour (ANN) search returns the top-k chunks.
3. **Generation Phase** — Retrieved chunks + query are passed to the LLM,
which synthesises the final answer (see the pipeline sketch after this list).
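
A minimal end-to-end sketch, assuming `embed` and `llm` are hypothetical
callables standing in for your embedding model and generator; brute-force
cosine similarity stands in for a real ANN index such as Faiss:

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k):
    # Brute-force cosine similarity; a stand-in for a real ANN index.
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return np.argsort(-sims)[:k]

def rag_answer(query, chunks, doc_vecs, embed, llm, k=3):
    # Retrieval phase: embed the query and fetch the k nearest chunks.
    idx = cosine_top_k(embed(query), doc_vecs, k)
    # Generation phase: prefix retrieved context to the prompt.
    context = " ".join(chunks[i] for i in idx)
    return llm(f"Context: {context} Question: {query} Answer:")
```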
## Variants
- **Dense Retrieval**: DPR, Contriever — queries and docs in the same space.
- **Sparse Retrieval**: BM25 — term frequency-based, no embeddings needed.
- **Hybrid Retrieval**: Reciprocal Rank Fusion (RRF) combines dense + sparse
rankings (sketched after this list).
- **Re-ranking**: A cross-encoder re-scores the top-k before the LLM sees them.
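
A minimal RRF sketch; k=60 is the constant conventionally used:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # rankings: list of ranked doc-id lists (e.g. one dense, one sparse).
    # RRF score of a doc: sum over rankings of 1 / (k + rank), rank from 1.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d3", "d1", "d7"]   # from embedding similarity
sparse = ["d1", "d9", "d3"]   # from BM25
print(reciprocal_rank_fusion([dense, sparse]))  # d1 and d3 rise to the top
```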
## Challenges
Limits on the size of context windows: some long passages may be too large.
- The retrieval quality sets a limit on the generation quality.
Recall is affected by the Chunking technique.
- Multi-hop questions require iterative retrieval (IRCoT, ReAct).
## Relationship to Transformers
RAG systems rely on transformer encoders as embedding models and transformer
decoders (or encoder-decoders) for generation. Embedding-model quality
directly determines retrieval precision and recall.
## References
Lewis et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS.
Gao et al. (2023). Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv:2312.10997.
"""),
"knowledge_graph_integration.md": textwrap.dedent("""
# Knowledge Graphs & LLM Integration
## What is a Knowledge Graph (KG)?
A knowledge graph is a directed, labelled graph whose nodes are entities and
whose edges are relations, expressed as (subject, predicate, object) triples,
e.g. (Vaswani, authored, "Attention Is All You Need").
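
To make the structure concrete, a tiny Python sketch of a KG as a set of
triples with a one-hop lookup:

```python
# A tiny KG as a set of (subject, predicate, object) triples.
KG = {
    ("Vaswani", "authored", "Attention Is All You Need"),
    ("Attention Is All You Need", "introduced", "Transformer"),
    ("Transformer", "uses", "self-attention"),
}

def objects(kg, subject, predicate):
    # One-hop lookup: follow a labelled edge out of `subject`.
    return {o for s, p, o in kg if s == subject and p == predicate}

print(objects(KG, "Vaswani", "authored"))  # {'Attention Is All You Need'}
```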
## Why Combine KGs and LLMs?
- LLMs hallucinate facts; KGs offer structured, verifiable knowledge.
- LLMs can provide a natural-language interface for querying KGs.
- Together they enable faithful, grounded, explainable question answering.
## Integration Strategies
### KG-Augmented Generation (KGAG)
Retrieve relevant subgraphs, serialise their triples into text, and feed that
text into the LLM prompt instead of raw text chunks.
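
One simple serialisation, sketched below, linearises each triple as a short
sentence; other verbalisation schemes exist:

```python
def serialise_subgraph(triples):
    # Linearise (subject, predicate, object) triples into prompt text.
    return " ".join(f"{s} {p} {o}." for s, p, o in triples)

subgraph = [("Vaswani", "authored", "Attention Is All You Need"),
            ("Attention Is All You Need", "introduced", "Transformer")]
print("Facts: " + serialise_subgraph(subgraph))
```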
### LLM-Assisted KG Construction
LLMs can extract (subject, predicate, object) triples from unstructured text,
significantly reducing manual curation effort.
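
A sketch of such an extractor, where `llm` is a hypothetical text-completion
callable and the pipe-separated output format is an assumption of this
example:

```python
def extract_triples(text, llm):
    prompt = ("Extract (subject, predicate, object) triples from the text. "
              "Return one triple per line as: subject | predicate | object. "
              "Text: " + text)
    triples = []
    for line in llm(prompt).splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:          # keep only well-formed lines
            triples.append(tuple(parts))
    return triples
```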
### GraphRAG (Microsoft Research, 2024)
GraphRAG clusters documents into communities, generates community summaries,
and stores them in a KG. Questions are answered by map-reduce over the
community summaries, outperforming flat vector RAG on sensemaking tasks.
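
A minimal map-reduce sketch over community summaries, with `llm` again a
hypothetical completion callable; the real pipeline also scores and filters
partial answers:

```python
def graphrag_answer(question, community_summaries, llm):
    # Map: answer the question against each community summary independently.
    partials = [llm(f"Summary: {s} Question: {question} Partial answer:")
                for s in community_summaries]
    # Reduce: merge the partial answers into one final response.
    merged = " ".join(partials)
    return llm(f"Partial answers: {merged} Final answer to '{question}':")
```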
## Challenges
- KG construction quality depends on the extraction LLM's accuracy.
- Graph databases add infrastructure complexity.
- Ontology design requires domain expertise.
- KGs become stale without a continuous-update pipeline.
## Relationship to RAG and Transformers
KG integration addresses two key RAG limitations: the lack of structured
reasoning and the inability to follow multi-hop connections.
## References
Pan et al. (2023). Unifying Large Language Models and Knowledge Graphs: A Roadmap. IEEE TKDE.
"""),
}