import textwrap

DOCS = {
"transformer_architecture.md": textwrap.dedent("""
# Transformer Architecture
## Overview
"Attention Is All You Need" (Vaswani et al., 2017) introduced the Transformer,
a deep-learning architecture that replaces recurrence with self-attention,
enabling parallel training and better long-range dependency modelling.
## Key Components
- **Multi-Head Self-Attention**: Computes attention in h parallel heads, each
with its own learned projections, then concatenates the results (see the
sketch after this list).
- **Feed-Forward Network (FFN)**: Two linear layers with a ReLU activation,
applied position-wise.
- **Positional Encoding**: Sinusoidal or learned embeddings that inject
sequence-order information, since attention is permutation-invariant.
- **Layer Normalisation**: Applied before (Pre-LN) or after (Post-LN) each
sub-layer, stabilising gradients.
- **Residual Connections**: Added around each sub-layer to ease gradient flow.
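
A minimal NumPy sketch of multi-head self-attention, using random, untrained
weights and illustrative shapes only:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    return softmax(Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)) @ V

def multi_head_attention(X, W_q, W_k, W_v, W_o, h):
    # Project, split d_model into h heads, attend in parallel,
    # then concatenate the heads and apply the output projection.
    n, d_model = X.shape
    d_head = d_model // h
    Q = (X @ W_q).reshape(n, h, d_head).transpose(1, 0, 2)
    K = (X @ W_k).reshape(n, h, d_head).transpose(1, 0, 2)
    V = (X @ W_v).reshape(n, h, d_head).transpose(1, 0, 2)
    heads = attention(Q, K, V)                       # (h, n, d_head)
    return heads.transpose(1, 0, 2).reshape(n, d_model) @ W_o

rng = np.random.default_rng(0)
n, d_model, h = 4, 8, 2
X = rng.normal(size=(n, d_model))
W = [rng.normal(size=(d_model, d_model)) for _ in range(4)]
print(multi_head_attention(X, *W, h).shape)          # (4, 8)
```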
## Encoder vs Decoder
- The encoder stack attends bidirectionally over the input tokens (e.g. BERT).
- The decoder stack uses causal (masked) self-attention over previous outputs,
plus cross-attention over the encoder outputs (e.g. T5); decoder-only models
such as GPT drop the encoder and cross-attention.
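
A small NumPy sketch of the causal mask, for illustration: position i may
attend only to positions <= i, and masked logits receive zero weight after
softmax.

```python
import numpy as np

n = 4
scores = np.zeros((n, n))                       # raw attention logits
mask = np.triu(np.ones((n, n), dtype=bool), 1)  # True above the diagonal
scores[mask] = -np.inf                          # -inf -> 0 after softmax
print(scores)
```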
## Scaling Laws
Kaplan et al. (2020) showed that model loss decreases predictably as a power
law in compute, parameter count, and dataset size. These laws motivated GPT-3
(175B parameters) and subsequent large language models.
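
For instance, the parameter-count law has the form L(N) = (N_c / N)^alpha_N.
A toy evaluation, using constants close to the paper's fitted values (treat
them as illustrative, not authoritative):

```python
def kaplan_loss(n_params, n_c=8.8e13, alpha_n=0.076):
    # L(N) = (N_c / N) ** alpha_N: loss as a power law in parameter count.
    return (n_c / n_params) ** alpha_n

# Each 10x increase in parameters gives a steady multiplicative loss drop.
for n in (1e9, 1e10, 1e11):
    print(f"{n:.0e} params -> predicted loss {kaplan_loss(n):.3f}")
```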
## Limitations
- Quadratic O(n²) complexity in sequence length
- No inherent recurrence -> long-context challenges
- High memory footprint during training
## References
Vaswani et al. (2017). Attention is All You Need. NeurIPS.
Kaplan et al. (2020). Scaling Laws for Neural Language Models. arXiv:2001.08361.
"""),
"rag_systems.md": textwrap.dedent("""
# Retrieval-Augmented Generation (RAG)
## Definition
RAG augments a generative LLM with a retrieval step: given a query, relevant
documents are retrieved from a corpus and prefixed to the prompt, grounding
the model in context beyond its training data.
## Architecture
1. **Indexing Phase** — Documents are chunked, embedded via a bi-encoder
(e.g. text-embedding-3-large), and stored in a vector database (e.g.
Faiss, Pinecone, Weaviate).
2. **Retrieval Phase** — The user query is embedded; approximate nearest-
neighbour (ANN) search returns the top-k chunks.
3. **Generation Phase** — Retrieved chunks + query are passed to the LLM,
which synthesises the final answer (see the pipeline sketch after this list).
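
A minimal end-to-end sketch, assuming `embed` and `llm` are hypothetical
callables standing in for your embedding model and generator; brute-force
cosine similarity stands in for a real ANN index such as Faiss:

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k):
    # Brute-force cosine similarity; a stand-in for a real ANN index.
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return np.argsort(-sims)[:k]

def rag_answer(query, chunks, doc_vecs, embed, llm, k=3):
    # Retrieval phase: embed the query and fetch the k nearest chunks.
    idx = cosine_top_k(embed(query), doc_vecs, k)
    # Generation phase: prefix retrieved context to the prompt.
    context = " ".join(chunks[i] for i in idx)
    return llm(f"Context: {context} Question: {query} Answer:")
```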
## Variants
- **Dense Retrieval**: DPR, Contriever — queries and docs in the same space.
- **Sparse Retrieval**: BM25 — term frequency-based, no embeddings needed.
- **Hybrid Retrieval**: Reciprocal Rank Fusion (RRF) combines dense + sparse
rankings (sketched after this list).
- **Re-ranking**: A cross-encoder re-scores the top-k before the LLM sees them.
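
A minimal RRF sketch; k=60 is the constant conventionally used:

```python
def reciprocal_rank_fusion(rankings, k=60):
    # rankings: list of ranked doc-id lists (e.g. one dense, one sparse).
    # RRF score of a doc: sum over rankings of 1 / (k + rank), rank from 1.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d3", "d1", "d7"]   # from embedding similarity
sparse = ["d1", "d9", "d3"]   # from BM25
print(reciprocal_rank_fusion([dense, sparse]))  # d1 and d3 rise to the top
```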
## Challenges
Limits on the size of context windows: some long passages may be too large.
- The retrieval quality sets a limit on the generation quality.
Recall is affected by the Chunking technique.
- Multi-hop questions require iterative retrieval (IRCoT, ReAct).
## Relationship to Transformers
RAG systems rely on transformer encoders as embedding models and transformer
decoders (or encoder-decoders) for generation. Embedding-model quality
directly determines retrieval precision and recall.
## References
Lewis et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS.
Gao et al. (2023). Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv:2312.10997.
"""),
"knowledge_graph_integration.md": textwrap.dedent("""
# Knowledge Graphs & LLM Integration
## What is a Knowledge Graph (KG)?
A knowledge graph is a directed, labelled graph whose nodes are entities and
whose edges are relations, expressed as (subject, predicate, object) triples,
e.g. (Vaswani, authored, "Attention Is All You Need").
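
To make the structure concrete, a tiny Python sketch of a KG as a set of
triples with a one-hop lookup:

```python
# A tiny KG as a set of (subject, predicate, object) triples.
KG = {
    ("Vaswani", "authored", "Attention Is All You Need"),
    ("Attention Is All You Need", "introduced", "Transformer"),
    ("Transformer", "uses", "self-attention"),
}

def objects(kg, subject, predicate):
    # One-hop lookup: follow a labelled edge out of `subject`.
    return {o for s, p, o in kg if s == subject and p == predicate}

print(objects(KG, "Vaswani", "authored"))  # {'Attention Is All You Need'}
```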
## Why Combine KGs and LLMs?
- LLMs hallucinate facts; KGs offer structured, verifiable knowledge.
- LLMs can provide a natural-language interface for querying KGs.
- Together they enable faithful, grounded, explainable question answering.
## Integration Strategies
### KG-Augmented Generation (KGAG)
Retrieve relevant subgraphs, serialise their triples into text, and feed that
text into the LLM prompt instead of raw text chunks.
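
One simple serialisation, sketched below, linearises each triple as a short
sentence; other verbalisation schemes exist:

```python
def serialise_subgraph(triples):
    # Linearise (subject, predicate, object) triples into prompt text.
    return " ".join(f"{s} {p} {o}." for s, p, o in triples)

subgraph = [("Vaswani", "authored", "Attention Is All You Need"),
            ("Attention Is All You Need", "introduced", "Transformer")]
print("Facts: " + serialise_subgraph(subgraph))
```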
### LLM-Assisted KG Construction
LLMs can extract (subject, predicate, object) triples from unstructured text,
significantly reducing manual curation effort.
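
A sketch of such an extractor, where `llm` is a hypothetical text-completion
callable and the pipe-separated output format is an assumption of this
example:

```python
def extract_triples(text, llm):
    prompt = ("Extract (subject, predicate, object) triples from the text. "
              "Return one triple per line as: subject | predicate | object. "
              "Text: " + text)
    triples = []
    for line in llm(prompt).splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:          # keep only well-formed lines
            triples.append(tuple(parts))
    return triples
```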
### GraphRAG (Microsoft Research, 2024)
GraphRAG clusters documents into communities, generates community summaries,
and stores them in a KG. Questions are answered by map-reduce over the
community summaries, outperforming flat vector RAG on sensemaking tasks.
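
A minimal map-reduce sketch over community summaries, with `llm` again a
hypothetical completion callable; the real pipeline also scores and filters
partial answers:

```python
def graphrag_answer(question, community_summaries, llm):
    # Map: answer the question against each community summary independently.
    partials = [llm(f"Summary: {s} Question: {question} Partial answer:")
                for s in community_summaries]
    # Reduce: merge the partial answers into one final response.
    merged = " ".join(partials)
    return llm(f"Partial answers: {merged} Final answer to '{question}':")
```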
## Challenges
- KG construction quality depends on the extraction LLM's accuracy.
- Graph databases add infrastructure complexity.
- Ontology design requires domain expertise.
- KGs become stale without a continuous-update pipeline.
## Relationship to RAG and Transformers
KG integration addresses two key RAG limitations: the lack of structured
reasoning and the inability to follow multi-hop connections.
## References
Pan et al. (2023). Unifying Large Language Models and Knowledge Graphs: A Roadmap. IEEE TKDE.
"""),
}