What Are LLMs?
Neural networks trained to simulate human data labelers following instructions; for most models, they are not genuinely reasoning systems
How They're Built
Stage 1: Pre-Training
- Download & filter internet data (~44TB from sources like Common Crawl)
 
- Filter out malware, spam, adult content, PII
 
- Extract just text, remove HTML/CSS markup
 
- Result: massive dataset of high-quality, diverse documents
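The filtering steps above can be sketched in miniature. This is a toy illustration only: real pre-training pipelines use far more elaborate deduplication, language identification, and quality classifiers, and the thresholds here (`min_words`, the alphabetic-ratio cutoff) are made up for the example.

```python
import re

def strip_markup(raw: str) -> str:
    """Remove HTML tags and collapse whitespace, keeping only the text."""
    text = re.sub(r"<[^>]+>", " ", raw)       # drop HTML/CSS markup tags
    return re.sub(r"\s+", " ", text).strip()  # normalize whitespace

def keep_document(text: str, min_words: int = 5) -> bool:
    """Toy quality filter: drop very short or mostly non-alphabetic docs."""
    words = text.split()
    if len(words) < min_words:
        return False
    alpha_ratio = sum(w.isalpha() for w in words) / len(words)
    return alpha_ratio > 0.6

raw_pages = [
    "<html><body><p>The cat sat on the mat and purred quietly.</p></body></html>",
    "<div>404</div>",  # junk page: too short, mostly numeric
]
corpus = [strip_markup(p) for p in raw_pages]
kept = [doc for doc in corpus if keep_document(doc)]
print(kept)  # only the first page survives the filters
```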
 
Stage 2: Fine-Tuning
- Human data labelers create ideal responses to prompts
 
- Model learns to imitate these responses
 
- RLHF (Reinforcement Learning from Human Feedback): optimizes against a learned model of human preferences; helps align outputs but isn't RL in the full sense
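The imitation objective in this stage can be sketched as masked next-token cross-entropy: the loss is computed only on the labeler-written response, not the prompt. The per-token probabilities below are invented purely for illustration; a real model produces them from a forward pass over billions of parameters.

```python
import math

# Toy conversation: prompt followed by an ideal labeler response.
prompt_tokens   = ["What", "is", "2+2", "?"]
response_tokens = ["4", "<eos>"]
tokens = prompt_tokens + response_tokens

# Pretend probabilities the model assigns to each target token
# (made-up numbers for this sketch).
model_probs = {"What": 0.1, "is": 0.2, "2+2": 0.05, "?": 0.3,
               "4": 0.6, "<eos>": 0.9}

def sft_loss(tokens, n_prompt):
    """Average negative log-likelihood, masked to response tokens only."""
    losses = [-math.log(model_probs[t]) for t in tokens[n_prompt:]]
    return sum(losses) / len(losses)

loss = sft_loss(tokens, len(prompt_tokens))
print(round(loss, 3))  # lower loss = closer imitation of the labeler
```

Training nudges the model's probabilities on response tokens toward 1, which is exactly "learning to imitate these responses".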
 
Stage 3: Reinforcement Learning (Thinking Models Only)
- Models like OpenAI's o3 develop novel problem-solving strategies
 
- Practice on curated problems to perfect reasoning
 
- Can potentially discover solutions humans haven't thought of
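A minimal sketch of the idea behind this stage: sample many attempts at a problem with a verifiable answer, score each attempt automatically, and keep the high-reward ones as training signal. The random "model" and the specific problem here are stand-ins; real training reinforces whole reasoning traces, not single numbers.

```python
import random

random.seed(0)  # deterministic for the sketch

def checker(answer: int) -> float:
    """Verifiable reward: 1.0 iff the answer to 13 * 7 is correct."""
    return 1.0 if answer == 91 else 0.0

def sample_attempts(n: int):
    """Stand-in for sampling n candidate solutions from the model."""
    return [random.randint(85, 95) for _ in range(n)]

attempts = sample_attempts(16)
rewarded = [a for a in attempts if checker(a) > 0]
# In real RL training, high-reward attempts would push the policy toward
# whatever strategies produced them -- including strategies no human
# labeler ever demonstrated.
print(f"{len(rewarded)} of {len(attempts)} attempts earned reward")
```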
 
Critical Limitations ("Swiss Cheese Model")
- Hallucinations - confidently generate false information
 
- Can't reliably count letters or do basic arithmetic; tokenization hides individual characters from the model
 
- Arbitrary failures - randomly struggles with simple tasks (e.g., claiming 9.11 > 9.9)
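The letter-counting failure follows from how models see text. A toy tokenizer makes the point; the splits below are illustrative, not taken from any real vocabulary:

```python
def toy_tokenize(word: str):
    """Illustrative subword splits -- not a real tokenizer's vocabulary."""
    vocab_splits = {"strawberry": ["str", "aw", "berry"]}
    return vocab_splits.get(word, [word])

tokens = toy_tokenize("strawberry")
print(tokens)                   # ['str', 'aw', 'berry']
print("strawberry".count("r"))  # ground truth a human computes: 3
# The model receives the three opaque token IDs, not eleven characters,
# so "how many r's?" can't be answered by looking at its input directly.
```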