What Are LLMs?
Neural-network simulations of human data labelers following instructions; for most models, not actual reasoning systems
How They're Built
Stage 1: Pre-Training
- Download & filter internet data (~44TB from sources like Common Crawl)
- Filter out malware, spam, adult content, PII
- Extract just text, remove HTML/CSS markup
- Result: massive dataset of high-quality, diverse documents
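The filtering steps above can be sketched as a tiny document cleaner (the rules and thresholds here are illustrative assumptions, not the actual Common Crawl / FineWeb pipeline):

```python
import re

def clean_document(html):
    """Strip markup and drop documents that fail basic quality checks.

    A minimal sketch: real pipelines use language ID, dedup, quality
    classifiers, and far more robust PII detection.
    """
    # Remove HTML/CSS tags, keeping only the text content.
    text = re.sub(r"<[^>]+>", " ", html)
    text = re.sub(r"\s+", " ", text).strip()
    # Drop documents too short to be useful training data (arbitrary cutoff).
    if len(text.split()) < 5:
        return None
    # Redact a simple PII pattern (email addresses) as a stand-in for PII filtering.
    text = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "[EMAIL]", text)
    return text

doc = "<html><body><p>Contact alice@example.com for the dataset.</p></body></html>"
print(clean_document(doc))  # Contact [EMAIL] for the dataset.
```

Each rule here removes a different failure mode (markup, low-content pages, PII); production pipelines chain dozens of such filters before the text ever reaches the model.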
Stage 2: Fine-Tuning
- Human data labelers create ideal responses to prompts
- Model learns to imitate these responses
- RLHF (Reinforcement Learning from Human Feedback) - helps align outputs with human preferences, but optimizing against a learned reward model isn't true RL
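The imitation setup can be sketched as assembling a training example where the loss applies only to the labeler's ideal response, not the prompt (the `<|user|>`/`<|assistant|>` markers here are illustrative placeholders, not any specific model's chat template):

```python
def build_example(prompt, ideal_response):
    """Pair each token with a loss mask: 0 = context, 1 = imitate."""
    tokens, mask = [], []
    # Prompt side: the model conditions on these but takes no loss here.
    for tok in ["<|user|>"] + prompt.split() + ["<|assistant|>"]:
        tokens.append(tok)
        mask.append(0)
    # Response side: the model is trained to reproduce the labeler's answer.
    for tok in ideal_response.split() + ["<|end|>"]:
        tokens.append(tok)
        mask.append(1)
    return tokens, mask

tokens, mask = build_example("What is 2+2?", "2+2 equals 4.")
print(list(zip(tokens, mask)))
```

Splitting on whitespace stands in for a real tokenizer; the key idea is the mask, which is why the model learns to *answer like the labeler* rather than to continue the prompt.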
Stage 3: Reinforcement Learning (Thinking Models Only)
- Models like OpenAI's o3 develop novel problem-solving strategies
- Practice on curated problems to perfect reasoning
- Can potentially discover solutions humans haven't thought of
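The core loop can be illustrated with a toy sketch (hypothetical problem and numbers, not any lab's actual training setup): sample many candidate solutions, verify each against a known answer, and reinforce only the ones that earn a reward.

```python
import random

def sample_solution():
    # Stand-in for the model proposing an answer to "12 * 12".
    return random.choice([140, 142, 144, 146])

def reward(answer, correct=144):
    # Verifiable problems make the reward a simple correctness check.
    return 1 if answer == correct else 0

random.seed(0)
rollouts = [sample_solution() for _ in range(100)]
reinforced = [a for a in rollouts if reward(a) == 1]
print(f"{len(reinforced)} of {len(rollouts)} rollouts get reinforced")
```

Because the reward comes from checking the answer rather than imitating a human transcript, the model is free to find solution strategies no labeler ever wrote down.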
Critical Limitations ("Swiss Cheese Model")
- Hallucinations - confidently generate false information
- Can't reliably count letters or do basic arithmetic (they see tokens, not individual characters)
- Arbitrary failures - randomly struggle with simple tasks (e.g., insisting 9.11 is larger than 9.9)
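Both failures are easy to see by contrast with plain code, which operates on characters and numbers directly rather than on tokens:

```python
# Letter counting: "strawberry" is a handful of tokens to an LLM,
# so it can't simply look at the characters the way code can.
print("strawberry".count("r"))  # 3

# 9.11 vs 9.9: read as version-like strings, 9.11 can look bigger,
# but numerically 9.9 is larger.
print(9.11 > 9.9)  # False
```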