What Are LLMs?
Neural networks trained to simulate human data labelers following instructions; for most models, they are not genuinely reasoning systems
How They're Built
Stage 1: Pre-Training
- Download & filter internet data (~44TB from sources like Common Crawl)
 
- Filter out malware, spam, adult content, PII
 
- Extract just text, remove HTML/CSS markup
 
- Result: massive dataset of high-quality, diverse documents
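The filtering steps above can be sketched in miniature. This is a toy illustration only: real pre-training pipelines use far more elaborate deduplication, language identification, and quality classifiers, and the thresholds here (`min_words`, the alphabetic-ratio cutoff) are made up for the example.

```python
import re

def strip_markup(raw: str) -> str:
    """Remove HTML tags and collapse whitespace, keeping only the text."""
    text = re.sub(r"<[^>]+>", " ", raw)       # drop HTML/CSS markup tags
    return re.sub(r"\s+", " ", text).strip()  # normalize whitespace

def keep_document(text: str, min_words: int = 5) -> bool:
    """Toy quality filter: drop very short or mostly non-alphabetic docs."""
    words = text.split()
    if len(words) < min_words:
        return False
    alpha_ratio = sum(w.isalpha() for w in words) / len(words)
    return alpha_ratio > 0.6

raw_pages = [
    "<html><body><p>The cat sat on the mat and purred quietly.</p></body></html>",
    "<div>404</div>",  # junk page: too short, mostly numeric
]
corpus = [strip_markup(p) for p in raw_pages]
kept = [doc for doc in corpus if keep_document(doc)]
print(kept)  # only the first page survives the filters
```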
 
Stage 2: Fine-Tuning
- Human data labelers create ideal responses to prompts
 
- Model learns to imitate these responses
 
- RLHF (Reinforcement Learning from Human Feedback): optimizes against a learned model of human preferences; helps align outputs but isn't RL in the full sense
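The imitation objective in this stage can be sketched as masked next-token cross-entropy: the loss is computed only on the labeler-written response, not the prompt. The per-token probabilities below are invented purely for illustration; a real model produces them from a forward pass over billions of parameters.

```python
import math

# Toy conversation: prompt followed by an ideal labeler response.
prompt_tokens   = ["What", "is", "2+2", "?"]
response_tokens = ["4", "<eos>"]
tokens = prompt_tokens + response_tokens

# Pretend probabilities the model assigns to each target token
# (made-up numbers for this sketch).
model_probs = {"What": 0.1, "is": 0.2, "2+2": 0.05, "?": 0.3,
               "4": 0.6, "<eos>": 0.9}

def sft_loss(tokens, n_prompt):
    """Average negative log-likelihood, masked to response tokens only."""
    losses = [-math.log(model_probs[t]) for t in tokens[n_prompt:]]
    return sum(losses) / len(losses)

loss = sft_loss(tokens, len(prompt_tokens))
print(round(loss, 3))  # lower loss = closer imitation of the labeler
```

Training nudges the model's probabilities on response tokens toward 1, which is exactly "learning to imitate these responses".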
 
Stage 3: Reinforcement Learning (Thinking Models Only)
- Models like OpenAI's o3 develop novel problem-solving strategies
 
- Practice on curated problems to perfect reasoning
 
- Can potentially discover solutions humans haven't thought of
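A minimal sketch of the idea behind this stage: sample many attempts at a problem with a verifiable answer, score each attempt automatically, and keep the high-reward ones as training signal. The random "model" and the specific problem here are stand-ins; real training reinforces whole reasoning traces, not single numbers.

```python
import random

random.seed(0)  # deterministic for the sketch

def checker(answer: int) -> float:
    """Verifiable reward: 1.0 iff the answer to 13 * 7 is correct."""
    return 1.0 if answer == 91 else 0.0

def sample_attempts(n: int):
    """Stand-in for sampling n candidate solutions from the model."""
    return [random.randint(85, 95) for _ in range(n)]

attempts = sample_attempts(16)
rewarded = [a for a in attempts if checker(a) > 0]
# In real RL training, high-reward attempts would push the policy toward
# whatever strategies produced them -- including strategies no human
# labeler ever demonstrated.
print(f"{len(rewarded)} of {len(attempts)} attempts earned reward")
```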
 
Critical Limitations ("Swiss Cheese Model")
- Hallucinations - confidently generate false information
 
- Can't reliably count letters or do basic arithmetic; tokenization hides individual characters from the model
 
- Arbitrary failures - randomly struggles with simple tasks (e.g., claiming 9.11 > 9.9)
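The letter-counting failure follows from how models see text. A toy tokenizer makes the point; the splits below are illustrative, not taken from any real vocabulary:

```python
def toy_tokenize(word: str):
    """Illustrative subword splits -- not a real tokenizer's vocabulary."""
    vocab_splits = {"strawberry": ["str", "aw", "berry"]}
    return vocab_splits.get(word, [word])

tokens = toy_tokenize("strawberry")
print(tokens)                   # ['str', 'aw', 'berry']
print("strawberry".count("r"))  # ground truth a human computes: 3
# The model receives the three opaque token IDs, not eleven characters,
# so "how many r's?" can't be answered by looking at its input directly.
```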