Show HN: A book that builds GPT-2, Llama 3, DeepSeek from scratch in PyTorch
2 points by s1lv3rj1nx 5 hours ago | 1 comment
I'm a software engineer who works with LLMs professionally (Forward Deployed Engineer at TrueFoundry). Over the past year I built up implementations of five LLM architectures from scratch and wrote a book around them. The progression:

- Ch1: Vanilla encoder-decoder transformer (English-to-Hindi translation)
- Ch2: GPT-2 124M from scratch; loads real OpenAI pretrained weights
- Ch3: Llama 3.2-3B by swapping 4 components of GPT-2 (LayerNorm to RMSNorm, learned PE to RoPE, GELU to SwiGLU, MHA to GQA); loads Meta's pretrained weights
- Ch4: KV cache, MQA, GQA (inference optimisation)
- Ch5: DeepSeek MLA (absorption trick, decoupled RoPE), DeepSeekMoE, Multi-Token Prediction, FP8 quantisation

All code is open source: https://github.com/S1LV3RJ1NX/mal-code

The book provides the explanations, derivations, diagrams, and narrative: https://leanpub.com/adventures-with-llms (free sample available)

I wrote it because most resources stop at GPT-2, and I wanted something that covers what's actually in production models today. Happy to answer questions about any of the implementations.
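To give a flavour of the Ch3 component swaps, here is a minimal sketch of the first one, LayerNorm to RMSNorm. This is my own illustrative implementation under common Llama-style conventions (learned scale, no bias, no mean subtraction, eps inside the square root), not code taken from the book or repo:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Llama-style RMSNorm: normalize by the root mean square of the
    activations. Unlike LayerNorm, there is no mean subtraction and
    no learned bias, only a learned per-feature scale."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learned scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # RMS over the feature dimension; eps guards against division by zero
        rms = torch.sqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x / rms)
```

Because the interface matches LayerNorm's (one `dim` argument, same input/output shape), a drop-in swap inside a GPT-2 block only touches the module construction.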
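And a rough sketch of the MHA-to-GQA idea from Ch3/Ch4: several query heads share one key/value head. The function name and the `repeat_interleave` formulation (materialising the shared KV heads for clarity) are my own illustration, with causal masking and KV caching omitted:

```python
import torch

def gqa_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                  group_size: int) -> torch.Tensor:
    """Grouped-query attention (hypothetical minimal version).

    q:    (batch, n_q_heads, seq, head_dim)
    k, v: (batch, n_kv_heads, seq, head_dim), n_q_heads = n_kv_heads * group_size

    Each group of `group_size` query heads attends against the same
    K/V head, which is what shrinks the KV cache versus full MHA.
    """
    # Expand each KV head across its query group (a real implementation
    # would broadcast instead of copying memory).
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
    return torch.softmax(scores, dim=-1) @ v
```

MQA is the `n_kv_heads = 1` special case, and `group_size = 1` recovers standard multi-head attention.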
s1lv3rj1nx 2 hours ago | parent
AMA