psaccounts 12 hours ago

This video tutorial provides an intuitive, in-depth breakdown of how an LLM learns language and uses that learning to generate text. The key concepts listed below are covered with both breadth and depth, while keeping the material accessible without losing technical rigor:

* Historical context for LLMs and GenAI

* Training an LLM -- 100,000-foot overview

* What does an LLM learn during training?

* Inferencing an LLM -- 100,000-foot overview

* 3 steps in the LLM journey from pre-training to serving

* Word Embeddings -- representing text in numeric format

* RMS Normalization -- the sound engineer of the Transformer

* Benefits of RMS Normalization over Layer Normalization (see the RMSNorm sketch after this list)

* Rotary Position Encoding (RoPE) -- making the Transformer aware of token position

* Masked Self-Attention -- making the Transformer understand context (sketched in code after this list)

* How RoPE generalizes well, making long-context LLMs possible (see the RoPE sketch after this list)

* Understanding what Causal Masking is (intuition and benefit)

* Multi-Head Attention -- improving stability of Self-Attention

* Residual Connections -- improving stability of learning

* Feed Forward Network

* SwiGLU Activation Function (sketched after this list)

* Stacking Transformer blocks

* Projection Layer -- Next Token Prediction

* Inferencing a Large Language Model

* Step-by-step next-token generation to form sentences

* Perplexity Score -- how well the model did (see the perplexity sketch after this list)

* Next Token Selector -- Greedy Sampling

* Next Token Selector -- Top-k Sampling

* Next Token Selector -- Top-p/Nucleus Sampling

* Temperature -- making an LLM's generation more creative (the sampling strategies and temperature are sketched together after this list)

* Instruction fine-tuning -- aligning an LLM's responses with user instructions

* Learning going forward
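
To make a few of the topics above concrete, here are some minimal plain-Python sketches; every function name, weight, and input value in them is an illustrative assumption, not code from the video. First, RMS Normalization: it rescales each vector by its root-mean-square only, so unlike Layer Normalization there is no mean subtraction and no bias term, which makes it cheaper while working just as well in practice.

```python
import math

def rms_norm(x, gain, eps=1e-6):
    # Rescale by the root-mean-square of the vector; unlike LayerNorm,
    # no mean is subtracted and no bias term is needed.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

x = [0.5, -1.2, 3.0, 0.1]
print(rms_norm(x, gain=[1.0] * len(x)))  # gain is the learned per-dimension scale
```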
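
A sketch of Rotary Position Encoding: each pair of dimensions in a query or key vector is rotated by an angle that depends on the token's position, so attention scores end up depending on relative position, which is part of why RoPE generalizes to longer contexts. The base of 10000 is the commonly used constant; the vectors are made up.

```python
import math

def rope(vec, pos, base=10000.0):
    # Rotate each consecutive pair (vec[i], vec[i+1]) by an angle that
    # depends on the token position `pos` and the pair's frequency.
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos / (base ** (i / d))
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = vec[i], vec[i + 1]
        out += [x1 * c - x2 * s, x1 * s + x2 * c]
    return out

q = [1.0, 0.0, 0.5, -0.5]
print(rope(q, pos=0))  # position 0: unchanged
print(rope(q, pos=3))  # same vector at a later position gets a different rotation
```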
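
A single-head sketch of masked (causal) self-attention: each token's query is scored only against the keys of positions up to and including itself, the scores are softmaxed, and the values are mixed with those weights, so the model can never peek at future tokens during training. Multi-head attention runs several such heads in parallel on smaller subspaces and concatenates the results. The toy vectors below are illustrative.

```python
import math

def causal_self_attention(q, k, v):
    # q, k, v: one vector per token (a single attention head).
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Causal mask: token t only attends to positions s <= t.
        scores = [sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        m = max(scores)                       # numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        w = [x / z for x in w]
        out.append([sum(wi * v[s][j] for s, wi in enumerate(w))
                    for j in range(d)])
    return out

toks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(causal_self_attention(toks, toks, toks))
```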
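
A sketch of the SwiGLU activation inside the feed-forward network: one linear projection of the input is passed through SiLU (Swish) and used as a smooth gate on a second linear projection. The tiny weight matrices below are made-up values for illustration.

```python
import math

def swiglu(x, w_gate, w_up):
    # SwiGLU(x) = SiLU(W_gate x) * (W_up x), element-wise.
    def silu(z):
        return z / (1.0 + math.exp(-z))
    gate = [silu(sum(wi * xi for wi, xi in zip(row, x))) for row in w_gate]
    up = [sum(wi * xi for wi, xi in zip(row, x)) for row in w_up]
    return [g * u for g, u in zip(gate, up)]

x = [0.2, -0.4]
w_gate = [[0.5, 1.0], [-1.0, 0.3]]
w_up = [[1.0, 0.0], [0.0, 1.0]]
print(swiglu(x, w_gate, w_up))
```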
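
A sketch of the perplexity score: the exponential of the average negative log-probability the model assigned to the tokens that actually appeared. Lower is better, and 1.0 would mean perfect prediction. The probabilities are made up.

```python
import math

def perplexity(token_probs):
    # token_probs[i] = probability the model gave to the true i-th token.
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

print(perplexity([0.25, 0.5, 0.1, 0.4]))  # about 3.8 (lower is better)
```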
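
Finally, a sketch of the next-token selectors and temperature: temperature rescales the logits before the softmax (lower sharpens, higher flattens the distribution), top-k keeps only the k most likely tokens, top-p (nucleus) keeps the smallest set whose cumulative probability reaches p, and greedy decoding is simply the argmax. The signature and logits are illustrative assumptions, not the video's code.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    # Softmax over temperature-scaled logits.
    scaled = [l / max(temperature, 1e-8) for l in logits]
    m = max(scaled)
    probs = [math.exp(s - m) for s in scaled]
    z = sum(probs)
    probs = [p / z for p in probs]

    # Rank tokens by probability, then apply top-k and/or top-p filtering.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    if top_p is not None:
        kept, total = [], 0.0
        for i in ranked:
            kept.append(i)
            total += probs[i]
            if total >= top_p:
                break
        ranked = kept

    # Renormalize over the surviving tokens and sample one.
    weights = [probs[i] for i in ranked]
    z = sum(weights)
    return random.choices(ranked, weights=[w / z for w in weights])[0]

logits = [2.0, 1.0, 0.5, -1.0]                               # one score per vocab entry
print(max(range(len(logits)), key=lambda i: logits[i]))      # greedy: always index 0
print(sample_next_token(logits, temperature=0.8, top_k=3))   # top-k sampling
print(sample_next_token(logits, temperature=1.2, top_p=0.9)) # top-p / nucleus sampling
```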