blackbear_ 3 hours ago
The GPT-3 paper is a good starting point:

Language Models are Few-Shot Learners
https://arxiv.org/abs/2005.14165

I also enjoyed the DeepSeek and GLM papers for an overview of all the tricks you need to make these things work:

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
https://arxiv.org/abs/2512.02556

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
https://arxiv.org/abs/2508.06471