desideratum 2 hours ago

This is a gross simplification of the process. You would typically use one or more orders of magnitude more data and compute, plus a substantial amount of online reinforcement learning, to elicit emergent tool-use capabilities.

Many recent OSS models have great tech reports where you can learn more about this kind of thing:

Kimi K2.5 https://github.com/MoonshotAI/Kimi-K2.5/blob/master/tech_rep...

GLM 5 https://arxiv.org/abs/2602.15763

DeepSeek R1 https://arxiv.org/pdf/2501.12948