kennywinker 6 hours ago
Can anyone explain what the story is here? Is this just a re-skinned Qwen? Who is deepreinforce-ai, and why isn't this model listed on their website? How does it self-improve: does the model change on disk, or does it only get better within a single context run?
simonw 6 hours ago
It doesn't self-improve; that's a misleading headline. As far as I can tell, they trained it by running their own reinforcement learning on top of Qwen and Gemma 4 (not sure how they combined weights from both, or whether they used Qwen as the basis and Gemma 4 to help train?), so the "self-improving" is about their training process, not how you use the weights.
v3ss0n 3 hours ago
Clickbait title.