hellohello2 4 hours ago
"It is almost guaranteed that a 60-90B model can outperform current SOTA in coding tasks within 2-3 years" What insight do you have to make this claim?
roadside_picnic 4 hours ago
Have you personally used any of the latest batch of even smaller local models? They certainly don't beat SotA models at coding... but with a good harness they are able to achieve things that I couldn't achieve with SotA models last year. I've repeatedly given local models non-trivial projects involving research and coding, which they've successfully completed with minimal intervention from me (almost exclusively reviewing the results). Again, nothing comparable with current SotA, but definitely tasks I could not have given SotA models last year (without an agent harness).

Now that pure progress from these models seems to have slowed down, we're seeing a ton of options both for making models more efficient and for other tools that help improve them (everything from agent harnesses to RLVR).

That's just looking at "what can small models do today". When you look at what's possible with larger open models that are still much smaller than SotA from the major providers, their performance is extremely close to SotA, enough that for personal projects I'll just use Kimi instead of any Anthropic offerings.

So it's not terribly hard to imagine a solution in the middle happening within a few years. We still have tons to learn about optimal sizes of these models and how to build them with maximal efficiency (and we've already seen a lot of recent improvements in this space).
onlyrealcuzzo 4 hours ago
1. Context is all you need... They are heavily investing in getting better context (especially for coding tasks). This will disproportionately advantage smaller models (and benefit everyone). A smaller model with better context today can outperform a model with 100x more parameters that has bad or diluted context.

2. MoE (already abundant) + MLA (mostly memory efficiency, not quality) + Medusa (speed, not quality) + GRAM (5000-10,000x better reasoning in an extremely small model) + 1.58-bit weights (unclear if it will have the impact Microsoft first claimed, but possibly 5x).
knollimar 4 hours ago
Probably just "gemma was cool"