dragonwriter 4 days ago
> I'm skeptical about these claims. How can this be? More efficient architecture? Wouldn't there be massive loss of world knowledge?

If you assume equally efficient architecture and no other salient differences, yes, that's what you'd expect from a smaller model.
jug 4 days ago
Hmm. Let's just say that if this is true, and it really is better with a much lower total parameter count, it's the greatest accomplishment in over a year of LLM development. With the backdrop of benchmaxxing in 2025, I'll believe in this when I see the results on closed benchmarks and SimpleBench. My concern is this might be a hallucination machine.