| ▲ | konaraddi 5 days ago |
| > applying this compression algorithm at scale may significantly relax the memory bottleneck issue. I don’t think they’re going to downsize, though; I think the big players are just going to use the freed-up memory for more workflows or larger models, because they want to scale up. It’s an arms race for the best models. |
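A rough back-of-envelope of the memory point (Python; the 70B size and bit widths are illustrative assumptions, not numbers from the linked work):

    # Approximate weight memory at different precisions. Ignores the KV
    # cache and activations, which also matter in practice.
    def weight_memory_gb(params_billions, bits_per_weight):
        return params_billions * 1e9 * bits_per_weight / 8 / 1e9

    for bits in (16, 8, 4):
        print(f"70B weights @ {bits:>2}-bit: ~{weight_memory_gb(70, bits):.0f} GB")

    # 16-bit: ~140 GB, 8-bit: ~70 GB, 4-bit: ~35 GB. The same hardware
    # holds roughly 4x the parameters at 4-bit, which is exactly the
    # incentive to scale up rather than downsize.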
|
| ▲ | miohtama 5 days ago | parent | next [-] |
| It will also help with local inference, making AI without big players possible. |

| ▲ | otabdeveloper4 5 days ago | parent [-] |
It's already possible. Post-training is vastly more important than model size. (There are sharply diminishing returns with increasing model size.)

| ▲ | plagiarist 5 days ago | parent [-] |
Is there a size cutoff where you would say diminishing returns really kick in? My experience doesn't disagree, at least. I've been using Qwen for coding locally a bit. It is much better than I thought it would be, but it still falls short in some obvious ways compared to the frontier models.

| ▲ | otabdeveloper4 4 days ago | parent [-] |
> Is there a size cutoff where you would say diminishing returns really kick in?
No idea yet. But it's also obvious that making LLMs without MoE is stupid.
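(For the unfamiliar: MoE, mixture of experts, replaces a dense feed-forward layer with many expert MLPs plus a router that activates only the top-k experts per token, so parameter count grows without proportional compute. A minimal NumPy sketch; the names, shapes, and per-token loop are illustrative, not any particular model's layer:)

    import numpy as np

    def moe_forward(x, experts, router_w, k=2):
        # x: (d,) one token; experts: list of (w1, w2) MLP weights;
        # router_w: (d, n_experts) router projection.
        logits = x @ router_w                  # score every expert
        top = np.argsort(logits)[-k:]          # keep only the k best
        gates = np.exp(logits[top] - logits[top].max())
        gates /= gates.sum()                   # softmax over the chosen k
        out = np.zeros_like(x)
        for g, i in zip(gates, top):
            w1, w2 = experts[i]
            out += g * (np.maximum(x @ w1, 0.0) @ w2)  # gated ReLU MLP
        return out                             # only k of n experts ran

    rng = np.random.default_rng(0)
    d, n = 16, 8
    experts = [(rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d)))
               for _ in range(n)]
    print(moe_forward(rng.normal(size=d), experts, rng.normal(size=(d, n))).shape)

Real MoE layers batch tokens per expert and add load-balancing terms; the point here is only that compute scales with k, not with the total expert count.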
|
| ▲ | Verdex 5 days ago | parent | prev [-] |
| Known in the business as 'pulling a Jevons' (after the Jevons paradox: efficiency gains tend to increase total consumption rather than reduce it). |