Grosvenor an hour ago

Could this generate pressure to produce less memory-hungry models?

hodgehog11 an hour ago | parent | next [-]

There has always been pressure to do so, but there are fundamental performance bottlenecks tied to model size.

One possibility is a push toward training exclusively on search-based rewards, so that the model isn't required to compress a large proportion of the internet into its weights. But this is likely to be much slower and to come with initial performance costs that frontier model developers will not want to incur.
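
A rough sketch of what a "search-based reward" could mean in practice, assuming a toy search() stand-in and a made-up grounding score (none of these names or functions come from the thread, they're just an illustration):

    import re
    from typing import List

    def search(query: str) -> List[str]:
        # Stand-in for a real retrieval/search tool the model would query;
        # in practice this would hit an index or web search API.
        return ["Paris is the capital of France."]

    def _tokens(text: str) -> set:
        # Lowercase word tokens; crude, but enough for the illustration.
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def grounding_reward(answer: str, retrieved_docs: List[str]) -> float:
        # Reward in [0, 1]: fraction of answer tokens that also appear in the
        # retrieved documents. The idea is to reward answers grounded in what
        # was retrieved rather than recalled from the model's weights.
        answer_tokens = _tokens(answer)
        doc_tokens = _tokens(" ".join(retrieved_docs))
        if not answer_tokens:
            return 0.0
        return len(answer_tokens & doc_tokens) / len(answer_tokens)

    if __name__ == "__main__":
        docs = search("capital of France")
        print(grounding_reward("The capital of France is Paris.", docs))  # prints 1.0

A real setup would use a far better grounding measure, but the incentive is the same: the model gets paid for using search, not for memorizing.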

thisrobot 9 minutes ago | parent | next [-]

I wonder if this maintains the natural language capabilities, which are what make LLMs magic to me. There is probably some middle ground, but not having to know which expressions or idioms an LLM will understand is really powerful from a user-experience point of view.

Grosvenor 36 minutes ago | parent | prev | next [-]

Yeah, that was my unspoken assumption. The pressure here results in an entirely different approach or model architecture.

If OpenAI is spending $500B, then someone can get ahead by spending $1B on an approach that improves the model by more than 0.2%, since $1B is only 0.2% of $500B.

I bet there's a group or three that could improve results a lot more than 0.2% with $1B.

UncleOxidant 36 minutes ago | parent | prev | next [-]

Or maybe models that are much more task-focused? Like models that are trained on just math & coding?

parineum 31 minutes ago | parent | prev [-]

> so that the model isn't required to compress a large proportion of the internet into its weights.

The knowledge compressed into an LLM is a byproduct of training, not a goal. Training on internet data teaches the model to talk at all. The knowledge and ability to speak are intertwined.

lofaszvanitt 41 minutes ago | parent | prev [-]

Of course, and then watch those companies get reined in.