root_axis 6 hours ago
> At some point a beefy Mac Studio and the "right sized" model is going to be what people want. It's pretty clear that this isn't going to happen any time soon, if ever. You can't shrink the models without destroying their coherence, and this is a consistently robust observation across the board.
sipjca 6 hours ago | parent [-]
I don’t think it’s about literally shrinking the models via quantization, but rather training smaller, more efficient models from scratch. Smaller models have gotten much more powerful over the last two years; Qwen 3.5 is one example of this. The cost/compute requirements for running the same level of intelligence are going down.