PaulHoule a day ago

It's not the hardware getting cheaper; it's that LLMs were developed when we really didn't understand how they worked, and there is still room to improve the implementations, particularly to do more with less RAM. That's everything from getting by with fewer weights to lower-precision formats like FP16. And if you can double the speed, you can get twice as much done with the same RAM and all the other parts.
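A rough back-of-envelope on the RAM side, using a hypothetical 7B-parameter model (the count is purely illustrative, not any specific model):

    # Weight memory for a hypothetical 7B-parameter model at different precisions.
    PARAMS = 7e9  # assumed parameter count, illustrative only

    bytes_per_weight = {"FP32": 4, "FP16": 2, "INT8": 1, "INT4": 0.5}

    for fmt, nbytes in bytes_per_weight.items():
        gib = PARAMS * nbytes / 2**30
        print(f"{fmt}: ~{gib:.1f} GiB of weights")

    # FP32: ~26.1 GiB, FP16: ~13.0 GiB, INT8: ~6.5 GiB, INT4: ~3.3 GiB

Halving the bytes per weight roughly halves the weight footprint, which is why lower precision and fewer weights both show up as "doing more with less RAM".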

SecretDreams a day ago | parent

Improvements in LLM efficiency should be driving hardware prices down.

I agree with everything you've said; I'm just not seeing any material benefit from it as of now.

sothatsit a day ago | parent

Inference costs falling 2x doesn’t decrease hardware prices when demand for tokens has increased 10x.

PaulHoule a day ago | parent

It's the ratio that matters. If revenue goes up 10x, you can afford 10x more hardware, provided you can finance it all.
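A toy version of that ratio argument, using the numbers from the thread (all purely hypothetical):

    # Toy arithmetic for the ratio argument; numbers are illustrative only.
    cost_drop = 2.0        # inference gets 2x cheaper per token
    demand_growth = 10.0   # demand for tokens grows 10x
    revenue_growth = 10.0  # assume revenue roughly tracks tokens served

    # Hardware needed still grows despite the efficiency gain:
    print(f"Hardware demand: {demand_growth / cost_drop:.0f}x")   # 5x
    # And a 10x revenue line can cover a 10x bigger hardware bill:
    print(f"Affordable hardware spend: {revenue_growth:.0f}x")    # 10x

So cheaper inference per token doesn't translate into cheaper hardware overall while demand is outrunning the efficiency gains.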