deweywsu 8 hours ago

With the pace of AI, and with AI helping to pave the way for faster/better AI, I keep wondering if hardware like this will become obsolete well before it has a meaningful ROI. Huge AI models can already be run with fewer resources through quantization and offloading, but that's just the beginning. One day, maybe not far from now, a breakthrough will allow huge LLMs (say 200B in size) to run well on an old 5 year old Dell desktop. Think that's crazy? Look at the size of the first hard drives. The IBM 350 was a disk unit with 50 platters, each 24 inches in diameter, that held about 3.75 MB and was leased for today's equivalent of $35K.

https://www.computerhistory.org/storageengine/first-commerci...

Compare that to a multi-terabyte SSD. Now apply that improvement to how an LLM is architected and run today. With AI assisting, it won't be long before a leap occurs and these data centers, with all their current ultra-cutting-edge Nvidia cards, are nearly obsolete overnight.
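For a rough sense of the gap quantization alone would have to close, here is a minimal back-of-the-envelope sketch. It only counts memory for the weights of a dense 200B model at a few bit widths. The 16 GB desktop RAM figure and the function name `weight_gigabytes` are assumptions for illustration, not anything from the thread, and the estimate ignores KV cache, activations, and quantization overhead.

```python
# Rough weight-memory estimate for a dense model at several quantization widths.
# All figures are illustrative assumptions, not measurements.

def weight_gigabytes(params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

PARAMS = 200e9          # the "200B" model from the comment above
DESKTOP_RAM_GB = 16     # hypothetical RAM of an older desktop (assumption)

for bits in (16, 8, 4, 2):
    gb = weight_gigabytes(PARAMS, bits)
    ratio = gb / DESKTOP_RAM_GB
    print(f"{bits:>2}-bit: {gb:6.0f} GB, {ratio:.1f}x the assumed desktop RAM")
```

Under these assumptions the weights take 400 GB at 16-bit and 100 GB at 4-bit, so quantization by itself still leaves a large gap that offloading or some further breakthrough would have to cover.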

admax88qqq 8 hours ago | parent | next [-]

> One day, maybe not far from now, a breakthrough will allow huge LLMs (say 200B in size) to run well on an old 5 year old Dell desktop.

But if you had such a breakthrough, could you not also apply it and run 200T models on today's datacenters?

pennomi 8 hours ago | parent | next [-]

That assumes scaling laws still hold up. A bigger model might end up only incrementally more intelligent.

ACCount37 6 hours ago | parent [-]

They do. Mythos kicked ass while it lasted. And what we know of the scaling law curves promises us even more gains in the future.

"The future" being "whenever training and inference at increased scale become economical". That is probably bounded by new generations of hardware, but might also be pushed forward by algorithmic advances.

phkahler 6 hours ago | parent [-]

I think they're out of training data though...

ACCount37 6 hours ago | parent [-]

Synthetics are often used for "data amplification" nowadays. Extra compute covers a multitude of sins.

ACCount37 7 hours ago | parent | prev | next [-]

Not only could you: you would also want to.

The likes of Mythos show that the scaling laws are real, and you can x5/x2 the total/active params and get meaningful gains. If "inference per param" gets cheaper? Up the params and get more intelligence for the same price.

deweywsu 8 hours ago | parent | prev [-]

Quite true

simonebrunozzi 8 hours ago | parent | prev | next [-]

Interesting comment, but the comparison with hard disk drives is probably unfair.

The IBM 350 was commercialized 70 years ago; it took 70 years for someone like you to be able to compare that to a multi-TB SSD.

Furthermore, nothing says that Moore's Law will necessarily apply to LLMs for decades to come.

deweywsu 8 hours ago | parent [-]

Very true. All I am basing my comment on is the speed improvement AI has demonstrated when applied to software development, and I am inferring that it might enable a similar 10X or 100X improvement in hardware architecture as well as in LLM structure and/or interface methods. If that speed improvement carries over to the performance of AI itself, the 70 years it took to improve storage technology could be compressed, and a step change in AI performance could arrive in a drastically shorter timeframe.

LZ_Khan 7 hours ago | parent | prev | next [-]

I think Jevons Paradox and scaling laws will make this not the case. If bigger models are always better (which it seems they are), then we will always need high-end hardware.

gdiamos 8 hours ago | parent | prev | next [-]

Usually breakthroughs in computing lead to more usage of computing, not less.

3abiton 7 hours ago | parent | prev | next [-]

> One day, maybe not far from now, a breakthrough will allow huge LLMs (say 200B in size) to run well on an old 5 year old Dell desktop.

I think there will be specialized hardware (besides GPUs) that is custom made for LLMs. Yes, TPUs exist, but mainly for datacenters. GPUs exist, but they are adapted from what was mainly a graphics application. Once all the demand from data centers dries up, innovation will kick in.

andriy_koval 8 hours ago | parent | prev | next [-]

> I keep wondering if hardware like this will become obsolete well before it has a meaningful ROI

It will build the expertise/infra/know-how foundation for the next generation of hardware.

dwa3592 7 hours ago | parent | prev | next [-]

True, but as someone else pointed out, by that time we'd be interested in running a 200T parameter model rather than a 200B one. Why, you might ask? The law of human laziness: a human will become as lazy as the technology allows. With a 200T or 20,000T model, I'd be heavily incentivized to ask it to make the bread that I enjoy making now, or to create a movie for me (featuring myself) that will maximize the dopamine production in my brain.

zabriel_goss 8 hours ago | parent | prev | next [-]

I agree with you. Stepping stones are still a part of getting there, if only to be briefly useful.

hyhatqtv 8 hours ago | parent | prev | next [-]

Looking at the development of memory bandwidth, capacity and prices over the last 10 years there is little indication that’s likely.
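To make the memory-bandwidth point concrete, here is a hedged sketch of a common rule of thumb: when generating one token at a time, a dense model has to read roughly all of its weights once per token, so memory bandwidth divided by weight size gives an upper bound on tokens per second. The function name `decode_tokens_per_second` is made up for illustration, and the 51.2 GB/s figure is the theoretical peak of dual-channel DDR4-3200, assumed here as a stand-in for an older desktop.

```python
# Upper bound on single-stream decode speed when weight reads dominate.
# Assumes a dense model whose weights are all read once per generated token;
# ignores compute limits, caches, and KV-cache traffic.

def decode_tokens_per_second(active_params: float,
                             bits_per_weight: float,
                             bandwidth_gb_s: float) -> float:
    """Bandwidth-bound tokens/s: bytes available per second / bytes read per token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Hypothetical setup: 200B dense model, 4-bit weights,
# dual-channel DDR4-3200 (51.2 GB/s theoretical peak).
rate = decode_tokens_per_second(200e9, 4, 51.2)
print(f"about {rate:.2f} tokens/s at best")
```

Under these assumptions the bound is roughly half a token per second, before asking whether 100 GB of weights even fits in the machine. A model with fewer active parameters per token would raise the bound proportionally.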

Rekindle8090 3 hours ago | parent | prev [-]

[dead]