mikestorrent 3 days ago

Inference performance per watt is continuing to improve, so even if we hit the peak of what LLM technology can scale to, we'll see tokens per second, per dollar, and per watt continue to improve for a long time yet.
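
For a rough sense of the arithmetic (every number below is a made-up placeholder, purely to show how perf-per-watt feeds into cost per token):

    # Back-of-the-envelope: how performance per watt feeds into cost per token.
    # All figures here are hypothetical placeholders, not real hardware specs.
    tokens_per_second = 2000.0   # aggregate throughput of one inference node
    node_power_watts = 700.0     # power draw of that node
    usd_per_kwh = 0.10           # assumed electricity price

    tokens_per_joule = tokens_per_second / node_power_watts
    kwh_per_million_tokens = (1e6 / tokens_per_joule) / 3.6e6
    energy_usd_per_million_tokens = kwh_per_million_tokens * usd_per_kwh

    print(f"{tokens_per_joule:.2f} tokens/joule")
    print(f"${energy_usd_per_million_tokens:.4f} energy cost per 1M tokens")
    # Double the tokens/joule and the energy cost per token halves,
    # even if the model itself never gets any better.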

I don't think we're hitting the peak of what LLMs can do at all yet. Raw performance on one-shot responses, maybe; but there's a ton of room to improve "frameworks of thought", which is the best way to conceptualize agents and other LLM-based workflows (toy sketch below).
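
A toy sketch of what such a framework looks like structurally; call_llm and run_tool are hypothetical stand-ins, not any particular API:

    # Toy "framework of thought": the scaffolding (the loop, tool calls, and
    # stopping rule) is where agent-style improvements live, independent of
    # how good the underlying model is.
    def agent(task, call_llm, run_tool, max_steps=10):
        transcript = [f"Task: {task}"]
        for _ in range(max_steps):
            step = call_llm("\n".join(transcript))   # model proposes the next action
            if step.startswith("FINAL:"):            # model declares it is done
                return step[len("FINAL:"):].strip()
            transcript.append(step)
            transcript.append(f"Observation: {run_tool(step)}")  # feed tool result back
        return "gave up after max_steps"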

The real question in my mind is whether we will continue to see really good open-source model releases for people to run on their own hardware, or if the companies will become increasingly proprietary as their revenue becomes more clearly tied up in selling inference as a service vs. raising massive amounts of money to pursue AGI.

ethbr1 2 days ago

My guess would be that it parallels other backend software revolutions.

Initially, first-party proprietary solutions are in front.

Then, as the second-party ecosystem matures, those parties build on the highest-performance proprietary solutions.

Then, as second parties monetize, they begin switching to OSS/commodity solutions to lower COGS. And with wider use, these begin to outcompete proprietary solutions on ergonomics and stability (even if not absolute performance).

While Anthropic and OpenAI are incinerating money, why not build on their platforms? As soon as they stop, the scales tilt toward an Apache/nginx-style commoditized backend.