zozbot234 4 days ago

> The companies that are entirely AI-dependent may need to raise prices dramatically as AI prices go up.

It's not that clear. Sure, hardware prices are going up due to the extremely tight supply, but AI models are also improving quickly to the point where a cheap mid-level model today does what the frontier model did a year ago. For the very largest models, I think the latter effect dominates quite easily.

lelanthran 4 days ago | parent | next [-]

>> The companies that are entirely AI-dependent may need to raise prices dramatically as AI prices go up.

> It's not that clear. Sure, hardware prices are going up due to the extremely tight supply, but AI models are also improving quickly to the point where a cheap mid-level model today does what the frontier model did a year ago.

I agree; I got some real coding value out of Qwen for $10/month (unlimited tokens); a good harness (and some tight coding practices) narrows the gap between SOTA and six-month-old second-tier models.

If I can get 80% of the way to Anthropic's or OpenAI's SOTA models for $10/month with unlimited tokens, guess what I am going to do...

satvikpendem 4 days ago | parent [-]

GitHub Copilot is already $10 and I don't even use up the requests every month, it's the most bang for buck LLM service I've used.

chewz 4 days ago | parent [-]

Until May

kwakubiney 4 days ago | parent [-]

What’s happening in May?

chewz 4 days ago | parent [-]

GitHub Copilot switches all users from per-prompt to per-token billing.

bcjdjsndon 4 days ago | parent | prev | next [-]

There's only so far engineers can optimise the underlying transformer technique, which is and always has been doing the heavy lifting in the recent AI boom. It's going to take another genius to move this forward. We might see improvements here and there, but I don't think the magnitude of the data and VRAM requirements will change significantly.

zozbot234 4 days ago | parent | next [-]

State-space models are already being combined with transformers to form new hybrid models. The state-space part of the architecture is weaker at retrieving information from context (it can't find a needle in the haystack as the context gets longer; details effectively get compressed away, since everything has to fit in a fixed-size state), but computationally it's quite strong: O(N) rather than O(N^2).
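The complexity trade-off above can be sketched in a few lines of NumPy. This is an illustrative toy, not any particular hybrid architecture: attention builds an N x N score matrix (the quadratic part), while a linear state-space recurrence makes a single pass, squeezing all history into a fixed-size state vector — which is exactly why it's cheap but lossy on long-range retrieval.

```python
import numpy as np

def attention_mixing(x, W_q, W_k, W_v):
    """Self-attention: every token attends to every other token.
    The (N, N) score matrix costs O(N^2) time and memory."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(k.shape[1])          # (N, N) -- the quadratic part
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ v

def ssm_mixing(x, A, B, C):
    """Linear state-space recurrence: one pass over the sequence,
    carrying a fixed-size hidden state h. O(N) time -- but all history
    must be compressed into h, whatever the context length."""
    h = np.zeros(A.shape[0])
    out = np.empty((x.shape[0], C.shape[0]))
    for t, x_t in enumerate(x):
        h = A @ h + B @ x_t        # fold the new token into the fixed-size state
        out[t] = C @ h             # read out from the compressed state
    return out

# Toy dimensions: N=8 tokens, model width 4, state size 6 (all made up).
rng = np.random.default_rng(0)
N, d, d_state = 8, 4, 6
x = rng.standard_normal((N, d))
attn_out = attention_mixing(x, rng.standard_normal((d, d)),
                            rng.standard_normal((d, d)),
                            rng.standard_normal((d, d)))
ssm_out = ssm_mixing(x, 0.9 * np.eye(d_state),
                     rng.standard_normal((d_state, d)),
                     rng.standard_normal((d, d_state)))
print(attn_out.shape, ssm_out.shape)  # both (8, 4): same interface, different cost
```

Both layers map a length-N sequence to a length-N sequence, which is what makes them interchangeable (and combinable) inside a hybrid stack.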

aerhardt 4 days ago | parent | prev | next [-]

I’ve read and heard from SemiAnalysis and other best-in-class analysts that the amount of software optimization possible up and down the stack is staggering…

How do you explain that, capabilities being equal, the cost per token is going down dramatically?

bcjdjsndon 3 days ago | parent [-]

Optimizations, like I said. They'll never hack away the massive memory requirements, however, or the pre-training... Imagine the memory requirements without the pre-training step... This is just part and parcel of the transformer architecture.

bcjdjsndon 3 days ago | parent [-]

And a lot of these improvements are really just classic automation or chaining together yet more transformer architectures, to fix issues the transformer architecture creates in the first place (hallucinations, limited context)

abarth23 2 days ago | parent | prev [-]

Exactly this. To actually visualize the sheer scale of the VRAM wall we are hitting, I recently built an LLM VRAM estimator (bytecalculators.com/llm-vram-calculator).

If you play around with the math, you quickly realize that even if we heavily quantize models down to INT4 to save memory, simply scaling the context window (which everyone wants now) immediately eats back whatever VRAM we just saved. The underlying math is extremely unforgiving without fundamentally changing the architecture.

CodingJeebus 4 days ago | parent | prev | next [-]

You also have to look at how exposed your vendors are to cost increases.

Your company may have the resources to effectively shift to cheaper models without service degradation, but your AI tooling vendors might not. If you pay for 5 different AI-driven tools, that's 5 different ways your upstream costs may increase that you'll need to pass on to customers as well.

chewz 4 days ago | parent | prev [-]

We have been processing the same data for the last 2 years.

Inference prices dropped by about 90 percent in that time (a combination of cheaper models, implicit caching, service levels, different providers, and other optimizations).

Quality went up. Quantity of results went up. Speed went up.

The service level we provide to our clients went up massively and justified better deals. Headcount went down.

What's not to like?

oeitho 4 days ago | parent | next [-]

The decline of independent thought, for one. As people become reliant on LLMs to do their thinking for them and solve every problem they stumble upon, they become shells of their former selves.

Sadly, this is already happening.

WarmWash 4 days ago | parent | next [-]

We'll need to do faux mental work, the way we already do faux physical labor.

chewz 4 days ago | parent | prev [-]

There is no decline. Human analysts were always too expensive to process this additional information. We are simply processing a lot more low-signal data.

Actually some of our analysts are empowered by the tools at their disposal. Their jobs are safe and necessary. Others were let go.

Clients are happy to get a fuller picture of their universe, which drives more informed decisions. Everybody wins.

oeitho 4 days ago | parent | next [-]

You are free to believe what you want, but what you describe does not match what I’ve seen from society as a whole. I’m just going to leave this here: https://www.media.mit.edu/projects/your-brain-on-chatgpt/ove...

suttontom 3 days ago | parent | prev [-]

Are you being satirical?

bluecheese452 4 days ago | parent | prev [-]

The headcount that went down probably isn’t too thrilled about it.

chewz 4 days ago | parent [-]

Yes, probably. But the others gained skills and tools that made their jobs secure.

bluecheese452 4 days ago | parent [-]

Right but the question wasn’t were some people better off. It is what’s not to like?