ifwinterco 21 hours ago

I think the issue is that, for them, "quality, not speed" means "expensive, not cheap", and they can't pass that extra cost on to customers

tejohnso 16 hours ago | parent | next [-]

> they can't pass that extra cost on to customers

I don't understand why not. People pay for quality all the time; often they're begging to pay for quality and it's just not an option. Of course, it depends on how much more quality is on offer, but it sounds like a significant amount here.

mccoyb 21 hours ago | parent | prev | next [-]

I'm happy to pay the same as I do right now for less (on the max plan, or whatever) -- because I never run into limits, and I'm running these models nearly all day, every day (as a single user working on my own personal projects).

I consistently run into limits with CC (Opus 4.5) -- but even though Codex seems to spend significantly more tokens, its quota limit just seems much higher?

Computer0 21 hours ago | parent | next [-]

I am on the $20 plan for both CC and Codex. A full session of usage on CC feels like it burns only ~20% of a 5-hour Codex window, in terms of time spent inferencing. Codex has always seemed way more generous than I would expect.

Aurornis 19 hours ago | parent [-]

Agreed. The $20 plans can go very far when you're using the coding agent as an additional tool in your development flow, not just trying to hammer it with prompts until you get output that works.

Managing context goes a long way, too. I clear context for every new task and keep the local context files up to date with key info to get the LLM on target quickly.

girvo 18 hours ago | parent | next [-]

> I clear context for every new task and keep the local context files up to date with key info to get the LLM on target quickly

Aggressively recreating your context is still the best way to get the best results from these tools too, so it has a secondary benefit.

heliumtera 16 hours ago | parent [-]

It is ironic that in the GPT-4 era, when we couldn't see much value in these tools, all we heard was "skill issue" and "prompt engineering skills". Now they are actually quite capable for SOME tasks, especially ones we don't really care about learning ourselves, and they can, to a certain extent, generalize. They perform much better than in the GPT-4 era, objectively, across all domains. They perform much better with the absolute minimum of input, objectively, across all domains. Someone who skipped the whole "prompt engineering" phase and learned nothing during that time is now better equipped to perform well. Now I wonder how much I am leaving behind by ignoring this whole "skills, tools, MCP this and that, yada yada".

conradev 13 hours ago | parent | next [-]

Prompt engineering (communicating with models?) is a foundational skill. Skills, tools, MCPs, etc. are all built on prompts.

My take is that the overlap is strongest with engineering management. If you can learn how to manage a team of human engineers well, that translates to managing a team of agents well.

miek 10 hours ago | parent | prev | next [-]

Minimal prompting yielding better results? I haven't found this to be the case at all.

neom 15 hours ago | parent | prev | next [-]

Any thoughts on your wondering? I too am wondering whether I'm making the same mistake.

fragmede 4 hours ago | parent | prev [-]

My answer is that the code they generate is still crap, so the new skill is being able to spot the ways and places it wrote crap code, quickly tell it to refactor to fix specific issues, and still come out ahead on productivity. Nothing like an ultrawide monitor (LG 40+) and having parallel Codex or Claude sessions going, working on a bunch of things at once. Get good at git worktree. Use them to make tools that make your own life easier that you previously wouldn't even have bothered to make (Chrome extensions and MCPs!).
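
For anyone who hasn't tried it, a minimal git worktree setup for parallel sessions looks something like this (branch and directory names made up for illustration):

    git worktree add ../proj-feature-a feature-a   # one checkout per agent session
    git worktree add ../proj-bugfix-b bugfix-b
    git worktree list                              # all share one .git store
    git worktree remove ../proj-feature-a          # clean up after merging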

The other skill is in knowing exactly when to roll up your sleeves and do it the old-fashioned way: which things they're good/useful for, and which things they aren't.

theonething 17 hours ago | parent | prev [-]

do you mean running /compact often?

Aurornis 2 hours ago | parent | next [-]

If I want to continue the same task, I run /compact.

If I want to start a new task, I /clear and then tell it to re-read the CLAUDE.md document where I keep all of the quick context: a description of the project, key goals, where to find key code, reminders about tools to use, and so on. I aggressively update this file whenever I notice things it's always forgetting or looking up. I know some people have the LLM update their context file, but I do it myself, with seemingly better results.
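
As a sketch, a primer along those lines can be tiny (every project detail below is invented for illustration):

    # CLAUDE.md
    Invoicing service: Go API in /cmd/server, schema in /migrations.
    Goals: keep handlers thin; business logic lives in /internal/billing.
    Reminders:
    - run `make test` before calling a task done
    - use the logger in /internal/log; don't add new logging deps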

Using /compact burns through a lot of your usage quota and retains a lot of things you may not need. Giving it directions like “starting a new task doing ____, only keep necessary context for that” can help, but hitting /clear and having it re-read a short context primer is faster and uses less quota.

dionian 10 hours ago | parent | prev [-]

I'm not who you asked, but I do the same thing: I keep important state in doc files and recreate sessions from that state. This allows me to clear context and reconstruct my status on a given item. I have a skill that manages this.
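
For reference, a Claude Code skill is roughly a markdown file with frontmatter; a hypothetical state-management skill might look like this (sketch only -- check the current docs for the exact format):

    ---
    name: session-state
    description: Save/restore task state via docs/STATE.md between sessions
    ---
    On "save state": write the current task, decisions made, and next
    steps to docs/STATE.md. On "restore": read docs/STATE.md and continue.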

joquarky 9 hours ago | parent [-]

Using documents for state helps so much with adding guardrails.

I do wish ChatGPT had a toggle next to each project file, instead of making you delete and re-upload files (or create separate projects) for various combinations of files.

hadlock 16 hours ago | parent | prev | next [-]

I noticed I am not hitting limits either. My guess is OpenAI sees CC as a real competitor/serious threat. Had OAI not given me virtually unlimited use I probably would have jumped ship to CC by now. Burning tons of cash at this stage is likely Very Worth It to maintain "market leader" status if only in the eyes of the media/investors. It's going to be real hard to claw back current usage limits though.

andai 19 hours ago | parent | prev [-]

If you look at benchmarks, the Claude models score significantly higher intelligence per token. I'm not sure how that works exactly, but they are offset from the entire rest of the chart on that metric. It seems they need fewer tokens to get the same result. (I can't speak to how that affects performance on very difficult tasks, though, since most of mine are pretty straightforward.)

So if you look at the total cost of running the benchmark, it's surprisingly similar to other models -- the higher price per token is offset by the significantly fewer tokens required to complete a task.
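
To make that concrete with purely invented numbers: a model with a higher per-token price can still tie or win on cost per task if it finishes in fewer tokens.

    Model A: $15 per 1M output tokens x 2M tokens for the run = $30
    Model B:  $5 per 1M output tokens x 7M tokens for the run = $35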

See "Cost to Run Artificial Analysis Index" and "Intelligence vs Output Tokens" here

https://artificialanalysis.ai/

...With the obligatory caveat that benchmarks are largely irrelevant for actual real world tasks and you need to test the thing on your actual task to see how well it does!

golly_ned 18 hours ago | parent | prev | next [-]

I wonder how much their revenue really ends up contributing towards covering their costs.

In my mind, they're hardly making any money compared to how much they're spending. They're relying on future model and efficiency gains to reduce their costs, while pursuing user growth and engagement almost exclusively -- the more queries they get, the more data they get, and the bigger a data moat they can build.

erik 16 hours ago | parent | next [-]

Inference is almost certainly very profitable.

All the money they keep raising goes to R&D for the next model. But I don't see how they ever get off that treadmill.

mbesto 2 hours ago | parent | next [-]

> Inference is almost certainly very profitable.

It almost certainly is not. Until we know what the useful life of NVIDIA GPUs actually is, it's impossible to determine whether inference is profitable or not.

panarky 21 minutes ago | parent [-]

The depreciation schedule isn't as big a factor as you'd think.

The marginal cost of an API call is small relative to what users pay, and utilization rates at scale are pretty high. You don't need perfect certainty about GPU lifespan to see that the spread between cost-per-token and revenue-per-token leaves a lot of room.
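
Purely illustrative numbers (every figure below is invented) to show why the depreciation assumption barely moves the conclusion:

    depreciation $0.50 + power/hosting $0.50 = $1.00 cost per 1M tokens
    halve the GPU's assumed life: $1.00 + $0.50 = $1.50 per 1M tokens
    vs. revenue of ~$10 per 1M tokens -> the spread barely narrows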

And datacenter GPUs have been running inference workloads for years now, so companies have a good idea of rates of failure and obsolescence. They're not throwing away two-year-old chips.

ithkuil 8 hours ago | parent | prev [-]

Is there a possible future where inference usage increases because there are many, many more customers, and R&D grows more slowly than inference?

Or is it already saturated?

nimchimpsky 18 hours ago | parent | prev [-]

"In my mind, they're hardly making any money compared to how much they're spending"

everyone seems to assume this, but it's not like it's a company run by dummies, or one with dummy investors.

They are obviously making an awful lot of revenue.

alwillis 11 hours ago | parent | next [-]

>> "In my mind, they're hardly making any money compared to how much they're spending"

> everyone seems to assume this, but it's not like it's a company run by dummies, or one with dummy investors.

It has nothing to do with their management or investors being "dummies", but the numbers are the numbers.

OpenAI has data center rental costs approaching $620 billion, which are expected to rise to $1.4 trillion by 2033 [1].

Annualized revenue is expected to be "only" $20 billion this year.

$1.4 trillion is 70x current revenue.

So unless they execute their strategy perfectly, hit all of their projections, and neither the stock market nor the economy collapses, making a profit in the foreseeable future is highly unlikely.

[1]: "OpenAI's AI money pit looks much deeper than we thought. Here's my opinion on why this matters" - https://diginomica.com/openais-ai-money-pit-much-deeper-we-t...

Daneel_ 17 hours ago | parent | prev | next [-]

To me it seems that they're banking on it becoming indispensable. Right now I could go back to pre-AI and be a little disappointed but otherwise fine. I figure all of these AI companies are in a race to make themselves part of everyone's core workflow in life, like clothing or a smartphone, such that we don't have much of a choice as to whether we use it or not - it just IS.

That's what the investors are chasing, in my opinion.

zozbot234 16 hours ago | parent [-]

It'll never be literally indispensable, because open models exist -- either served by third-party providers, or even run locally in a homelab setup. A nice thing that's arguably unique about the latter is that you can trade scale for latency: you get to run much larger models on the same hardware if they can chug on the answer overnight (with offload to a fast SSD for bulk storage of parameters and activations) instead of answering on the spot. Large providers don't want to do this, because keeping your query's activations around is just too expensive when scaled to many users.
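
As a concrete sketch of that trade (model name and numbers illustrative): llama.cpp memory-maps GGUF model files, so a model larger than RAM can page weights from a fast NVMe SSD and simply run slowly, e.g.

    # expect very low tokens/sec; fine for an overnight batch job
    llama-cli -m huge-model-q4.gguf -p "overnight question" -n 1024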

mbesto 2 hours ago | parent | prev | next [-]

> They are obviously making an awful lot of revenue.

It's not hard to sell $10 worth of product if you spend $20 making it. Profit is more important than revenue.

troupo 17 hours ago | parent | prev [-]

Revenue != profit.

They are drowning in debt and keep turning to more and more ridiculous schemes to raise more money.

--- start quote ---

OpenAI has made $1.4 trillion in commitments to procure the energy and computing power it needs to fuel its operations in the future. But it has previously disclosed that it expects to make only $20 billion in revenues this year. And a recent analysis by HSBC concluded that even if the company is making more than $200 billion by 2030, it will still need to find a further $207 billion in funding to stay in business.

https://finance.yahoo.com/news/openai-partners-carrying-96-b...

--- end quote ---

zozbot234 17 hours ago | parent | prev [-]

The "quality" model can cost $200/month. They'll be fine.