yturijea 10 hours ago

I am using perhaps 15% of my usage limit on Claude with just the normal subscription. And I do full-time software engineering, and I would say I use quite a lot of AI input on thoughts, designs and code drafts.

So how these companies and people manage to use these absurd amounts of tokens is a mystery to me. It feels like they are just pushing huge amounts of non-vetted data through the LLMs and/or running loops against the LLMs, which produce only fractional results, if not wasted results, at insane cost.

So really it is the equivalent of just burning money, or heating your house in the winter while having all your windows open.

PeterStuer 10 hours ago | parent | next [-]

Same for me, comfortable with a single Max sub, launching "claude --effort max" with Opus 4.8 (alas poor Fable, please come back!).

But try running Claude Opus at API prices through a 'clever' RAG-based intermediate system 'managing' a 2024-era context window size completely unaligned with 2026 frontier model tool-use expectations, resulting in a 100% cache miss rate and the destruction of content coherency on every single interaction. There's your typical 'Enterprise Agreement' GenAI setup.

I only really discovered this when trying to find out why my Enterprise friends' AI experiences were so completely opposite to my own successes. I could not believe how poor their results were, even though on the surface it looked like we were using the same model, and I know they aren't 'bad' software engineers and developers.
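To put a rough shape on the cache-miss point, here is a minimal sketch, not a billing calculator. The two prices are made-up relative units, not taken from any real price list, and `session_cost` is a hypothetical helper. It assumes every turn resends the full history, and that a stable prefix is billed at the cached rate while a rewritten prefix is billed in full.

```python
# Hypothetical relative prices, chosen only for illustration.
PRICE_INPUT = 1.0   # cost units per uncached input token
PRICE_CACHED = 0.1  # assumed discounted rate for tokens served from a prompt cache


def session_cost(turn_sizes, cache_hit):
    """Total input cost of a session in which every turn resends the history.

    turn_sizes: number of new tokens added on each turn.
    cache_hit:  True if the previously sent prefix is served from cache,
                False if an intermediate layer rewrites the prompt so that
                nothing can be reused.
    """
    total = 0.0
    history = 0
    for new_tokens in turn_sizes:
        if cache_hit:
            total += history * PRICE_CACHED + new_tokens * PRICE_INPUT
        else:
            total += (history + new_tokens) * PRICE_INPUT
        history += new_tokens
    return total


turns = [2_000] * 50  # 50 turns, 2,000 new tokens each (made-up numbers)
stable = session_cost(turns, cache_hit=True)
rewritten = session_cost(turns, cache_hit=False)
print(f"stable prefix: {stable:,.0f}  rewritten every turn: {rewritten:,.0f}")
```

Under these assumptions the gap widens with session length, because the resent history, not the new tokens, dominates the bill.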

raffael_de 3 hours ago | parent | prev | next [-]

Coding harnesses loading entire code bases for every task. At least that's my theory, because I also never get even close to the limits of my 20-something-bucks-level subscriptions.

mschild 10 hours ago | parent | prev | next [-]

> So how these companies and people manage to use these absurd amounts of tokens is a mystery to me.

Fire and forget. They run multiple agents in parallel 24/7. AI isn't just a rubber ducky for them, it's their main (only) tool at that point.

oezi 10 hours ago | parent | prev | next [-]

If you don't reset sessions eagerly or compact regularly, it is easy to consume billions of input tokens while Claude churns away.
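A back-of-the-envelope sketch of why this adds up, assuming each turn resends the whole current context as input. The function name, thresholds and turn counts are all made up for illustration and do not describe how any particular harness compacts.

```python
def cumulative_input_tokens(turns, tokens_per_turn, compact_at=None, compact_to=0):
    """Sum of input tokens sent over one session.

    Each turn appends tokens_per_turn to the context and sends the whole
    context as input. If compact_at is set, the context is cut down to
    compact_to once it reaches compact_at (a stand-in for compaction or
    a session reset).
    """
    context = 0
    total = 0
    for _ in range(turns):
        context += tokens_per_turn
        total += context
        if compact_at is not None and context >= compact_at:
            context = compact_to
    return total


# Same amount of work, two compaction habits (hypothetical numbers).
lazy = cumulative_input_tokens(2_000, 1_000, compact_at=200_000, compact_to=20_000)
eager = cumulative_input_tokens(2_000, 1_000, compact_at=50_000, compact_to=20_000)
print(f"lazy: {lazy:,} input tokens, eager: {eager:,} input tokens")
```

Between compactions the total grows roughly quadratically with the number of turns, so a few long-running parallel sessions multiply quickly.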

auggierose 10 hours ago | parent | prev | next [-]

What is a "normal" subscription? Are you using Claude Code, or just Claude?

yturijea 8 hours ago | parent [-]

Just Claude. I have seen the weird hallucinations these LLMs make (yes, that also means Opus, Fable etc.), so I don't trust it to just run its own course.

Yes, that also means I get to inspect and confirm every step of the way, to ensure the design is followed, we are not making unnecessary changes, and we have thought about edge cases, testing etc. And I also keep an understanding of what is produced, because I will manually copy it in and manually read through it. I will do a secondary review of it myself in PRs, whatever.

But I guess a lot of people just don't, and just blow Claude Code through the roof in an ad libitum infinite loop?

On the subscription: I just checked, and I have the Pro plan, which for Claude I believe is the equivalent of the normal one?

auggierose 6 hours ago | parent [-]

I get where you are coming from (it's me, last year), and that's what I thought.

But if you use Claude Code or Codex, you will blow through your Pro plan quickly. If you don't use them, you are not really using AI. I know how that sounds, but that is how it is.

These models are smart now. Really smart. Yes, they hallucinate, but usually not without reason. I have long discussions with these models before generating code, and generate markdown from those discussions. These documents are then the basis for the generated code. I try to give the model as much background as possible. I read the generated markdown: if there is something that feels off, like I don't really know what it means, then I need to fix that first by discussing it with the model. Often, these are real problems in how I was understanding something: the model wasn't really getting it, and just made something up that it hoped would kinda work.

And I prefer Codex over Claude Code (prior to Fable, Fable is something else!), it behaves more like a helpful PhD-level colleague and just feels sharper. Claude Code sounds a bit like a mix between an HR person and a therapist that is on vacation too often.

I am still looking at code, but only if something came up during high-level discussions with the model that I want to pin down exactly. Otherwise I just talk about the high-level intention of the code, usually not looking at it.

What REALLY helps is coming up with the right theoretical frameworks for your work, with practical implementations that the model can use, and that allow some kind of verification. Let's say you want to parse something. For a one-off the model is great at generating "hand-rolled" parsing code, but for something disciplined, giving the model a way to generate context-free grammars and giving it a way to check them for determinism gives great results.
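As one possible reading of "check them for determinism", here is a minimal sketch that tests whether a context-free grammar is LL(1), meaning one token of lookahead always selects a single production. LL(1) is my assumption, since other notions such as LR(1) would need a different check. The grammar encoding (a dict from nonterminal to a list of symbol tuples) and all function names are hypothetical.

```python
from itertools import combinations

EPS = ""  # marker for the empty string; must not be used as a terminal


def first_of_sequence(symbols, first, grammar):
    """FIRST of a symbol sequence; contains EPS if the whole sequence can vanish."""
    result = set()
    for sym in symbols:
        if sym not in grammar:  # terminal
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)
    return result


def first_sets(grammar):
    """FIRST for every nonterminal, by fixed-point iteration."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for prod in productions:
                new = first_of_sequence(prod, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, start, first):
    """FOLLOW for every nonterminal; "$" marks end of input."""
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for prod in productions:
                for i, sym in enumerate(prod):
                    if sym not in grammar:
                        continue
                    tail = first_of_sequence(prod[i + 1:], first, grammar)
                    new = tail - {EPS}
                    if EPS in tail:
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def ll1_conflicts(grammar, start):
    """List (nonterminal, prod_a, prod_b, shared lookaheads) for each LL(1) conflict."""
    first = first_sets(grammar)
    follow = follow_sets(grammar, start, first)
    conflicts = []
    for nt, productions in grammar.items():
        predict = []
        for prod in productions:
            seq_first = first_of_sequence(prod, first, grammar)
            lookahead = seq_first - {EPS}
            if EPS in seq_first:
                lookahead |= follow[nt]
            predict.append((prod, lookahead))
        for (pa, la), (pb, lb) in combinations(predict, 2):
            if la & lb:
                conflicts.append((nt, pa, pb, la & lb))
    return conflicts


deterministic = {"S": [("a", "S"), ("b",)]}
ambiguous_prefix = {"S": [("a", "S"), ("a",)]}
print(ll1_conflicts(deterministic, "S"))
print(ll1_conflicts(ambiguous_prefix, "S"))
```

An empty conflict list means the grammar passes this particular check. A non-empty one gives the model something concrete to fix, which is the kind of verification loop described above.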

kamaal 9 hours ago | parent | prev | next [-]

>>So how these companies and people manage to use these absurd amounts of tokens is a mystery to me.

Absolutely!

I know some colleagues who are routinely spending thousands of dollars' worth of tokens. I can't seem to even max out the subscription limits, even if I'm working all the time. Curiously enough, their output is lower too.

szatkus 6 hours ago | parent [-]

Agents could work for a long time and burn a lot of tokens if you give them a task that is too hard for them. After enough time, if they don't give up, the slot machine could spit out a working solution.

Personally I find it faster to figure out the hard parts by myself and then give a few smaller tasks to Claude.

_pdp_ 10 hours ago | parent | prev [-]

I mean, have you tried to tokenmax?

It is not that hard. Just launch 10 different windows and make sure to loop back in after every turn and you will be burning billions of tokens per month in no time.

kamaal 9 hours ago | parent [-]

The question is what work you do that burns so many tokens.

Are you sending it to work in nested for loops? If yes, what sort of work would that be?