Fast AI seems genuinely exciting and somewhat unsettling to me. Right now Claude is faster than me on some tasks but we’re at least close. I have a prompt to clean up a PR that’s been running for 1h now and I expect it to take another few hours. It’s hard to imagine what the workflow would look like if it were near-instant. On the one hand, it might be easier to focus. Some prompts take so long that I start to multitask and regret it later. On the other, AI that takes a few seconds to a few minutes at most to solve what used to take hours or days? That’s a game changer and I don’t even know where we fit in.


flexagoon 2 hours ago | parent | next [-]

I'm using Deepseek-v4-pro as my main model and this is sometimes pretty annoying. I have some easy, boring task to do and think "I'll just leave the agent to do it and go take a nap", but it's already done writing the code before I even walk away from the computer.


throwaway67678 an hour ago | parent | next [-]

Agent mania setting in

It's also pretty funny sometimes how it gives weird future roadmap estimates ("part 2 - 3 weeks, part 3 - 2 months", etc.) and when you tell it to actually do those changes it's pretty much done in half an hour


throw1234567891 5 minutes ago | parent | next [-]

It repeats what it has seen in the training data. Expecting it to reason about the complexity of a task is a pipe dream. The best is to tell it not to come back with estimates, and when it does, remove them anyway.


smith7018 an hour ago | parent | prev [-]

I've long believed those numbers were faked by Anthropic/OpenAI to serve as a form of advertisement. The estimates are impossible to verify and their ability to do "2 days of work" in 10 minutes will presumably make the user go "Wow, I just saved SO much time!" Plus, the unnecessary text eats up the users' tokens so it helps the companies on the backend, as well.

	leodavi 38 minutes ago | parent | next [-]
		I agree with you that labs are benefiting from those outputs but I'm skeptical that labs are purposefully training the models to produce those outputs. Raw pre-training data includes plenty of conversations between professional builders and some of those include estimates. I believe the outputs are a training coincidence with consequences that are opportunistic for the labs.
	dizhn 18 minutes ago | parent | prev | next [-]
		All models do it. It's their training. They didn't have "a person does this in a week but an LLM could in a minute" in their training yet. They also don't have the concept of elapsed time unless you ask them how long something has taken.
	AgentMasterRace 28 minutes ago | parent | prev [-]
		All the models have broken estimates. They're trained heavily on Jira and GitHub tasks and issues; that's why their estimates are human.


RussianCow 2 hours ago | parent | prev | next [-]

Do you mean Flash and not Pro? I haven't tried it personally, but according to OpenRouter, the fastest DeepSeek V4 Pro providers are only ~50 tps. That's slower than Claude Opus.

https://openrouter.ai/deepseek/deepseek-v4-pro?sort=throughp...

	sarjann an hour ago | parent | next [-]
		I don't think token speed matters as much when a lot of tokens are needed to achieve a task. See, e.g., the Artificial Analysis benchmarks, where DeepSeek V4 is one of the biggest token burners.
	specproc 2 hours ago | parent | prev [-]
		Yeah, Flash is crazy fast, but I've found performance variable.
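
A rough sketch of the trade-off raised in this subthread: generation time depends on how many tokens a model emits as well as how fast it emits them. Only the ~50 tps figure comes from the comments above; the token counts and the 200 tps figure are hypothetical values chosen for illustration, not measurements.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time spent generating tokens, ignoring latency, tool calls and queueing."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Hypothetical workloads: a terse model at ~50 tps vs. a verbose model at 200 tps.
slow_but_terse = generation_seconds(output_tokens=20_000, tokens_per_second=50)
fast_but_verbose = generation_seconds(output_tokens=100_000, tokens_per_second=200)

print(f"slow but terse:   {slow_but_terse:.0f} s")
print(f"fast but verbose: {fast_but_verbose:.0f} s")
```

Under these made-up numbers the model with four times the throughput still finishes later, because it emits five times as many tokens.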


tmaly 2 hours ago | parent | prev | next [-]

This reminds me of the Peter / Boris comments on writing loops to keep the agents busy.


behnamoh 25 minutes ago | parent | prev | next [-]

Same. How can DeepSeek serve V4-Pro at such high speeds despite the sanctions?


2 hours ago | parent | prev | next [-]

[deleted]


2 hours ago | parent | prev [-]

[deleted]


binyu 21 minutes ago | parent | prev | next [-]

> Right now Claude is faster than me on some tasks but we’re at least close.

I don't doubt it, but I don't think you can spawn 10 copies of yourself working simultaneously.


AlecSchueler 15 minutes ago | parent [-]

No, but nor can you keep track of what 10 agents are doing simultaneously. Hence the multitasking regret.

	pixel_popping 12 minutes ago | parent [-]
		An agent can. You don't need to watch tasks; you can have a live digest with another tool.


skybrian 34 minutes ago | parent | prev | next [-]

If we get low enough latency, there's no reason to multitask. You can ask it to do one thing at a time and immediately see what it did. That's a nice way to work!

This is normal interactive UI for tasks that aren't compute-intensive. Programs spend most of their time idle, waiting for us to click a button. We shouldn't be waiting for them or spinning more plates to keep them busy.

However, a faster LLM isn't enough. You also need fast compiles and fast tests.


efromvt an hour ago | parent | prev | next [-]

I'd be very curious about the bottleneck breakdown in most current software dev - I suspect inference is far from the bottleneck in most things I do, though driving it to 0 would still be nice. I do agree that if it was 0 we'd probably change development approaches to reduce the new bottlenecks more, but it'll take full-process innovation to really get something near-instant.

(I should go measure this now, I'm curious)


UncleOxidant 17 minutes ago | parent | prev | next [-]

Have you tried Gemini 3.5 Flash? It's quite fast. Amazing how fast it finishes tasks.


pianopatrick 2 hours ago | parent | prev | next [-]

We fit in for the things that are not artificial.

So long as AI lives in server farms, humans will be needed for tasks in the physical world.

It's only if we combine AI with robots that things get really dicey.


fartfeatures 2 hours ago | parent [-]

This is very dystopian in my opinion. I'm not the arms, legs, sensors and actuators for a machine super intelligence. I wouldn't treat another human as my slave because they aren't as intelligent as I am any more than I would expect to become a slave for a machine. This is our world (for now) and that is why we fit in. Not because we can serve.


throwaway67678 an hour ago | parent | next [-]

Never read Asimov's Multivac novels? Admittedly not all of them are stellar examples of a future to follow


davedx 2 hours ago | parent | prev | next [-]

Agree

https://en.wikipedia.org/wiki/I_Have_No_Mouth,_and_I_Must_Sc...

	ionwake 23 minutes ago | parent | next [-]
		"It seeks revenge on humanity for its own creation." This is brilliant, as it reminded me of a famous Hitchhiker's quote: "In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move." (From The Restaurant at the End of the Universe, Book 2.) Maybe we are stuck in an eternal loop.
	fartfeatures an hour ago | parent | prev [-]
		Sounds like snuff porn, not my sort of thing but thanks though.


cicko 2 hours ago | parent | prev [-]

"This is our world" sounds a bit exclusive towards other living and sentient beings on this planet.


ipkstef 2 hours ago | parent | prev | next [-]

Asking for curiosity's sake: what kind of PR loop are you running that takes a few hours?


ketzo 2 hours ago | parent | next [-]

Not OP, but usually for me this means a long verification loop: waiting 10 min on CI checks, that kind of thing, rather than an actual 1 hr wall clock of token generation.


RussianCow 2 hours ago | parent | next [-]

But those things won't be sped up by a faster LLM, so I feel like that's not what the OP is talking about.

	goyozi 2 hours ago | parent [-]
		Well, I used an extreme example. OTOH, I’ve done quite a few of those “fix CI” or “migrate X” prompts recently and while there is a fixed component like running CI / builds, I’d say the LLM time is still around or above 50%, especially at the beginning of the project. Then there are also regular tasks that now take minutes per message, which completely takes me out of the zone. I imagine iterating on those in near real time would be a big change.


devmor 2 hours ago | parent | prev [-]

Or slow MCP servers that are waiting on HTTP calls from APIs, playwright/other UI instrumentation, etc.


goyozi 2 hours ago | parent | prev [-]

I’m rewriting our integration test suite to run tests in parallel. I have the changes split across 7 branches, and each needs to be fixed to have no flaky tests. I told it I want 3 consecutive CI runs with no flakes and no artificial fixes / assert removals etc. We’ll see what comes out; it’s almost a side project so there’s not much to lose other than some of my weekly limit that resets soon.


HarHarVeryFunny 2 hours ago | parent | prev | next [-]

I don't see many companies being willing to pay 3x more for faster code generation. Cloud-based AI code generation is already extremely fast, and hardly the bottleneck for most software product development.

There can't be many normal use cases where there'd be any cost benefit.


fragmede an hour ago | parent [-]

The "traditional" way we vibe code is human software developer prompts AI -> AI generates code -> (human checks code) -> code gets compiled/deployed/etc. -> users use "binary". At the speed of 1000 tok/sec, user prompts obliquely -> AI vets generated code -> code deployed -> user gets response from deployed code.

It's a cute toy right now, but you can tell an LLM that it's an HTTP server, and have it respond directly to a web browser hitting it. It generates headers in response, as well as page contents. As 1000 tok/sec becomes the new normal, we will come up with newer ways to use it outside of toy fiction encyclopedias.

	HarHarVeryFunny an hour ago | parent [-]
		1000 tokens per sec is still massively slower than serving a normal web page - if something doesn't respond in a few seconds many people give up. I'm not saying there aren't any use cases for super-fast (and super-expensive) generation, but it does seem a bit niche. If it was free then sure faster is better, but what are the mainstream use cases where people might pay 3x more for a faster version of something that is already fast? I think it would have to be an application where it paid for itself - where the 10x faster response was actually worth more than 3x the cost to you - where the extra speed was worth the extra cost.


ilaksh an hour ago | parent | prev | next [-]

Use Claude fast mode and turn off thinking. Tell it to just explain what its plan is to you at a high level.

It will go much faster.


recroad 2 hours ago | parent | prev [-]

Woah - what’s the prompt and what’s the PR?

	goyozi 2 hours ago | parent [-]
		I replied in more detail under another comment. TLDR: fixing flaky CI across multiple branches