flexagoon 2 hours ago

I'm using DeepSeek-V4-Pro as my main model, and this is sometimes pretty annoying: I have some easy, boring task to do, think "I'll just leave the agent to do it and go take a nap", but it's already done writing the code before I even walk away from the computer.

throwaway67678 an hour ago | parent | next [-]

Agent mania setting in

It's also pretty funny sometimes how it gives weird future roadmap estimates ("part 2 - 3 weeks, part 3 - 2 months", etc.), and when you tell it to actually make those changes it's pretty much done in half an hour.

throw1234567891 2 minutes ago | parent | next [-]

It repeats what it has seen in the training data. Expecting it to reason about the complexity of a task is a pipe dream. The best approach is to tell it not to come back with estimates, and when it does anyway, remove them.

smith7018 an hour ago | parent | prev [-]

I've long believed those numbers were faked by Anthropic/OpenAI to serve as a form of advertisement. The estimates are impossible to verify and their ability to do "2 days of work" in 10 minutes will presumably make the user go "Wow, I just saved SO much time!" Plus, the unnecessary text eats up the users' tokens so it helps the companies on the backend, as well.

leodavi 34 minutes ago | parent | next [-]

I agree with you that labs are benefiting from those outputs but I'm skeptical that labs are purposefully training the models to produce those outputs.

Raw pre-training data includes plenty of conversations between professional builders and some of those include estimates.

I believe the outputs are a training coincidence whose consequences happen to be opportunistic for the labs.

dizhn 15 minutes ago | parent | prev | next [-]

All models do it. It's their training. They haven't had "a person does this in a week but an LLM could do it in a minute" in their training data yet. They also don't have a concept of elapsed time unless you ask them how long something has taken.

AgentMasterRace 25 minutes ago | parent | prev [-]

All the models have broken estimates. They're trained heavily on Jira and GitHub tasks and issues, which is why their estimates are human-scale.

RussianCow 2 hours ago | parent | prev | next [-]

Do you mean Flash and not Pro? I haven't tried it personally, but according to OpenRouter, the fastest DeepSeek V4 Pro providers are only ~50 tps. That's slower than Claude Opus.

https://openrouter.ai/deepseek/deepseek-v4-pro?sort=throughp...

sarjann an hour ago | parent | next [-]

I don't think token speed matters as much when a lot of tokens are needed to achieve a task. E.g. in the Artificial Analysis benchmarks, DeepSeek V4 is one of the biggest token burners to get through the benchmark.
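
To put rough numbers on that: wall-clock time is just output tokens divided by throughput, so a slower model that needs fewer tokens can still finish first. A minimal sketch, assuming the ~50 tps figure quoted above; the token counts and the 100 tps comparison model are made up for illustration, not measurements of any real model.

```python
def wall_clock_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to generate a response, ignoring latency and tool calls."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


# Hypothetical task: token counts below are illustrative assumptions.
slow_but_terse = wall_clock_seconds(20_000, 50)      # ~50 tps, as quoted above
fast_but_verbose = wall_clock_seconds(60_000, 100)   # assumed 2x throughput, 3x tokens

print(f"slow but terse:   {slow_but_terse:.0f} s")
print(f"fast but verbose: {fast_but_verbose:.0f} s")
```

With these assumed numbers the 50 tps model finishes in 400 s and the 100 tps model in 600 s, so tps alone doesn't tell you which one gets the task done sooner.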

specproc 2 hours ago | parent | prev [-]

Yeah, Flash is crazy fast, but I've found its performance variable.

behnamoh 22 minutes ago | parent | prev | next [-]

Same. How can DeepSeek serve V4-Pro at such high speeds despite the sanctions?

tmaly 2 hours ago | parent | prev | next [-]

This reminds me of the Peter / Boris comments on writing loops to keep the agents busy.
