flexagoon 2 hours ago
I'm using DeepSeek-V4-Pro as my main model, and this is sometimes pretty annoying: I have some easy, boring task to do and think "I'll just leave the agent to do it and go take a nap", but it's already done writing the code before I've even walked away from the computer.

throwaway67678 an hour ago
Agent mania setting in. It's also pretty funny sometimes how it gives weird future roadmap estimates ("part 2: 3 weeks, part 3: 2 months", etc.), and when you tell it to actually make those changes, it's pretty much done in half an hour.

throw1234567891 2 minutes ago
It repeats what it has seen in the training data. Expecting it to reason about the complexity of a task is a pipe dream. The best approach is to tell it not to give estimates, and to remove them whenever it gives them anyway.

smith7018 an hour ago
I've long believed those numbers were faked by Anthropic/OpenAI to serve as a form of advertisement. The estimates are impossible to verify, and the models' ability to do "2 days of work" in 10 minutes will presumably make the user go "Wow, I just saved SO much time!" Plus, the unnecessary text eats up the user's tokens, so it helps the companies on the backend as well.

leodavi 34 minutes ago
I agree with you that the labs benefit from those outputs, but I'm skeptical that they are purposefully training the models to produce them. Raw pre-training data includes plenty of conversations between professional builders, and some of those include estimates. I believe the outputs are a training coincidence whose consequences the labs can opportunistically benefit from.

dizhn 15 minutes ago
All models do it. It's their training. Their training data doesn't yet include "a person does this in a week, but an LLM could do it in a minute". They also have no concept of elapsed time unless you ask them how long something has taken.

AgentMasterRace 25 minutes ago
All the models give broken estimates. They're trained heavily on Jira and GitHub tasks and issues; that's why their estimates are the ones a human would give.

RussianCow 2 hours ago
Do you mean Flash and not Pro? I haven't tried it personally, but according to OpenRouter, the fastest DeepSeek V4 Pro providers only manage ~50 tokens/s. That's slower than Claude Opus. https://openrouter.ai/deepseek/deepseek-v4-pro?sort=throughp...

sarjann an hour ago
I don't think token speed matters as much when a lot of tokens are needed to complete a task. E.g., in the Artificial Analysis benchmarks, DeepSeek V4 is one of the biggest token burners of all the models run through them.
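A quick back-of-the-envelope way to see sarjann's point: wall-clock generation time is roughly output tokens divided by throughput, so a model with higher tokens/s can still finish later if it burns more tokens. The function name and all numbers below are hypothetical, chosen only for illustration; they are not measurements of any real model or provider, and the sketch ignores latency, tool calls, and retries.

```python
# Hypothetical figures for illustration only; not real benchmark data.
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Approximate wall-clock generation time, ignoring latency and tool calls."""
    return output_tokens / tokens_per_second

# A slower model that emits few tokens vs. a faster one that burns many.
concise = generation_seconds(output_tokens=20_000, tokens_per_second=50)
verbose = generation_seconds(output_tokens=60_000, tokens_per_second=100)

print(f"concise: {concise:.0f} s, verbose: {verbose:.0f} s")
```

Here the model running at twice the throughput still takes longer overall, because it needs three times as many tokens.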

specproc 2 hours ago
Yeah, Flash is crazy fast, but I've found its performance variable.

behnamoh 22 minutes ago
Same. How can DeepSeek serve V4-Pro at such high speeds despite the sanctions?

tmaly 2 hours ago
This reminds me of the Peter / Boris comments about writing loops to keep the agents busy.