zarzavat 7 hours ago
These threads are always full of superstitious nonsense. Had a bad week with the AI? Someone at Anthropic must have nerfed the model! The roulette wheel isn't rigged; sometimes you're just unlucky. Try another spin, maybe you'll do better. Or just write your own code.
2001zhaozhao 5 hours ago
Start vibe-coding -> the model does wonders -> the codebase grows with low code quality -> the spaghetti code builds up to the point where the model stops working -> attempts to fix the codebase with AI actually make it worse -> complain online that the "model is nerfed"
unshavedyak 6 hours ago
Part of me wonders if there's a subtle behavioral change on our side too. Early on we distrust a model, so we're blown away: we give it extra detail to compensate for its assumed inability, and it outperforms our expectations. Weeks later we've calibrated to its capabilities and we get lazy. The model is very good, so why put in as much work providing specifics, specs, ACs, etc.? Then of course quality slides, because we assumed its capabilities somehow absolved us of the need for the same detailed guardrails.

This obviously doesn't apply to folks who run their own benchmarks with the same inputs across models; I'm just describing a possible, unintentional human bias. Even if it isn't the root cause, humans are really bad at perceiving reality. Like, really really bad. LLMs are also really difficult to measure objectively. I'm sure the coupling of those two facts plays a part, possibly a significant one, in our perception of LLM quality over time.
andai 22 minutes ago
They don't nerf the model; they just lower the default reasoning effort, encourage shorter responses in the system prompt, etc. Totally different ;)
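For context, these are request-level knobs rather than changes to the model weights. As a hedged sketch of the kind of lever being described (field names from Anthropic's public API; the model name and budget value are purely illustrative, and the provider's actual server-side defaults aren't observable from outside), a request that caps thinking effort looks roughly like:

```json
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 2048,
  "thinking": { "type": "enabled", "budget_tokens": 1024 }
}
```

If a provider quietly changes the server-side default for a knob like this, perceived quality shifts even though the weights are identical, which is the comment's point.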
delbronski 6 hours ago
Nah dude, that roulette wheel is 100% rigged. From top to bottom. No doubt about that. If you think they are playing fair, you are either brand new to this industry or a masochist.
portly 2 hours ago
A good reminder. But I also don't want to go back to pre-LLM days. Some dev activities are just too painful and boring, like correctly writing S3 policies. We need the discipline to decide what deserves our attention and what we should automate, because there is only so much mental energy we can spend each day.
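S3 policies are a fair example of the fiddly boilerplate being described. A minimal read-only bucket policy (bucket name and principal here are hypothetical) looks like:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowReadOnly",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:role/app-reader" },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ]
    }
  ]
}
```

Even this small example has a trap: `s3:ListBucket` applies to the bucket ARN, while `s3:GetObject` applies to object ARNs (the `/*` form) — exactly the kind of detail that is painful to get right by hand.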
awwaiid 3 hours ago
It's also difficult to recognize that when it got it right, THAT might have been the lucky week.
lnenad 5 hours ago
I mean, they literally said themselves that adaptive thinking isn't working as it should. They rolled it out silently, enabled by default, and haven't rolled it back.
colordrops an hour ago
Sorry, but this is a ridiculous comment. It's not magic. There are countless levers that can be changed, and ARE changed, to affect quality and cost, and it's known that compute is scarce. We aren't superstitious; you are just ignorant.
dakolli 5 hours ago
It's because LLM companies are literally building quasi slot machines, and their UIs support this notion: for instance, you can run a multiplier on your output (x3, x4, x5), like a slot machine. Brain-fried LLM users are behaving more and more like gamblers every day (it's working). They have all sorts of theories about why one model is better than another, like a gambler does about a certain blackjack table or slot machine; it makes sense in their head but makes no sense on paper.

Don't use these technologies if you can't recognize this, just as a person shouldn't gamble unless they understand concretely that the house has a statistical edge and that you will lose if you play long enough. You will lose if you play with LLMs long enough too; they are also statistical machines, like casino games. This stuff is bad for the brains of a lot of people, if not all.