| ▲ | waleedlatif1 a day ago |
| I really hope sonnet 4 is not obsessed with tool calls the way 3-7 is. 3-5 was sort of this magical experience where, for the first time, I felt the sense that models were going to master programming. It’s kind of been downhill from there. |
|
| ▲ | lherron a day ago | parent | next [-] |
| Overly aggressive “let me do one more thing while I’m here” in 3.7 really turned me off as well. Would love a return to 3.5’s adherence. |
| |
| ▲ | mkrd a day ago | parent | next [-] | | Yes, this is pretty annoying. You give it a file and want it to make a small focused change, but instead it almost touches every line of code, even the unrelated ones | |
| ▲ | artdigital 15 hours ago | parent | prev | next [-] | | Oh jeez yes. I completely forgot that this was a thing. It’s tendency to do completely different things “while at it” was ridiculous | |
| ▲ | nomel a day ago | parent | prev | next [-] | | I think there was definitely a compromise with 3.7. When I turn off thinking, it seems to perform very poorly compared to 3.5. | |
| ▲ | Jabrov 14 hours ago | parent | prev [-] | | Can be solved with more specific prompting |
|
|
| ▲ | deciduously a day ago | parent | prev [-] |
| This feels like more of a system prompt issue than a model issue? |
| |
| ▲ | waleedlatif1 a day ago | parent | next [-] | | imo, model regression might actually stem from more aggressive use of toolformer-style prompting, or even RLHF tuning optimizing for obedience over initiative. i bet if you ran comparable tasks across 3-5, 3-7, and 4-0 with consistent prompts and tool access disabled, the underlying model capabilities might be closer than it seems. | |
| ▲ | ketzo a day ago | parent | prev | next [-] | | Anecdotal of course, but I feel a very distinct difference between 3.5 and 3.7 when swapping between them in Cursor’s Agent mode (so the system prompt stays consistent). | |
| ▲ | a day ago | parent | prev [-] | | [deleted] |
|