wg0 5 days ago

I have used Gemini and OpenAI models too, but at this point Sonnet is the undisputed king, on another level.

I was able to port a legacy thermal printer user-mode driver from convoluted legacy JavaScript to pure modern TypeScript in two to three days, at the end of which the printer did work.

The same caveats apply: I have a decent understanding of both languages, in particular the various legacy JavaScript patterns for modularity that were used to emulate features the language didn't have at the time, such as classes.
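To give a flavor of what I mean, here's a generic before/after sketch, not the actual driver code (the Printer/print names are just illustrative): the old IIFE-plus-prototype style versus a plain TypeScript class.

    // Before (legacy JS file): an IIFE "module" emulating a class
    // with a constructor function and prototype methods.
    var Printer = (function () {
      function Printer(port) {
        this.port = port;
      }
      Printer.prototype.print = function (text) {
        // write bytes to the device...
      };
      return Printer;
    })();

    // After (modern TypeScript module): the same shape as a real class.
    export class Printer {
      constructor(private readonly port: string) {}
      print(text: string): void {
        // write bytes to the device...
      }
    }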

piskov 5 days ago | parent [-]

Check the SWE-bench results, but for C#.

It’s literally pathetic how these things just memorize rather than achieve any actual problem-solving.

https://arxiv.org/html/2506.12286v3

antonvs 5 days ago | parent [-]

You've misunderstood the study you linked. LLMs certainly memorize, and that can skew benchmarks, but it's not all they do.

Anyone with experience using LLMs will have seen their actual problem-solving ability, which is often impressive.

You'd be better off learning to use them than speculating, without basis, about why they won't work.

piskov 5 days ago | parent [-]

What exactly did I misunderstand?

Also, “learn to use them” gives off “you’re holding it wrong” vibes.

See also

https://machinelearning.apple.com/research/illusion-of-think...

wg0 5 days ago | parent | next [-]

You did not misunderstand anything. LLMs have no real cognitive abilities, so even with widely used languages they hit a wall and need a lot of hand-holding.

antonvs 4 days ago | parent | prev [-]

The study doesn't show that "these things just memorize rather than achieve any actual problem-solving."

Re learning to use them, I'm more suggesting that you should actually try to use them, because if you believe that they don't "achieve any actual problem-solving," you clearly haven't done so.

There are plenty of reports in this thread alone of people using them to solve problems. For coding, most of us are working on proprietary code that the LLMs haven't been trained on, yet they exhibit strong functional understanding of large, unfamiliar codebases and correctly solve many of the problems they're given.

The "Illusion of Thinking" paper you linked suggests another misunderstanding on your part. All it points out is something fairly obvious to anyone paying attention: if you use a text generation model to generate the text of supposed "thoughts", those aren't necessarily going to reflect the model's internal functioning.

Functionally, the models can clearly understand almost arbitrary domains and solve problems within them. If you want to claim that's not "thinking", that's just semantics and doesn't matter except philosophically. The point is their functional capabilities.