ptnpzwqd 13 hours ago

I have used Claude (incl. Opus 4.6) fairly extensively, and Claude still spits out quality that is far below what I would call production ready: littered with smaller issues, and prone to the occasional larger blunder. Particularly when doing anything non-trivial, and even when guiding it in detail (although that admittedly reduces the number of larger structural issues).

Maybe it is tech stack dependent (I have mostly used it with C#/.NET), but I have heard people say the same about C#. The only conclusion I have been able to draw from this is that people have very different definitions of production ready. I would really like to see some concrete evidence of Claude one-shotting a larger/complex C# feature or the like (with or without detailed guidance).

KellyCriterion 11 hours ago | parent | next [-]

> C#/.NET

same here :)

> one-shots a larger/complex C# feature

I can show you a timeseries data-renderer which was created with one very large initial prompt and then three follow-up "change this and that" prompts. The file is around 5000 lines and everything works fine & exactly as specified.

allajfjwbwkwja 5 hours ago | parent | next [-]

> The file is around 5000 lines

Yep, this is another case of different standards for "production ready."

KellyCriterion 2 hours ago | parent [-]

Caught, good one! :-))

++1

ptnpzwqd 11 hours ago | parent | prev [-]

Feel free to share it, would be very curious - ideally alongside the prompts.

KellyCriterion 5 hours ago | parent [-]

Do you have an email address?

ptnpzwqd 2 hours ago | parent [-]

You can use this: hnthrowaway.outboard407@passmail.net

skeledrew 8 hours ago | parent | prev | next [-]

I don't get it, though. Why do you expect perfect responses? Humans continually make mistakes, and AI is trained on human data, yet there seems to be a higher bar of expectation for the latter. Somehow people expect this thing that has been around for a few weeks/months, and cannot learn anything beyond its training cutoff date, to always do a better job than a human who has been around for 20+ years and is able to keep learning until death.

ptnpzwqd 8 hours ago | parent [-]

I don't expect that. I am merely responding to the parent comment's claim that Claude consistently one-shots production-ready code (which does not at all match my observations).

peteforde 12 hours ago | parent | prev | next [-]

I see this over and over again. I don't dispute your experience. My experience with ESP32 development has been unreasonably positive. My codebase is sitting around 600k LoC and is the product of several hundred Opus 4.x Plan -> Agent -> Debug loops. I review everything that goes through, but I'm reviewing the business logic and domain gotchas, not dumb crap like what you and so many others describe.

What is so strange to me is that surely there is more C# out there than ESP-IDF code. I don't have a good explanation beyond saying that my codebase is extensively tested and used; I would know very quickly if it suddenly started shitting the bed in the way you describe.

whaleidk 7 hours ago | parent | next [-]

600k lines of code for anything on the ESP32 sounds like the absolute polar opposite of “good”

ivan_gammel 12 hours ago | parent | prev | next [-]

The more code is out there, the worse the average quality of the training dataset. There will be legacy approaches and APIs, poor design choices, popular use cases irrelevant to your context, etc., all of which increase the chances of the output not matching your expectations. In the Java world this is exactly how it works. I need 3-5 iterations with Claude to get things done the way I expect, sometimes jumping straight to manual refactoring and then returning the result to Claude for review and learning. My CLAUDE.md files (I have several) are growing big with all the patterns and anti-patterns identified this way. To overcome this problem the model needs specialized training, which I don't think the industry knows how to approach (it has to beat the effort put into the education system for humans).
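For what it's worth, the entries in mine are just plain markdown rules. A sketch of what one such pattern/anti-pattern entry might look like (the specific rules here are invented for illustration, not from my actual file):

```markdown
## Collections and streams

- PREFER: `List.of(...)` / `Map.of(...)` for small immutable collections.
- AVOID: `Collections.unmodifiableList(new ArrayList<>(...))` wrappers;
  Claude kept emitting this pre-Java-9 pattern.

## Error handling

- PREFER: throw domain-specific exceptions from the service layer.
- AVOID: catching `Exception` and logging-then-rethrowing; we saw Claude
  do this in generated controllers repeatedly.
```

The point is that each correction round turns into a written rule, so the next session starts from the accumulated list instead of repeating the same mistakes.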

mjdiloreto 9 hours ago | parent | next [-]

I also believe this must be true. Try asking Claude to program in Forth; I find the results to be unreasonably good. That's probably because most of the available Forth code to train on is high quality.

re-thc 10 hours ago | parent | prev [-]

> To overcome this problem model needs specialized training, that I don‘t think the industry knows how to approach

We already have coding-tuned models, e.g. Codex. We should also have language/technology-specific models with a focus on recent/modern usage.

The problem with something like Java is that it is too old -- too many variants. Make a cutoff, say at least Java 8 or 17.
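To make the "too many variants" point concrete: the same task reads very differently pre- and post-Java-8, and a model trained on both eras can mix the styles within one file. A small self-contained sketch (the data and class name are invented):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class VariantsDemo {
    public static void main(String[] args) {
        List<String> names = Arrays.asList("Carol", "alice", "Bob");

        // Pre-Java-8 style: anonymous inner class, in-place sort of a copy.
        List<String> legacy = new ArrayList<>(names);
        Collections.sort(legacy, new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.compareToIgnoreCase(b);
            }
        });

        // Java 8+ style: stream pipeline with a built-in comparator.
        List<String> modern = names.stream()
                .sorted(String.CASE_INSENSITIVE_ORDER)
                .collect(Collectors.toList());

        // Both variants produce the same result: [alice, Bob, Carol]
        System.out.println(legacy);
        System.out.println(modern);
    }
}
```

Both blocks do the same thing, but a training corpus full of the first style drags generated code back toward it; a version cutoff in the training data would at least bias the model toward the second.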

xienze 9 hours ago | parent | prev [-]

> My experience with ESP32 development has been unreasonably positive. My codebase is sitting around 600k LoC and is the product of several hundred Opus 4.x Plan -> Agent -> Debug loops.

I feel like this is an example of people having different standards for what “good” code is, and hence the differing opinions on how good these tools are. I’m not an embedded developer, but 600K LOC seems like a lot in that context, doesn’t it? Again, I could be way off base here, but that sounds like there must be a lot of spaghetti and copy-paste all over the codebase for it to end up that large.

surajrmal 7 hours ago | parent [-]

I don't think it's that large. Keep in mind that embedded projects take on few, if any, dependencies. The standard library in most languages is far bigger than 600k LoC.

whaleidk 7 hours ago | parent | next [-]

I work with ESP32 devices and 600k lines of code is insane.

the__alchemist 4 hours ago | parent | prev [-]

I'm curious: What does this device do?

je42 13 hours ago | parent | prev | next [-]

Interesting - what kind of structural issues have you encountered?

Are these more related to the existing source code, or are they bad patterns that you would never use regardless of the existing code?

huflungdung 12 hours ago | parent | prev [-]

[dead]