| ▲ | moregrist 7 hours ago | parent | next [-] | | I think the skepticism here is that without tests or a _lot_ of manual QA, how would you know that it did it correctly? Maybe you did one or the other, but “nearly one-shotted” doesn’t tend to mean that. Claude Code more than occasionally likes to make weird assumptions, and it’s well known that it hallucinates quite a bit more as it nears the context limit, and that compaction only partially helps with this. | | |
| ▲ | skybrian 3 hours ago | parent [-] | | If you’re porting some formulas from one language to another, “correct” can be defined as “gets the same answers as before.” Assuming you can run both easily, this is easy to write a property test for. Sure, maybe that’s just building something that’s bug-for-bug compatible, but it’s something Claude can work with. | | |
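A minimal sketch of the kind of property test described above, using only the Python standard library. Both formulas here are hypothetical stand-ins (a compound-growth calculation written two ways), not code from the project under discussion; `find_counterexample` and its tolerances are likewise illustrative assumptions. The idea is simply to feed the same random inputs to the old and new implementations and report the first input where they disagree.

```python
import math
import random


def reference_formula(principal, rate, periods):
    """Hypothetical stand-in for the original implementation."""
    return principal * (1 + rate) ** periods


def ported_formula(principal, rate, periods):
    """Hypothetical stand-in for the ported implementation."""
    growth = 1.0
    for _ in range(periods):
        growth *= 1 + rate
    return principal * growth


def find_counterexample(reference, ported, trials=1000, seed=0):
    """Return the first random input where the two disagree, else None."""
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    for _ in range(trials):
        args = (rng.uniform(0, 1e6), rng.uniform(0, 0.2), rng.randint(0, 60))
        expected = reference(*args)
        actual = ported(*args)
        # Compare with a tolerance: the two versions round differently.
        if not math.isclose(expected, actual, rel_tol=1e-9, abs_tol=1e-9):
            return args
    return None


if __name__ == "__main__":
    print(find_counterexample(reference_formula, ported_formula))
```

In a real port, `reference` would wrap whatever runs the original formulas and `ported` the new code. A property-testing library such as Hypothesis could replace the hand-rolled random generation and would also shrink failing inputs.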
| ▲ | gregoryl an hour ago | parent [-] | | For starters, Python uses IEEE 754, and Excel uses IEEE 754 (with caveats). I wonder if those caveats are being emulated. |
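To illustrate why the floating-point caveats matter for a "same answers as before" check, here is a small Python sketch. It assumes one commonly cited caveat, that Excel keeps about 15 significant digits, and approximates it by rounding to 15 significant digits before comparing. The helper `round_sig` is hypothetical and does not reproduce Excel's actual behavior.

```python
import math


def round_sig(x, digits=15):
    """Round x to the given number of significant digits (an approximation)."""
    if x == 0 or not math.isfinite(x):
        return x
    # Scientific notation with digits-1 decimals keeps `digits` significant digits.
    return float(f"{x:.{digits - 1}e}")


raw_sum = 0.1 + 0.2

# Plain IEEE 754 doubles: the sum is not exactly 0.3.
print(raw_sum == 0.3)

# After rounding to 15 significant digits, the two values agree.
print(round_sig(raw_sum) == round_sig(0.3))
```

Whether a port should compare exact doubles, rounded values, or values within a tolerance depends on which of these behaviors the original spreadsheet's users actually relied on.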
|
| ▲ | stavros 7 hours ago | parent | prev [-] | | I generally agree with you, but I tried to get it to modernize a fairly old SaaS codebase, and it couldn't. It had all the code right there; all it had to do was change a few lines, upgrade a few libraries, etc., but it kept getting lots of things wrong. The HTML was wrong, the CSS was completely missing, basic views wouldn't work, things like that. I have no idea why it had so much trouble with this generally easy task. Bizarre. |
|