| ▲ | linsomniac 8 hours ago |
| It depends on how easily testable the Excel is. If Claude has the ability to run both the Excel and the Python with different inputs, and check the outputs, it's stunningly likely to be able to one-shot it. |
|
| ▲ | AlotOfReading 8 hours ago | parent | next [-] |
| Something being simultaneously described as a "30 sheet, mind-numbingly complex Excel model" and "testable" seems somewhat unlikely, even before we get into whether Claude will be able to test such a thing before it runs into context length issues. I've seen Claude hallucinate running test suites before. |
| |
▲ | martinald 7 hours ago | parent [-] | | It compacted at least twice but continued with no real issues. Anyway, please try it if you find it unbelievable. FWIW, I didn't expect it to work the way it did. Opus 4.5 is pretty amazing at long-running tasks like this. | | |
▲ | moregrist 7 hours ago | parent | next [-] | | I think the skepticism here is that without tests or a _lot_ of manual QA, how would you know that it did it correctly? Maybe you did one or the other, but “nearly one-shotted” doesn’t tend to mean that. Claude Code more than occasionally likes to make weird assumptions, and it’s well known that it hallucinates quite a bit more near the context limit, and that compaction only partially helps this issue. | | |
| ▲ | skybrian 3 hours ago | parent [-] | | If you’re porting some formulas from one language to another, “correct” can be defined as “gets the same answers as before.” Assuming you can run both easily, this is easy to write a property test for. Sure, maybe that’s just building something that’s bug-for-bug compatible, but it’s something Claude can work with. | | |
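A minimal sketch of that kind of equivalence test. Both model functions here are hypothetical stand-ins: in a real port, `python_model` would be the new code and `excel_model` would be however you evaluate the sheet (e.g. driving Excel via xlwings, or reading cached results back with openpyxl's `data_only=True`).

```python
import random

# Hypothetical stand-ins for the two sides of the comparison.
def python_model(x, y):
    return x * 1.05 + y

def excel_model(x, y):
    return x * 1.05 + y

random.seed(0)
for _ in range(1000):
    x, y = random.uniform(-1e6, 1e6), random.uniform(-1e6, 1e6)
    got, want = python_model(x, y), excel_model(x, y)
    # Compare with a relative tolerance rather than ==, since the
    # two engines may round intermediate results differently.
    assert abs(got - want) <= 1e-9 * max(1.0, abs(got), abs(want))
print("1000 random cases matched")
```

Random sampling over the input domain is the crude stdlib version of the property test; a library like Hypothesis would shrink failing cases for you.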
| ▲ | gregoryl an hour ago | parent [-] | | For starters, Python uses IEEE 754 doubles directly, and Excel uses IEEE 754 with caveats (it rounds to 15 significant digits and special-cases some near-zero results). I wonder if those quirks are being emulated. |
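To illustrate the main caveat (a simplification — Excel's actual rules have further special cases): both store IEEE 754 doubles internally, but Excel rounds to 15 significant digits for display, which hides artifacts that Python's repr exposes.

```python
# Both Python and Excel store numbers as IEEE 754 doubles, but Excel
# rounds to 15 significant digits when displaying values, which hides
# artifacts that Python shows directly.
x = 0.1 + 0.2
print(x)          # 0.30000000000000004
print(x == 0.3)   # False in raw IEEE 754 arithmetic

# Excel-style view: round to 15 significant digits first.
print(f"{x:.15g}")                 # 0.3 — what the cell would show
print(float(f"{x:.15g}") == 0.3)   # True after Excel-style rounding
```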
|
| |
| ▲ | stavros 7 hours ago | parent | prev [-] | | I generally agree with you, but I tried to get it to modernize a fairly old SaaS codebase, and it couldn't. It had all the code right there, all it had to do was change a few lines, upgrade a few libraries, etc, but it kept getting lots of things wrong. The HTML was wrong, the CSS was completely missing, basic views wouldn't work, things like that. I have no idea why it had so much trouble with this generally easy task. Bizarre. |
|
|
|
| ▲ | rk06 5 hours ago | parent | prev | next [-] |
| Where exactly have you seen Excel formulas have tests? Early in my career, I went knee-deep into Excel macros and worked on C# automation that would create an Excel sheet, run Excel macros on it, and then save it without the macros. In that entire process, I saw dozens of date-time mistakes in VBA code, but no tests that would catch them... |
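One classic source of those date-time bugs, sketched below: Excel's serial-date system inherits Lotus 1-2-3's bug of treating 1900 as a leap year, so a naive serial-to-date conversion drifts by a day for serials ≥ 60. The helper functions here are illustrative, not from any of the code discussed.

```python
from datetime import date, timedelta

# Excel's 1900 date system: 1900-01-01 is serial 1, but Excel also
# (for Lotus 1-2-3 compatibility) accepts the nonexistent 1900-02-29
# as serial 60, so real-calendar conversions must skip it.
def excel_serial_to_date_naive(serial):
    # Ignores the phantom leap day — off by one from serial 60 on.
    return date(1899, 12, 31) + timedelta(days=serial)

def excel_serial_to_date(serial):
    # Skip the phantom 1900-02-29 for serials past it.
    offset = 1 if serial >= 60 else 0
    return date(1899, 12, 31) + timedelta(days=serial - offset)

print(excel_serial_to_date_naive(61))  # 1900-03-02 (wrong)
print(excel_serial_to_date(61))        # 1900-03-01 (what Excel means)
```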
|
| ▲ | datsci_est_2015 6 hours ago | parent | prev | next [-] |
| And also - who understands the system now? Does anyone know Python at this shop? Is it someone’s implicit duty to now learn Python, or is the LLM now the de facto interface for modifying the system? When shit hits the fan and execs need answers yesterday, will they jump to using the LLM to probabilistically make modifications to the system, or will they admit it was a mistake and pull Excel back up to deterministically make modifications the way they know how? |
|
| ▲ | martinald 8 hours ago | parent | prev | next [-] |
| That's exactly what it did (author here). |
| |
| ▲ | majormajor 8 hours ago | parent [-] | | I'm having trouble reconciling "30 sheet mind numbingly complicated Excel financial model" and "Two or three prompts got it there, using plan mode to figure out the structure of the Excel sheet, then prompting to implement it. It even added unit tests to the Python model itself, which I was impressed with!" "1 or 2 plan mode prompts" to fully describe a 30-sheet complicated doc suggests a massively higher level of granularity than Opus's initial plans on existing codebases give me, or a less-than-expected level of Excel craziness. And the tooling harnesses have been telling the models to add testing to things they make for months now, so why's that impressive or surprising? | | |
| ▲ | martinald 8 hours ago | parent [-] | | No, it didn't make a giant plan of every detail. It made a plan of the core concepts, and then when it was in implementation mode it kept checking the Excel file to get more info. It took around ~30 mins in implementation mode to build it. I was impressed because the prompt didn't ask it to do that. It doesn't normally add tests for me without asking, YMMV. | | |
| ▲ | majormajor 7 hours ago | parent [-] | | Ah, I see. Did it build a test suite for the Excel side? A fuzzer or such? It's the cross-concern interactions that still get me. 80% of what I think about these days when writing software is how to test more exhaustively without build times being absolute shit (and without necessarily actually being exhaustive anyway). |
|
|
|
|
| ▲ | catlifeonmars 4 hours ago | parent | prev [-] |
| You touched on Kolmogorov complexity there :) |