raw_anon_1111 2 hours ago

> My stance has been pretty rigid for some time: LLMs hallucinate, so they aren’t reliable building blocks. If you can’t rely on the translation step, you can’t treat it as a serious abstraction layer because it provides no stable guarantees about the underlying system.

This is technically true. But unimportant. When I write code in a higher level language and it gets compiled to machine code, ultimately I am testing statically generated code for correctness. I don’t care what type of weird tricks the compiler did for optimizations.

How is that any different from when someone is testing LLM-generated C code? I’m still testing C code that isn’t going to magically be changed by the LLM without my intervention, any more than my C code is going to be changed without my recompiling it.

On this latest project I was on, the Python code generated by Codex was “correct” on the happy path. But there were subtle bugs in the distributed locking mechanics and some other concurrency controls I specified. Ironically, those were both caught by throwing the code into ChatGPT in thinking mode.
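
To make that concrete, here is a minimal sketch of the kind of bug a happy-path test can miss. It is not the project code: `FakeStore`, `naive_acquire`, `atomic_acquire`, and the `pause` hook are all hypothetical names, and the in-memory store only stands in for whatever shared store a real distributed lock would use.

```python
class FakeStore:
    """In-memory stand-in for a shared key-value store (hypothetical)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def set_if_absent(self, key, value):
        # A real store would have to make this a single atomic operation.
        if key in self._data:
            return False
        self._data[key] = value
        return True


def naive_acquire(store, key, owner, pause=lambda: None):
    """Check-then-set: another client can slip in between the two steps."""
    if store.get(key) is None:
        pause()  # stands in for a context switch or network delay
        store.set(key, owner)
        return True
    return False


def atomic_acquire(store, key, owner):
    """Acquire with one conditional write instead of a read plus a write."""
    return store.set_if_absent(key, owner)


def demo():
    # Happy path: the second caller is correctly refused.
    store = FakeStore()
    happy = (naive_acquire(store, "job", "a"), naive_acquire(store, "job", "b"))

    # Forced interleaving: "b" runs while "a" sits between check and write.
    raced = FakeStore()
    inner = []
    outer = naive_acquire(
        raced, "job", "a",
        pause=lambda: inner.append(naive_acquire(raced, "job", "b")),
    )
    return happy, (outer, inner[0])
```

Under these assumptions `demo()` returns `((True, False), (True, True))`: the sequential test looks fine, but once the interleaving is forced both callers believe they hold the lock. That is the sort of thing a test suite only catches if someone thinks to write the interleaved case.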

No one is using an LLM to compute whether a number is even or odd at runtime.

rileymichael an hour ago | parent | next [-]

> I don’t care what type of weird tricks the compiler did for optimizations.

you might not, but plenty of others do. on the jvm for example, anyone building a performance sensitive application has to care about what the compiler emits + how the jit behaves. simple things like accidental boxing, megamorphic call preventing inlining, etc. have massive effects.

i've spent many hours benchmarking, inspecting in jitwatch, etc.

raw_anon_1111 an hour ago | parent | next [-]

And 95%+ of developers aren’t writing performance-sensitive code. In my career, most bottlenecks I’ve seen are because of bad database design, network latency, or other infrastructure-related issues, or in the cloud days, startup latency for anything serverless.

Yes, I know every millisecond a company like Google can shave off is multiplied by billions of transactions a day and can save real money on infrastructure. But even at a second-tier company like Salesforce, it probably doesn’t matter.

pjmlp 29 minutes ago | parent | prev [-]

Which is a good example of how managed runtimes are already not deterministic and how hard it is to reproduce scenarios.

raw_anon_1111 3 minutes ago | parent [-]

I agree. In my original comment, I went out of my way to say “C” in my hypothetical argument.

But even with C, it’s still not completely deterministic, with out-of-order execution and branch prediction, cache hits vs misses, etc. Didn’t exactly this cause some of the worst processor-level security issues we have seen in years?

skydhash 2 hours ago | parent | prev [-]

Because for all high-level languages, errors happen at the same level as the language. You do not write programs in Go and then verify them in opcodes with a disassembler. Syntax errors and runtime errors reference the Go files and symbols, not CPU registers.

The same thing happens in JavaScript. I debug it using a JavaScript debugger, not with gdb. Even when using a bash script, you don’t debug it by going into the programs’ source code, you just consult the man pages.

When using an LLM, I would expect not to have to go and verify the code to see if it is actually correct semantically.

raw_anon_1111 2 hours ago | parent [-]

If it works with all of your human-written or even generated test cases, why do I care if it decided to use a while loop or a for loop?
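
As a rough sketch of what I mean (the function names and the `CASES` table are made up for illustration), the same behavioral checks accept either loop style without ever looking at the implementation:

```python
def total_for(values):
    """Sum a list with a for loop."""
    total = 0
    for v in values:
        total += v
    return total


def total_while(values):
    """Sum a list with an index-based while loop."""
    total = 0
    i = 0
    while i < len(values):
        total += values[i]
        i += 1
    return total


# Hypothetical test table: (input, expected output).
CASES = [([], 0), ([5], 5), ([1, 2, 3], 6), ([-1, 1], 0)]


def passes_all(fn):
    """Return True if fn gives the expected result for every case."""
    return all(fn(list(inp)) == expected for inp, expected in CASES)
```

Both `passes_all(total_for)` and `passes_all(total_while)` are true here. The checks only say the two agree on the cases listed, so the whole argument is only as strong as the test cases are.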

Like I said above, I do know to watch out for implementations that “Work on my Machine” but break at scale or under concurrency. But I have had to check for the same issues when I delegate work to more junior developers.

This is not meant to be an insult toward you. But since I haven’t done front-end development in well over a decade, a front-end developer might as well be a “human LLM” to me. I’m going to give you the business requirements and constraints and you are going to come back with a website. I am just going to check that it meets the business requirements and not tell you the how. I’m definitely not going to look at the code.

I just had a web project I had to modify for a new project. I used Codex and didn’t look at a line of code. Yeah, I know JavaScript. But I have no idea whether the code from the initial developer, who worked on it for another project I led, or the Codex changes were idiomatic. I know the developer and Codex met my functional requirements.