monkaiju 9 hours ago

Hmmm, I'm a bit confused by their conclusions (encouraging use) given some of the really damning caveats they point out. A tool they themselves determine to need such careful oversight probably just shouldn't be used near prod at all.

gghffguhvc 8 hours ago | parent | next [-]

For the same quality and quantity of output, if the cost of using LLMs + the cost of careful oversight is less than the cost of not using LLMs, then the rational choice is to use them.
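
In other words (a toy formalization in Python; the function and names are mine, purely illustrative):

    # Toy sketch of the decision rule above. All names are
    # hypothetical; the costs assume equivalent quality and
    # quantity of output.
    def should_use_llms(llm_cost: float,
                        oversight_cost: float,
                        manual_cost: float) -> bool:
        return llm_cost + oversight_cost < manual_cost

    print(should_use_llms(llm_cost=2.0, oversight_cost=3.0, manual_cost=6.0))  # True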

Naturally this doesn’t factor in things like human obsolescence, motivation and self-worth.

ahepp 8 hours ago | parent | next [-]

It seems like this would be a really interesting field to research. Does AI-assisted coding result in fewer bugs, or more bugs, vs. an unassisted human?

I've been thinking about this as I do AoC with Copilot enabled. It's been nice for those "hmm, how do I do that in $LANGUAGE again?" moments, but it's also written some nice-looking snippets that don't quite do what I want them to. And there have been many cases of "hmmm... that would work, but it would read the entire file twice for no reason".
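
Something like this, to make the "reads the file twice" pattern concrete (a made-up Python sketch, not an actual Copilot output):

    # Hypothetical shape of the pattern: correct results, but the
    # file is read end-to-end twice for no reason.
    def count_lines_and_words(path: str) -> tuple[int, int]:
        with open(path) as f:
            lines = f.read().splitlines()  # first full read
        with open(path) as f:
            words = f.read().split()       # second full read, redundant
        return len(lines), len(words)

    # One read is enough:
    def count_lines_and_words_once(path: str) -> tuple[int, int]:
        with open(path) as f:
            text = f.read()
        return len(text.splitlines()), len(text.split())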

My guess, however, is that it's a net gain for quality and productivity. Humans introduce bugs too, and there need to be processes in place to discover and remediate those regardless.

sunshowers 7 hours ago | parent [-]

I'm not sure about research, but I've used LLMs for a few things here at Oxide with (what I hope is) appropriate judgment.

I'm currently trying out Opus 4.5 on a gnarly code reorganization that would take a human most of a week -- I spent a day writing a spec (by hand, with some editing advice from Claude Code), having it reviewed as a document for humans by humans, and then feeding it into Opus 4.5 on some test cases. It seems to work well. The spec is, of course, in the form of an RFD, which I hope to make public soon.

I like to think of the spec as basically an extremely advanced sed script described in ~1000 English words.
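
To give a flavor of the analogy (a hypothetical rule sketched in Python rather than sed; the real spec is ~1000 words of reviewed English, not code):

    # Hypothetical flavor of one rule from such a spec: a mechanical,
    # uniform rewrite applied across the whole tree.
    import re

    def apply_rename_rule(source: str) -> str:
        # "Every call old_api::foo(x) becomes new_api::foo(x, ctx)."
        return re.sub(r"old_api::foo\((\w+)\)",
                      r"new_api::foo(\1, ctx)",
                      source)

    print(apply_rename_rule("let y = old_api::foo(x);"))
    # -> let y = new_api::foo(x, ctx);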

AlexCoventry 5 hours ago | parent [-]

Maybe it's not as necessary with a codebase as well-organized as Oxide's, but I found Gemini 3 useful for a refactor of some completely test-free ML research code recently. I got it to generate a test case that exercises all the code subject to refactoring, got it to do the refactoring and verify that it leads to exactly the same state, then finally got it to randomize the test inputs and keep repeating the comparison.
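
That last step is essentially a small differential test, something like this (a minimal sketch with stand-in functions, not the actual research code):

    # Minimal sketch of the randomized comparison step.
    # reference_impl / refactored_impl stand in for the pre- and
    # post-refactor code paths; they must agree on every input.
    import random

    def reference_impl(xs: list[int]) -> list[int]:
        return sorted(xs)

    def refactored_impl(xs: list[int]) -> list[int]:
        return sorted(xs, key=lambda x: x)

    for _ in range(1000):
        xs = [random.randint(-100, 100)
              for _ in range(random.randint(0, 50))]
        assert refactored_impl(xs) == reference_impl(xs), xs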

zihotki 8 hours ago | parent | prev [-]

And it doesn't factor in seniority/experience. What's good for a senior developer is not necessarily the same for a beginner.

sudomateo 7 hours ago | parent | prev | next [-]

Medication is littered with warning labels, but humans still use it to combat illness. Social media can harm mental health, yet people still use it. Pick whatever other example you'd like.

There are things in life that carry a high risk of harm if misused, yet people still use them because the benefits are great when they're used carefully. Being aware of the risks is the key to safely using something that can be harmful.

ares623 8 hours ago | parent | prev | next [-]

I would think some of their engineers love using LLMs; it would be unfair to them to completely disallow it, IMO (even as someone who hates LLMs).

mathgeek 8 hours ago | parent | prev | next [-]

Junior engineers are the usual comparison folks make to LLMs, which is apt as juniors need lots of oversight.

rgoulter 8 hours ago | parent | prev | next [-]

What do you find confusing about the document encouraging use of LLMs?

The document includes statements like "LLMs are superlative at reading comprehension", "LLMs can be excellent editors", "LLMs are amazingly good at writing code".

The caveats are really useful: if you've anchored your expectations on "these tools are amazing", the caveats bring you closer to what they've observed.

Or, if you're anchored on "the tools aren't to be used", the caveats give credibility to the document's suggestions of what LLMs are useful for.

saagarjha 7 hours ago | parent | prev | next [-]

There’s a lot of code that doesn’t hit prod.

devmor 8 hours ago | parent | prev [-]

The ultimate conclusion seems to be one that leaves it to personal responsibility - the user of the LLM is responsible for ensuring the LLM has done its job correctly. While this is the ethical conclusion to me, the “gap” left to personal responsibility is so large that it makes me question how useful everything else in this document really is.

I don’t think it is easy to create a concise set of rules to apply in this gap for something as general as LLM use, but I do think such a ruleset is noticeably absent here.