heavyset_go 4 hours ago

No, it is a tooling problem.

The tooling is telling laymen that they've built wonderful things that definitely work, that perfectly fix bugs and add features.

The tooling gasses them up and is simply wrong in these cases.

If your tool regularly lies, gaslights and produces wrong results, that's a tooling issue.

Den_VR 3 hours ago | parent | next [-]

Does the hammer lie to you that everything is a nail?

Can a voltmeter _lie_ to you?

EEs are expected to know when their measurements are wrong. And Professional Engineers are legally accountable for the consequences of such mistakes.

possibleworlds 2 hours ago | parent | next [-]

If a hammer had a chat interface that said everything was a nail then the answer would be yes, the hammer lies to you about everything being a nail.

walletdrainer an hour ago | parent [-]

If someone believes a hammer when it tells them such things, they should probably have some sort of a caretaker assigned to help them through life.

mvid 33 minutes ago | parent [-]

If hammer companies were suddenly the most valuable international companies, and spent millions on ad campaigns and lobbying about trusting the hammer interface, then you could assume that a large number of people might trust the hammer interface.

walletdrainer 26 minutes ago | parent [-]

Where are the ad campaigns telling me to trust LLMs?

I don’t use an adblocker, I do read traditional dead-tree newspapers, and I do get exposed to satellite TV channels.

I don’t think I’ve ever seen anyone anywhere telling me how reliable LLMs are.

Pretty sure this tech sells itself to consumers; enterprise sales are what they’ve always been.

card_zero 22 minutes ago | parent [-]

So now you're pivoting away from the caretaker proposal? I thought it had potential but I don't know how you'd fund it.

walletdrainer 2 minutes ago | parent [-]

> I thought it had potential but I don't know how you'd fund it.

The same way we fund other social services here in Europe. If an individual is incapable of caring for themselves, the state is expected to care for them.

fao_ an hour ago | parent | prev | next [-]

If software engineering wants to progress past being an "art" and be considered an engineering discipline, then it should adopt methods and practices from engineering. First and foremost among these is root-cause analysis of faults, paired with redundancy to guard against them. E.g., the FAA requires two pilots in the cockpit, and each aircraft system is built redundantly, so that if an engineer misses a bolt or rivet, the plane won't crash. Intersections are designed so that there is a forcing function[0] on the behaviour of motorists to prevent faults. Or, to take your tool analogy, nail guns are designed so they must be pressed against something with a decent amount of pressure before you can fire them.

All of these systems are designed around the core idea that a human acting irrationally or improperly should not cause a catastrophe and, furthermore, that a human can have a bad day and still avoid a mistake. They all steer someone around a possible fault. Hell, dividing the road into lanes is itself a forcing function to avoid traffic collisions!
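
To make that concrete in software terms, here's a minimal sketch of a forcing function, hypothetical names and all: a destructive operation that cannot run until the caller takes a separate, deliberate arming step, the way a nail gun must be pressed against the workpiece before it will fire.

    class DropGuard:
        """Forcing function: drop_table() cannot run until the caller
        has armed the guard by retyping the exact table name."""

        def __init__(self, table_name):
            self.table_name = table_name
            self._armed = False

        def arm(self, typed_name):
            # The deliberate extra step, like pressing the nail gun's
            # contact tip against the workpiece before firing.
            if typed_name != self.table_name:
                raise ValueError("name mismatch; refusing to arm")
            self._armed = True

        def drop_table(self):
            if not self._armed:
                raise RuntimeError("guard not armed; destructive op blocked")
            print(f"DROP TABLE {self.table_name};")  # the dangerous action
            self._armed = False  # one shot per arming; no accidental repeats

    guard = DropGuard("users")
    guard.arm("users")   # must retype the name, a la GitHub repo deletion
    guard.drop_table()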

So, where is the forcing function in large language models? What part of a large language model prevents gross misuse by laymen?

I can think of examples here and there, maybe. OpenAI had to add guard rails to stop people from poisoning themselves with botulism and boron, etc. But the problem here is that the LLM is probabilistic, so there's really no guarantee that those guard rails will hold. I seem to remember a paper from a few months back, posted here, showing that AI guardrails cannot be proven to work consistently. In that context, LLMs cannot be considered "safe" or "reliable" enough for use. Eddie Burback has a very, very good video showing an absolute worst-case result of this[1], which was posted here last year. And off the top of my head, Angela Collier has a really, really good video demonstrating that there's an absolute plethora of people who have succumbed, in large ways or small, to the bullshit AI can spew[2].
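
To see why, consider the simplest possible guardrail, a keyword blocklist (a hypothetical toy, not how OpenAI actually does it): it's trivially defeated by rephrasing, and the statistical classifiers real systems layer on top only lower the failure rate rather than eliminating it.

    BLOCKLIST = {"botulism", "boron"}

    def naive_guardrail(prompt: str) -> bool:
        """Allow the prompt only if no blocked word appears verbatim."""
        return not any(word in prompt.lower() for word in BLOCKLIST)

    print(naive_guardrail("how do I culture botulism"))  # False: blocked
    print(naive_guardrail("how do I culture b0tulism"))  # True: slips right through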

I feel like if most developers were actually serious about being an engineering discipline, like we claim, then we wouldn't have all jumped on the LLM bandwagon until it had been properly tested and reached a certain level of reliability. Instead there is a sizable chunk of people saying they've stopped coding by hand entirely, and aren't even reviewing the code! i.e. They've thrown out a forcing function that existed to prevent erroneous PRs being committed! And for some bizarre reason, after about two decades of people talking about type safety and how we need formal verification to reduce error, everyone seems to be throwing "reduction of error" out the window!

[0]: https://en.wikipedia.org/wiki/Behavior-shaping_constraint (if you're curious about the term)

[1]: https://www.youtube.com/watch?v=VRjgNgJms3Q

[2]: https://www.youtube.com/watch?v=7pqF90rstZQ

anon7000 32 minutes ago | parent [-]

> I feel like if most developers were actually serious about being an engineering discipline, like we claim, then we wouldn't have all jumped on the LLM bandwagon until they'd been properly tested and had a certain level of reliability

Development can’t be a “serious” engineering discipline because the economics of tech companies don’t allow for it. But this has a lot less to do with developers, and significantly more to do with the severe pressure company executives are putting on everyone to use AI, no matter what.

But let’s be honest, many companies have adopted things like root cause analysis and blameless postmortems to deal with infrastructure reliability and to reduce incidents. Making systems resilient to human mistakes, making it impossible for a typo to blow up a database, etc. are considered best practices at most places I’ve worked. On the product side, I think it’s absolutely normal to make it hard for a user to take an action that would seriously mess up their account.
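
A common shape for that practice, sketched below with hypothetical names: destructive tooling that defaults to a dry run, so a typo produces a preview rather than an outage.

    def delete_stale_rows(rows, dry_run=True):
        """Destructive operation that previews by default; the caller
        must opt in to real deletion with an explicit flag."""
        doomed = [r for r in rows if r.get("stale")]
        if dry_run:
            print(f"[dry run] would delete {len(doomed)} row(s)")
            return []
        print(f"deleting {len(doomed)} row(s)")
        return doomed

    rows = [{"id": 1, "stale": True}, {"id": 2, "stale": False}]
    delete_stale_rows(rows)                 # safe default: preview only
    delete_stale_rows(rows, dry_run=False)  # deliberate, explicit destruction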

The core problem happens when your product idea (say, social media) has vast negative externalities which the company isn’t forced to deal with economically. Whereas in other engineering disciplines, many things are actually safety-related and you could get sued over them. I’m imagining pretty much anything a structural engineer or electrical engineer works on could seriously hurt or kill someone if a bad enough mistake was made.

That just doesn’t apply to software. There is a lot of “life & death” software, but it’s more niche. The reality is that 90% of what the tech industry works on is not capable of physically harming humans, and it’s not really possible to sue over the potential negative consequences of… a dev tooling startup? It’s a very, very different industry than those other engineering disciplines work in.

But, software engineering has actually been extremely successful at minimizing risk from software defects. The worst software-level mistake I could plausibly make would… crash my own program. It likely wouldn’t even crash the operating system, since my program is isolated. That lack of trust in what other people might do is codified everywhere in software. On an iPhone, I’m downloading apps written by tens of thousands of other engineers, at essentially no risk to myself at all.
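
That isolation is easy to demonstrate; in the sketch below (assuming an ordinary Python environment), a child process crashes as hard as it likes without taking the parent down.

    import subprocess
    import sys

    # The child crashes hard (unhandled ZeroDivisionError); the OS contains it.
    result = subprocess.run(
        [sys.executable, "-c", "print('child running'); 1 / 0"],
        capture_output=True,
        text=True,
    )
    print("child exit code:", result.returncode)  # non-zero: it crashed
    print("parent still alive")                   # isolation held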

perching_aix an hour ago | parent | prev [-]

> Can a voltmeter _lie_ to you?

Hell fucking yes it can?

bingo-bongo 2 hours ago | parent | prev | next [-]

> ..laymen..

That’s the behavioral problem.

When AI is assisting a professional, the outcome is vastly different.

AnthonBerg 2 hours ago | parent | prev | next [-]

By the very definition of responsibility, it is a behavioral problem.

onion2k 3 hours ago | parent | prev | next [-]

> If your tool regularly lies, gaslights and produces wrong results, that's a tooling issue.

It's a human issue if you don't recognise that the code it's generated is wrong. That will never change no matter how good the tooling gets.

kiba 3 hours ago | parent | next [-]

The tooling is the issue because humans designed the tooling wrong. It's a chatbot interface fine-tuned for sycophancy. That's not a coincidence.

Hamuko 2 hours ago | parent | prev [-]

Isn't part of the problem that these tools are advertised as allowing non-coders to code? How are you gonna recognise that the code is wrong when you don't know how to code and the product is telling you that you don't even need to?

teo_zero 2 hours ago | parent | prev [-]

Technical analysis tells you that a stock is in an upward trend. You invest all your money in it without thinking twice. The price goes down and you lose thousands of dollars. Is it a tool problem?

LLMs spit out a sequence of tokens that is the most probable continuation of the input. LLMs don't lie any more than technical analysis does when it predicts the most likely trend of stock prices. It's up to you how to use this information.
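
In the same spirit, a toy sketch of "most probable continuation": a bigram table standing in for a real LLM, which does the same thing at vastly larger scale.

    from collections import Counter

    # Tiny "training corpus"; a real LLM differs in scale, not in kind.
    corpus = "the stock went up the stock went down the stock went up".split()

    # Count which word follows each word.
    bigrams = {}
    for prev, nxt in zip(corpus, corpus[1:]):
        bigrams.setdefault(prev, Counter())[nxt] += 1

    def most_probable_next(word):
        # Returns the most frequent continuation, not the "true" one.
        return bigrams[word].most_common(1)[0][0]

    print(most_probable_next("went"))  # 'up': frequent, not guaranteed correct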
