skydhash 4 hours ago

If a calculator gives me 5 when I do 2+2, I throw it away.

If a PC crashes when I use more than 20% of its soldered memory, I throw it away.

If a mobile phone refuses to connect to a cellular tower, I get another one.

What I want from my tools is reliability. Which is a spectrum, but LLMs are very much on the lower end.

tokioyoyo 3 hours ago

You can hold this position, but the reality is that the industry is accepting it and moving forward. Whether you’ll embrace some of it and use it to improve your workflow is up to you. But exaggerating the problem to this point is kinda funny.

crazygringo an hour ago

Honestly, LLMs are about as reliable as the rest of my tools are.

Just yesterday, AirDrop wouldn't work until I restarted my Mac. Google Drive wouldn't sync properly until I restarted it. And a bug in Screen Sharing's file transfer used up 20 GB of RAM to transfer a 40 GB file, which spilled into swap until my hard drive ran out of space.

My regular software breaks constantly. All the time. It's a rare day where everything works as it should.

LLMs have certainly gotten to the point where they seem about as reliable as the rest of the tools I use. I've never seen one say 2+2=5. I'm not going to use one for complicated arithmetic, but that's not what they're for. I'm also not going to ask my calculator to write code for me.

candiddevmike an hour ago

Sorry you're being downvoted even though you're 100% correct. There are use cases where, even with their poor reliability, LLMs are as good as or better than the alternatives (like search/summarization), but arguing over whether LLMs are reliable is silly. And if you need reliability (or even consistency, maybe) for your use case, LLMs are not the right tool.

fennecfoxy 3 hours ago

Except it's more a case of "my phone won't teleport me to Hawaii, sad face, lemme throw it out" than anything else.

There are plenty of people manufacturing expectations about LLM capabilities inside their own heads for some reason. Sure, there's marketing, but for individuals who take marketing at face value without engaging a few neurons and fact-checking, there's already not much hope.

Imagine refusing to drive a car in the 60s because they hadn't reached 1,000 bhp yet. Ahaha.

skydhash 2 hours ago

> Imagine refusing to drive a car in the 60s because they hadn't reached 1,000 bhp yet. Ahaha.

That’s very much a false analogy. In the 60s, cars were very reliable (not as reliable as today’s cars), and they were already an established means of transportation. Cars from the 60s are much closer to today’s cars than computers from the 2000s are to current ones.

embedding-shape 3 hours ago

> What I want from my tools is reliability. Which is a spectrum, but LLMs are very much on the lower end.

"reliability" can mean multiple things though. LLM invocations are as reliable (granted you know how program properly) as any other software invocation, if you're seeing crashes you're doing something wrong.

But what you're really talking about, I think, is "correctness" of the actual text in the response. And if you're expecting that to be 100% "accurate" every time, then yeah, that's not a use case for LLMs, and I don't think anyone is arguing for jamming LLMs in there even today.

Where LLMs are useful is where there is no 100% "right or wrong" answer: think summarization, categorization, tagging, and so on.

skydhash 3 hours ago

I’m not a native English speaker, so I checked the definition of reliability:

  the quality of being able to be trusted or believed because of working or behaving well
For a tool, I expect “well” to mean that it does what it’s supposed to do. My linter is reliable when it catches the bad patterns I want it to catch. My editor is reliable when I can edit code with it and the commands do what they’re supposed to do.

So for generating text, LLMs are very reliable. And they do a decent job at categorizing too. But code is a formal language, which means correctness is what ultimately matters. A program may be valid and incorrect at the same time.

It’s very easy to write valid code; you only need the grammar of the language. Writing correct code is another matter, and it’s the only one that is relevant. No one hires people for knowing a language’s grammar and verifying syntax. They hire people to produce correct code (and since few businesses actually want to formally verify it, they hire people who can write code with a minimal number of bugs and who can eliminate those bugs when they surface).
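
A minimal sketch of what I mean (a hypothetical Python snippet, the names are made up): the function below is perfectly valid, it parses and runs without error, yet it is incorrect if the intent is addition.

  # Valid: the parser and runtime accept this without complaint.
  # Incorrect: the name and intent say addition, but it subtracts.
  def add(a, b):
      return a - b

  print(add(2, 2))  # prints 0, not 4: valid code, wrong program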

cpburns2009 an hour ago

I'm a native English speaker. Your understanding and usage of the word "reliability" is correct, and that's the exact word I'd use in this conversation. The GP is playing a pointless semantics game.