embedding-shape 3 hours ago
> What I want from my tools is reliability. Which is a spectrum, but LLMs are very much on the lower end.

"Reliability" can mean multiple things, though. LLM invocations are as reliable (granted you know how to program properly) as any other software invocation; if you're seeing crashes, you're doing something wrong. What you're really talking about, I think, is "correctness" of the actual text that comes back. And if you're expecting that to be 100% accurate every time, then yeah, that's not a use case for LLMs, and I don't think anyone is arguing for jamming LLMs in there even today. Where LLMs are useful is where there is no 100% right-or-wrong answer: think summarization, categorization, tagging and so on.
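To make the invocation-vs-content distinction concrete, here is a minimal Python sketch (not from the comment above; call_llm is a hypothetical stand-in for whatever client is actually in use). The call itself can be made as reliable as any other network call with ordinary error handling and retries; whether the returned text is accurate is a separate question.

    import time

    def call_llm(prompt: str) -> str:
        # Hypothetical stand-in for whatever LLM client you actually use.
        raise NotImplementedError("plug in a real client here")

    def invoke_reliably(prompt: str, attempts: int = 3) -> str:
        # "Reliable" in the invocation sense: the call either returns text
        # or fails loudly, with retries around transient errors -- ordinary
        # software engineering, nothing LLM-specific.
        for i in range(attempts):
            try:
                return call_llm(prompt)
            except Exception:
                if i == attempts - 1:
                    raise
                time.sleep(2 ** i)  # simple backoff between retries
        raise ValueError("attempts must be >= 1")

    # Whether the text that comes back is *accurate* is a separate,
    # correctness-level question; no amount of retrying answers it.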
skydhash 3 hours ago | parent
I’m not a native English speaker, so I checked the definition of reliability.
For a tool, I expect “well” to mean that it does what it’s supposed to do. My linter is reliable when it catches the bad patterns I want it to catch. My editor is reliable when I can edit code with it and the commands do what they’re supposed to do. So for generating text, LLMs are very reliable. And they do a decent job at categorizing too.

But code is a formal language, which means correctness is the end goal. A program can be valid and incorrect at the same time. It’s very easy to write valid code; you only need the grammar of the language. Writing correct code is another matter, and the only one that is relevant. No one hires people for knowing a language’s grammar and verifying syntax. They hire people to produce correct code (and because few businesses actually want to formally verify it, they hire people who can write code with a minimal amount of bugs and who can eliminate those bugs when they surface).
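As a small illustration of the valid-vs-correct distinction (my example, not the commenter's; the function is made up):

    def is_even(n: int) -> bool:
        # Syntactically valid Python: it parses, type-checks, and runs.
        # Logically incorrect: it flips even and odd.
        return n % 2 == 1

    print(is_even(4))  # prints False -- the grammar alone can't catch this

A linter or parser will accept this without complaint; only a test, a reviewer, or a user hitting the bug reveals that it is wrong.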