supernes | 5 hours ago
> after using it for months you get a 'feel' for what kind of mistakes it makes

Sure, go ahead and bet your entire operation on your intuition of how a non-deterministic, constantly changing black box of software "behaves". Don't see how that could backfire.
|
sixhobbits | 5 hours ago
Not betting my entire operation. If the only thing stopping a bad 'deploy' command from destroying your entire operation is that you don't trust the agent to run it, then you have worse problems than too much trust in agents. I similarly use my 'intuition' (i.e. evidence-based prior experience) to decide which people on my team get access to which services.
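To make that concrete, here is a minimal sketch of the access-control point, assuming a Python-based agent harness. The command sets and function names are hypothetical, not from any particular agent framework: the idea is that a permission check, not trust, gates destructive commands.

```python
# Hypothetical sketch: gate an agent's shell-tool calls behind an explicit
# allowlist instead of relying on trust in the model. Command sets and
# function names are illustrative, not from any specific agent framework.
import shlex

ALLOWED_COMMANDS = {"ls", "cat", "grep", "git"}    # read-mostly tools
BLOCKED_COMMANDS = {"deploy", "rm", "terraform"}   # destructive: needs a human

def run_agent_command(raw: str) -> str:
    """Run a command the agent proposed, or refuse it."""
    argv = shlex.split(raw)
    if not argv:
        return "refused: empty command"
    cmd = argv[0]
    if cmd in BLOCKED_COMMANDS:
        return f"refused: '{cmd}' requires human approval"
    if cmd not in ALLOWED_COMMANDS:
        return f"refused: '{cmd}' is not on the allowlist"
    # A real harness would call subprocess.run(argv, ...) here; omitted
    # so the sketch stays side-effect free.
    return f"ok: would run {argv}"

print(run_agent_command("deploy --prod"))  # refused: 'deploy' requires human approval
print(run_agent_command("git status"))     # ok: would run ['git', 'status']
```

Whether the actor proposing the command is a colleague or a model, the allowlist does the gating; intuition only decides how wide to set it.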
supernes | 5 hours ago

I'm not saying intuition has no place in decision-making, but I do take issue with the claim that it applies equally to human colleagues and autonomous agents. It would be just as unreliable if the people on your team showed random regressions in their capabilities from month to month.
|
|
otabdeveloper4 | 3 hours ago
What, you don't trust the vibes? Are you some sort of Luddite? Anyway, try a point-release upgrade of a SOTA model; you're probably just holding it wrong.
|
perching_aix | 5 hours ago
So, like all software? Why do you think there are so many security scanners and whatnot out there? There are millions of lines of code running on a typical box; unless you're in embedded, you have no real idea what you're actually running.
danaris | 2 hours ago

...No, it's not at all "like all software". This seems like another instance of a problem I see so, so often with LLMs: people observe that LLMs are fundamentally nondeterministic, in ways that can't truly be predicted or learned long-term... and they mistakenly equate that with the fact that humans, other software, what have you, sometimes make mistakes, in ways that are generally understandable, predictable, and remediable.

Just because I don't know what's in every piece of software I'm running doesn't mean it's all equally unreliable, nor that it's unreliable in the same way that LLM output is. That's like saying that because the weather forecast sometimes gets it wrong, meteorologists are complete bullshit and there's no use in looking at the forecast at all.
orbital-decay | 1 hour ago

> That's like saying just because the weather forecast sometimes gets it wrong, meteorologists are complete bullshit and there's no use in looking at the forecast at all.

Are you really not seeing that the GP is saying exactly this about LLMs? What you want for this to be practical is verification and a low enough error rate; same as in any human-driven development process.
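A concrete version of "verification" here, as a hedged sketch: accept an agent-proposed change only if an automated check still passes. This assumes a git repo and a pytest-style test suite; the helper names are hypothetical, not any specific agent API.

```python
# Hypothetical sketch: accept an agent-proposed patch only if an automated
# check still passes, instead of trusting the output directly. Assumes a
# git repo and a pytest-style test suite; helper names are illustrative.
import subprocess

def tests_pass() -> bool:
    """Run the project's test suite and report success."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0

def apply_agent_patch(patch: str) -> bool:
    """Apply the agent's patch, keeping it only if verification succeeds."""
    subprocess.run(["git", "apply", "-"], input=patch, text=True, check=True)
    if tests_pass():
        return True   # verified: keep the change
    # Roll back modifications to tracked files (new files from the patch
    # would need separate cleanup; omitted in this sketch).
    subprocess.run(["git", "checkout", "--", "."], check=True)
    return False      # rejected: the error rate surfaced in tests
```

Verification like this turns a 'feel' for the model's error patterns into something measurable: a rejection rate.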
|
|
vanviegen | 5 hours ago
> bet your entire operation

What straw man is doing that?
| |
supernes | 5 hours ago

Reports of people losing data and other resources to unintended actions by autonomous agents come out practically every week. I don't think it's dishonest to say that could have a catastrophic impact on the product or service they're developing.
KaiserPro | 5 hours ago

Looking at the Reddit forums: enough people to make for interesting posts.
|