steveBK123 | 5 hours ago
I have found LLMs to be wrong in random, insidious ways, so trusting them with anything critical is terrifying. Recent (as in the last few days/weeks) incidents using different models/tools:

* Google AI search summary comparing products A & B: it called out a bunch of differences that were correct... and then threw in features that didn't exist.

* Work (midsize company with a big AI team / homebuilt GPT wrappers): PDF parsing for a company headquarters address hallucinated an address that didn't exist in the document.

* Work: a team using a frontier model from a top-2 AI lab for DevOps-type tasks requested "Restart XYZ service in DEV environment". It responded "OK, restarting ABC service in PROD environment". It then asked for confirmation AFTER actioning whether they meant XYZ in DEV or ABC in PROD... a little too late.

These are very difficult tools to use correctly when the results are not automatically verifiable (as code can be, with the right tests) and the answer actually matters.
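The failure in the last anecdote is avoidable at the tool layer rather than the prompt layer: before executing anything, compare the model's proposed action against the user's original request and an environment allowlist. A minimal sketch of that idea (all service and environment names here are hypothetical, not from any real system):

```python
# Hypothetical guard: vet an LLM-proposed action against the original
# request BEFORE execution, instead of asking for confirmation after.

ALLOWED_ENVS = {"DEV", "STAGING"}  # PROD deliberately excluded from automation

def vet_action(requested: dict, proposed: dict) -> tuple[bool, str]:
    """Return (ok, reason). Refuse if the model's proposed action
    drifts from what the user actually asked for."""
    if proposed["service"] != requested["service"]:
        return False, (f"service mismatch: asked for {requested['service']}, "
                       f"model proposed {proposed['service']}")
    if proposed["env"] != requested["env"]:
        return False, (f"environment mismatch: asked for {requested['env']}, "
                       f"model proposed {proposed['env']}")
    if proposed["env"] not in ALLOWED_ENVS:
        return False, f"environment {proposed['env']} is not automatable"
    return True, "ok"

# The scenario from the anecdote: user asks for XYZ in DEV,
# model proposes ABC in PROD.
requested = {"action": "restart", "service": "XYZ", "env": "DEV"}
proposed = {"action": "restart", "service": "ABC", "env": "PROD"}

ok, reason = vet_action(requested, proposed)
print(ok, "-", reason)  # False - service mismatch is caught before anything runs
```

The point is that the check runs deterministically outside the model, so a hallucinated target can never reach the execution step.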
rsynnott | 5 hours ago | parent
> Work, a team using frontier model from top 2 AI lab was using it to perform DevOps type tasks, requested "Restart XYZ service in DEV environment". It responded "OK, restarting ABC service in PROD environment". It then asked for confirmation AFTER actioning whether they meant XYZ in DEV or ABC in PROD... a little too late.

... Wait, they gave the magic robot _access to modify their production environment_?! Bloody hell, there's no helping some people.