steveBK123 | 5 hours ago
I have found LLMs to be wrong in random, insidious ways, so trusting them with anything critical is terrifying. Recent (as in the last few days/weeks) incidents using different models/tools:

* Google AI search summary comparing products A & B: it called out a bunch of differences that were correct... and then threw in features that didn't exist.

* Work (midsize company with a big AI team / homebuilt GPT wrappers): PDF parsing for a company headquarters address hallucinated an address that didn't exist in the document.

* Work: a team using a frontier model from a top-2 AI lab for DevOps-type tasks requested "Restart XYZ service in DEV environment". It responded "OK, restarting ABC service in PROD environment". It then asked for confirmation AFTER actioning whether they meant XYZ in DEV or ABC in PROD... a little too late.

These are very difficult tools to use correctly when the results are not automatically verifiable (as code can be, with the right tests) and the answer actually matters.
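The failure in the last anecdote is avoidable at the tool layer rather than the prompt layer: before executing anything, compare the model's proposed action against the user's original request and an environment allowlist. A minimal sketch of that idea (all service and environment names here are hypothetical, not from any real system):

```python
# Hypothetical guard: vet an LLM-proposed action against the original
# request BEFORE execution, instead of asking for confirmation after.

ALLOWED_ENVS = {"DEV", "STAGING"}  # PROD deliberately excluded from automation

def vet_action(requested: dict, proposed: dict) -> tuple[bool, str]:
    """Return (ok, reason). Refuse if the model's proposed action
    drifts from what the user actually asked for."""
    if proposed["service"] != requested["service"]:
        return False, (f"service mismatch: asked for {requested['service']}, "
                       f"model proposed {proposed['service']}")
    if proposed["env"] != requested["env"]:
        return False, (f"environment mismatch: asked for {requested['env']}, "
                       f"model proposed {proposed['env']}")
    if proposed["env"] not in ALLOWED_ENVS:
        return False, f"environment {proposed['env']} is not automatable"
    return True, "ok"

# The scenario from the anecdote: user asks for XYZ in DEV,
# model proposes ABC in PROD.
requested = {"action": "restart", "service": "XYZ", "env": "DEV"}
proposed = {"action": "restart", "service": "ABC", "env": "PROD"}

ok, reason = vet_action(requested, proposed)
print(ok, "-", reason)  # False - service mismatch is caught before anything runs
```

The point is that the check runs deterministically outside the model, so a hallucinated target can never reach the execution step.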
rsynnott | 5 hours ago | parent
> Work, a team using frontier model from top 2 AI lab was using it to perform DevOps type tasks, requested "Restart XYZ service in DEV environment". It responded "OK, restarting ABC service in PROD environment". It then asked for confirmation AFTER actioning whether they meant XYZ in DEV or ABC in PROD... a little too late.

... Wait, they gave the magic robot _access to modify their production environment_?! Bloody hell, there's no helping some people.