infecto 3 days ago

Have inaccuracies been an issue for any of the systems you have developed using LLMs? I hear this complaint quite a bit, but it does not align with my experience. One-shotting a chatbot on an esoteric problem certainly invites inaccuracies, but if I have an LLM interrogate a PDF or other document, the error rate drops significantly, and most of the remaining errors come from the structuring process rather than the LLM itself.

Genuinely curious what others have experienced, specifically those using LLMs for business workflows. It's not that any system is perfect, but for purpose-driven data pipelines LLMs can be pretty great.
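As a sketch of what such a purpose-driven pipeline can look like (the field names and schema below are hypothetical, not from any system mentioned here), the usual trick is to constrain the LLM to a strict output schema and validate it before anything downstream sees it:

```python
import json

# Hypothetical: fields we expect the LLM to extract from an invoice PDF.
REQUIRED_FIELDS = {"invoice_id": str, "total": float, "currency": str}

def validate_extraction(raw_llm_output: str) -> dict:
    """Parse the LLM's JSON output and reject anything that doesn't
    match the expected schema, so bad records never enter the pipeline."""
    record = json.loads(raw_llm_output)
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            raise ValueError(f"missing field: {field}")
        if not isinstance(record[field], expected_type):
            raise ValueError(f"{field} should be {expected_type.__name__}")
    return record

# A well-formed response passes; a malformed one is rejected up front.
ok = validate_extraction('{"invoice_id": "INV-1", "total": 99.5, "currency": "EUR"}')
```

The schema check is where most of the "error rate" work happens in practice: the LLM's job is reduced to filling a fixed shape, and anything that doesn't fit the shape is caught before it pollutes the pipeline.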

K0nserv 3 days ago | parent | next [-]

Yes, I've seen issues with both, but part of what's tricky about false negatives is that you don't necessarily realise they are there. In the systems I've worked on, we've made it simple for operators to verify the LLM's work, but that only guards against false positives, which are the less problematic of the two.
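One partial mitigation for the invisible-false-negative problem (my own sketch, not a description of any system above) is to periodically audit a random sample of the records the LLM decided to skip; the fraction a human reviewer flags as wrongly skipped gives a rough estimate of the miss rate:

```python
import random

def audit_sample(skipped_records, sample_size, seed=0):
    """Draw a reproducible random sample of records the LLM rejected,
    for a human reviewer to check."""
    rng = random.Random(seed)
    k = min(sample_size, len(skipped_records))
    return rng.sample(skipped_records, k)

def estimate_miss_rate(audited, flagged_ids):
    """flagged_ids: ids the reviewer says should NOT have been skipped.
    Returns the estimated false-negative rate over the sample."""
    if not audited:
        return 0.0
    misses = sum(1 for r in audited if r["id"] in flagged_ids)
    return misses / len(audited)

sample = audit_sample([{"id": i} for i in range(100)], sample_size=10)
# Suppose the reviewer flags exactly one of the ten sampled records:
rate = estimate_miss_rate(sample, flagged_ids={sample[0]["id"]})
```

It doesn't make the false negatives visible in production, but it turns "we don't know what we're missing" into a number you can watch over time.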

I've had pretty good success using LLMs for coding, and in some ways they are perfect for it. False positives are usually obvious, and false negatives don't matter much: as long as the LLM finds a solution, it's not a huge deal if a better one existed. Even when the LLM cannot solve the problem at all, it usually produces some useful artifacts for the human to build on.

infecto 2 days ago | parent | next [-]

That’s fair. I’ve typically used LLMs in workflows where I believe the current generation of models shines: classification, data structuring, summarization, etc.
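For the classification case, a concrete example of the kind of guardrail that makes these workflows reliable (my own sketch; the taxonomy is invented): force the model's free-text answer onto a closed label set, and route anything else to human review instead of letting it enter the pipeline:

```python
# Hypothetical document taxonomy for a classification workflow.
ALLOWED_LABELS = {"invoice", "contract", "receipt", "other"}

def normalize_label(llm_answer: str) -> str:
    """Map the model's free-text answer onto the closed label set;
    anything unexpected is routed to manual review rather than
    silently accepted."""
    label = llm_answer.strip().lower().rstrip(".")
    return label if label in ALLOWED_LABELS else "needs_review"

clean = normalize_label("Invoice.")           # normalizes to a known label
fallback = normalize_label("I think it's a memo")  # falls through to review
```

The point is that the LLM never gets to invent a category: it can only pick from the set or fail safe.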

birn559 2 days ago | parent | prev [-]

> as long as the LLM finds a solution, it's not a huge deal if there was a better way to do it

It might not matter in the short term, but in the medium term such debt becomes a huge burden.

Incipient 2 days ago | parent | prev | next [-]

I don't really track issues, as I don't need to. But here's a recent example: I asked "please extract the tabular data from this visual" and the model misaligned the records in one column, so the IDs were off by one in the output.

I'm sure it gets it right in 95% of cases, but it didn't this time, and I'm not sure how to actually work around that.
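One cheap guard against exactly this failure (a hypothetical check, just to illustrate the idea) is to validate invariants the extracted table should satisfy, e.g. that an ID column is strictly sequential, so a shifted column fails loudly instead of silently:

```python
def check_sequential_ids(rows, id_field="id"):
    """Return False at the first place the ID column stops being
    sequential, which is exactly the signature an off-by-one
    column shift leaves behind."""
    for prev, cur in zip(rows, rows[1:]):
        if cur[id_field] != prev[id_field] + 1:
            return False
    return True

good = [{"id": 1}, {"id": 2}, {"id": 3}]
shifted = [{"id": 1}, {"id": 3}, {"id": 4}]  # a misaligned extraction
```

It only works when the table has an invariant to check, but many real tables do (sequential IDs, row totals, known row counts), and any one of them is enough to catch the shift.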

infecto 2 days ago | parent [-]

Not an attack on your experience at all! But I would definitely counter that multimodal models are still error prone; much better output comes from using a tool like Textract for the OCR step and then running an LLM over the extracted data.
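For context, the Textract-then-LLM split looks roughly like this. The AWS call itself needs credentials, so it is shown only as a comment; the block-parsing helper below is my own illustration of flattening a Textract-style response into plain text for the LLM:

```python
# The OCR step (AWS Textract via boto3; requires credentials, so
# sketched here as a comment rather than executed):
#   import boto3
#   client = boto3.client("textract")
#   resp = client.detect_document_text(Document={"Bytes": pdf_bytes})
#   blocks = resp["Blocks"]

def lines_from_blocks(blocks):
    """Collect the text of Textract LINE blocks in order,
    ready to hand to an LLM as plain text."""
    return "\n".join(b["Text"] for b in blocks if b["BlockType"] == "LINE")

# A minimal stand-in for a Textract response:
sample_blocks = [
    {"BlockType": "PAGE"},
    {"BlockType": "LINE", "Text": "Invoice INV-1"},
    {"BlockType": "LINE", "Text": "Total: 99.50 EUR"},
]
text = lines_from_blocks(sample_blocks)
```

The division of labour is the point: the OCR tool does the pixel-to-text step it was built for, and the LLM only has to structure text, which is where it is most reliable.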

aaronbaugher 2 days ago | parent | prev [-]

I asked an LLM to guide me through a Salesforce process last week. It gave me step-by-step instructions, about 50% of which were fine, while the rest referenced options that don't exist in the system. So I followed the steps until I hit a wrong one, told the LLM it was wrong, at which point it conceded the mistake and gave me different instructions. After a few cycles of that, plus some trial and error, I had a working process.

It probably did save me some time, so I'd call it a mild success. But it didn't save a lot, and I only succeeded in the end because I know Salesforce pretty well and was merely inexperienced in this one area, so I could see where it was probably going off the rails. Someone new to Salesforce would have been hopelessly lost by its advice.

It's understandable that an LLM wouldn't be very good at Salesforce: there's a lot of bad information in the support forums, and the ways of doing things have changed multiple times over the years. But that's true of a lot of systems, so it's not an excuse, just a limitation of LLMs that's probably not going to change.