| ▲ | K0nserv 3 days ago |
| I'm arriving at the conclusion that deploying LLMs is most suitable in areas where the costs of false positives and, crucially, false negatives are low. If you cannot tolerate false negatives, I don't see how you get around the inaccuracy of LLMs. As long as you can spot false positives and their rate is sufficiently low, they are merely an annoyance. I think this is a good consideration before starting a project leveraging LLMs. |
|
| ▲ | jbreckmckye 3 days ago | parent | next [-] |
| I agree, and it's why I think AI is a good $50 billion industry but not a $5 trillion industry. |
|
| ▲ | simianwords 3 days ago | parent | prev | next [-] |
| I completely agree. These are useful in fuzzy cases, and we live in a fuzzy world. Most things are fuzzy; nothing is completely true or completely false. If I, as a human, deploy code, it is not certain that it necessarily works - just like with LLMs. The extent is different, however. |
| |
| ▲ | zubiaur 2 days ago | parent [-] |
| 100%. Where we are having a lot of success is in processes that require somewhat repeatable fuzzy processing, which before could only be performed by people. The cool thing is that, since LLMs are comparatively cheap, I can afford to run the same process a few times to get a sense of confidence in the response. In our latest project, the client reported that our AI-aided process was 11 times faster, and much more accurate, than their previous process. |
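| A minimal sketch of that repeated-run idea (names like call_llm and send_to_human_review are hypothetical stand-ins, not any particular API): run the same prompt several times and treat agreement across runs as a rough confidence signal. |

    from collections import Counter

    def call_llm(prompt: str) -> str:
        # Hypothetical wrapper around whichever model API is in use;
        # assumed to return the model's answer as a normalized string.
        raise NotImplementedError

    def answer_with_confidence(prompt: str, n_runs: int = 5):
        """Run the same prompt n_runs times; agreement between runs
        serves as a rough, cheap confidence estimate."""
        answers = [call_llm(prompt) for _ in range(n_runs)]
        top_answer, count = Counter(answers).most_common(1)[0]
        return top_answer, count / n_runs

    # Example: accept only high-agreement answers, route the rest to a human.
    # answer, confidence = answer_with_confidence("Classify this invoice: ...")
    # if confidence < 0.8:
    #     send_to_human_review(answer)

| Agreement only guards against inconsistency, of course: if the model is confidently wrong the same way on every run, the score will still look high. |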
|
|
| ▲ | infecto 3 days ago | parent | prev | next [-] |
| Have inaccuracies been an issue for any of the systems you have developed using LLMs? I hear your complaint quite a bit, but it does not align with my experience. Definitely, one-shotting a chatbot on an esoteric problem introduces possible inaccuracies. But if I get an LLM to interrogate a PDF or another document, that error rate drops significantly, and the remaining errors are mostly on the part of the structuring process, not the LLM. Genuinely curious what others have experienced, specifically those who are using LLMs for business workflows. It's not to say any system is perfect, but for purpose-driven data pipelines LLMs can be pretty great. |
| |
| ▲ | K0nserv 3 days ago | parent | next [-] |
| Yes, I've seen issues with both, but part of what's tricky about false negatives is that you don't necessarily realise they are there. In the systems I've worked on, we've made it simple for operators to verify the work the LLM has done, but this only guards against false positives, which are less problematic. I've had pretty good success using LLMs for coding, and in some ways they are perfect for that: false positives are usually obvious, and false negatives don't matter much, because as long as the LLM finds a solution, it's not a huge deal if there was a better way to do it. Even when the LLM cannot solve the problem at all, it usually produces some useful artifacts for the human to build on. |
| ▲ | infecto 2 days ago | parent | next [-] |
| That's fair, and I've typically used LLM workflows where I believe the current gen of models shines: classification, data structuring, summarization, etc. |
| ▲ | birn559 2 days ago | parent | prev [-] |
| > as long as the LLM finds a solution, it's not a huge deal if there was a better way to do it |
| It might not matter in the short term, but in the mid term such debt becomes a huge burden. |
| |
| ▲ | Incipient 2 days ago | parent | prev | next [-] |
| I don't really track issues, as I don't need to. Just a recent example: "please extract the tabular data from this visual", and the model had incorrectly aligned records in one column, so the IDs were off by one in the data. I'm sure in 95% of cases it gets it right, but it didn't this time, and I'm not sure how to actually work around that fact. |
| ▲ | infecto 2 days ago | parent [-] |
| Not an attack on your experience at all! I would definitely counter that multimodal models are still error-prone, and that much better output is achieved by using a tool like textract and then running an LLM on the extracted data. |
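| A rough sketch of that two-stage approach, assuming AWS Textract via boto3 (the comment's "textract" could equally mean the Python textract package) and reusing the hypothetical call_llm wrapper from the earlier sketch: |

    import boto3

    def extract_table_text(image_bytes: bytes) -> str:
        """Stage 1: deterministic OCR/table extraction with AWS Textract."""
        client = boto3.client("textract")
        response = client.analyze_document(
            Document={"Bytes": image_bytes},
            FeatureTypes=["TABLES"],
        )
        # Collapse detected text into plain lines; production code would
        # walk the TABLE/CELL block relationships to preserve row and
        # column structure exactly.
        lines = [b["Text"] for b in response["Blocks"]
                 if b["BlockType"] == "LINE"]
        return "\n".join(lines)

    def structure_table(image_bytes: bytes) -> str:
        """Stage 2: the LLM structures already-extracted text rather than
        reading pixels, sidestepping the column mis-alignment failure."""
        raw_text = extract_table_text(image_bytes)
        # call_llm is the hypothetical model wrapper from the earlier sketch.
        return call_llm(f"Convert this extracted table text to CSV:\n{raw_text}")

| Splitting extraction from interpretation this way replaces the error-prone vision step with deterministic OCR, so the LLM only has to reason over text it was actually given. |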
| |
| ▲ | aaronbaugher 2 days ago | parent | prev [-] |
| I asked an LLM to guide me through a Salesforce process last week. It gave me step-by-step instructions, about 50% of which were fine, while the others referenced options that didn't exist in the system. So I followed the steps until I got to a wrong one, then told it so, at which point it agreed it was wrong and gave me different instructions. After a few cycles of that and some trial and error, I had a working process. |
| It probably did save me some time, so I'd call it a mild success, but it didn't save a lot of time, and I only succeeded in the end because I know Salesforce pretty well and was just inexperienced in this one area, so I was able to see where it was probably going off the rails. Someone new to Salesforce would have been hopelessly lost by its advice. |
| It's understandable that an LLM wouldn't be very good at Salesforce, because there's a lot of bad information in the support forums out there, and the ways of doing things in it have changed multiple times over the years. But that's true of a lot of systems, so it's not an excuse, just a symptom of using LLMs that's probably not going to change. |
|
|
| ▲ | duxup 3 days ago | parent | prev [-] |
| I'm working on some AI projects, and I'm building in a "what just happened" kind of interface so folks can understand whether the result is in fact what they wanted. Management types seem baffled by the idea that we would want this, even if they come around the next hour and say "hey, the user did something, can you tell me what happened?". Like, guys... it's not 100%... |
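| A minimal sketch of what such a trace might look like (all names here are hypothetical, not any particular framework): record every model call with its inputs and outputs, so the interface can replay what the system actually did for a given request. |

    import json
    import time
    from dataclasses import dataclass, field

    @dataclass
    class TraceEvent:
        step: str       # e.g. "classify_request", "draft_reply"
        prompt: str     # what was sent to the model
        response: str   # what came back
        ts: float = field(default_factory=time.time)

    @dataclass
    class RunTrace:
        """Accumulates a 'what just happened' record for one request."""
        events: list = field(default_factory=list)

        def record(self, step: str, prompt: str, response: str) -> None:
            self.events.append(TraceEvent(step, prompt, response))

        def to_json(self) -> str:
            # Serialized trace the UI can show when someone asks
            # "the user did something, can you tell me what happened?"
            return json.dumps([vars(e) for e in self.events], indent=2)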