Aachen | 4 days ago
While it's useful to not bother when you know it's unlikely to give good results, it does also feel a bit like a cop-out to suggest that the user shouldn't be asking it certain (unspecified) things in the first place. If this is the only solution, we should just crowdsource topics or types of question it can't do >50% of the time, so not everyone has to reinvent the wheel.
theshrike79 | 3 days ago | parent | next
If you ask an LLM to count the r's in "strawberry sherbert", it's hit and miss. But have it create a script or program in any language you want to do the same, and I'm 99% sure it'll get it right the first time. People use LLMs like graphing calculators; they're not. But you can have one MAKE a calculator and it'll get it right.
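For illustration, a minimal Python sketch of the kind of throwaway counting script meant here (the wording and function name are mine, not from the comment):

```python
def count_letter(text: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in text."""
    return text.lower().count(letter.lower())

print(count_letter("strawberry sherbert", "r"))  # prints 5
```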
mierz00 | 4 days ago | parent | prev
It’s not that simple. I’m making a tool to analyse financial transactions for accountants and identify things like misallocated expenses. Initially I was getting an LLM to try to analyse hundreds of transactions in one go. It was correct roughly 40-50% of the time, inconsistent, and hallucinated frequently. I changed the method to a simple yes/no question and to analysing each transaction individually. Now it is correct 85% of the time and very consistent. Same model, essentially the same question, but a different way of asking it.
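For illustration, a rough Python sketch of the per-transaction yes/no approach; ask_llm stands in for whichever LLM client is actually used, and the field names and prompt wording are hypothetical:

```python
from typing import Dict, List


def ask_llm(prompt: str) -> str:
    """Placeholder for a call to whatever LLM API is in use; returns the model's raw text reply."""
    raise NotImplementedError


def is_misallocated(transaction: Dict, expected_category: str) -> bool:
    """Ask one narrow yes/no question about a single transaction."""
    prompt = (
        f"Transaction: {transaction['description']}, amount {transaction['amount']}, "
        f"posted to account '{transaction['account']}'.\n"
        f"Should this expense be allocated to '{expected_category}' instead? "
        "Answer strictly 'yes' or 'no'."
    )
    return ask_llm(prompt).strip().lower().startswith("yes")


def flag_misallocations(transactions: List[Dict], expected_category: str) -> List[Dict]:
    """Check each transaction individually rather than sending hundreds in one prompt."""
    return [t for t in transactions if is_misallocated(t, expected_category)]
```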