It’s not that simple.

I’m making a tool to analyse financial transactions for accountants and identify things like misallocated expenses. Initially I had an LLM try to analyse hundreds of transactions in one go. It was correct roughly 40-50% of the time, gave inconsistent results, and hallucinated frequently.

I changed the method to ask a simple yes/no question and to analyse each transaction individually. Now it is correct 85% of the time and very consistent.
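To make the per-transaction approach concrete, here is a rough sketch of the shape of it, not my actual code. The field names (`description`, `amount`, `account`), the prompt wording, and the `ask_model` callable are all placeholders; `ask_model` stands in for whatever function sends a prompt to your LLM and returns its text reply.

```python
from typing import Callable, Optional


def build_prompt(txn: dict) -> str:
    """Format one transaction as a single yes/no question (wording is illustrative)."""
    return (
        "Answer with exactly YES or NO.\n"
        f"Transaction: {txn['description']} for {txn['amount']:.2f}, "
        f"booked to account '{txn['account']}'.\n"
        "Is this expense allocated to the wrong account?"
    )


def parse_answer(reply: str) -> Optional[bool]:
    """Map the reply to True/False; return None if it is not a clean yes/no."""
    tokens = reply.strip().split()
    if not tokens:
        return None
    word = tokens[0].strip(".,!:").upper()
    return {"YES": True, "NO": False}.get(word)


def review(transactions: list, ask_model: Callable[[str], str]) -> list:
    """Ask about each transaction separately instead of sending the whole batch."""
    results = []
    for txn in transactions:
        verdict = parse_answer(ask_model(build_prompt(txn)))
        results.append((txn, verdict))  # None means "needs a human look"
    return results
```

Anything that comes back as `None` can be retried or sent to a person, which is much easier to manage than spotting a hallucinated row somewhere in a response covering hundreds of transactions.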

Same model, essentially the same question, but a different way of asking it.

▲ Aachen 3 days ago | parent [-]

I don't see how that issue couldn't be an entry on the "not to do" or "not optimal usage" list.

	▲	mierz00 2 hours ago | parent [-]
		Edit: I see your point. That’s valid. I’m just not so sure it’s black and white. At least in my experience it hasn’t been.