mierz00 4 days ago

It’s not that simple.

I’m making a tool to analyse financial transactions for accountants and identify things like misallocated expenses. Initially I was getting an LLM to try and analyse hundreds of transactions in one go. It was correct roughly 40-50% of the time, was inconsistent, and hallucinated frequently.

I changed the method to ask a simple yes/no question and to analyse each transaction individually. Now it is correct 85% of the time and very consistent.

Same model, essentially the same question, but a different way of asking it.
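
For illustration, the per-transaction yes/no approach might look something like this Python sketch. It is not the tool's actual code: the `Transaction` fields, the prompt wording, and the `ask_model` callable (a stand-in for whatever LLM client is in use) are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class Transaction:
    # Hypothetical fields; a real ledger export will differ.
    description: str
    amount: float
    account: str  # the account the expense was booked to


def build_prompt(tx: Transaction) -> str:
    """Build one narrow yes/no question about a single transaction."""
    return (
        "Answer with exactly one word, YES or NO.\n"
        f"Transaction: {tx.description!r}, amount {tx.amount:.2f}, "
        f"booked to account {tx.account!r}.\n"
        "Is this expense allocated to the wrong account?"
    )


def parse_yes_no(reply: str) -> Optional[bool]:
    """Map the reply to True/False; return None for anything else."""
    word = reply.strip().lower().rstrip(".!")
    if word == "yes":
        return True
    if word == "no":
        return False
    return None  # unparseable reply: leave for human review


def flag_misallocated(
    transactions: Iterable[Transaction],
    ask_model: Callable[[str], str],
) -> List[Tuple[Transaction, Optional[bool]]]:
    """Ask the model about each transaction separately, one call per item."""
    return [(tx, parse_yes_no(ask_model(build_prompt(tx)))) for tx in transactions]


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call, so the sketch runs without network access.
    return "YES" if "Laptop" in prompt else "No."


if __name__ == "__main__":
    txs = [
        Transaction("Laptop purchase", 1499.00, "Travel"),
        Transaction("Train ticket", 42.50, "Travel"),
    ]
    for tx, flagged in flag_misallocated(txs, fake_model):
        print(tx.description, flagged)
```

Keeping each call to one transaction and one constrained answer makes the output trivial to parse, and anything that is not a clean yes or no can be routed to a person instead of being guessed at.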

Aachen 3 days ago | parent [-]

I don't see how that issue couldn't be an entry on the "not to do" or "not optimal usage" list

mierz00 2 hours ago | parent [-]

Edit: I see your point. That’s valid.

I’m just not so sure it’s black and white. At least in my experience it hasn’t been.