coffeefirst 5 days ago
I keep trying to get it to review my personal credit card statements. I have my own budget tracking app that I made, and sometimes there are discrepancies. Resolving this by hand is annoying, and an LLM should be able to do it: scrape the PDF, compare the records to mine, find the delta. I've tried multiple models over the course of 6 months. Yesterday one told me I made a brilliant observation, but none has managed to pin down a single real anomaly. Once it told me the charges were Starbucks, when I had not been to a Starbucks—it's just that Starbucks is a probable output when analyzing credit card statements. And I'm only dealing with a list of 40 records that I can check by hand, with zero consequences if I get it wrong beyond my personal budgeting being off by 1%. I can't imagine trusting any business that leans on this for jobs it's unsuited to.
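(The comparison step the commenter describes—two transaction lists, find the delta—is deterministic once the records are parsed out of the PDF. A minimal sketch, with made-up record fields and values, treating each transaction as a (date, merchant, amount-in-cents) tuple and using multiset subtraction so duplicate charges are handled correctly:)

```python
from collections import Counter

def reconcile(statement, ledger):
    """Return (on statement but not in ledger, in ledger but not on statement).

    Transactions are (date, merchant, amount_cents) tuples; the field
    layout here is illustrative, not any real statement format.
    """
    s, l = Counter(statement), Counter(ledger)
    # Counter subtraction keeps only positive counts, so each side of the
    # delta lists exactly the transactions the other side is missing.
    return list((s - l).elements()), list((l - s).elements())

# Hypothetical example: amounts disagree on the second charge.
stmt = [("2024-05-01", "GROCER", 2350), ("2024-05-02", "GAS", 4100)]
mine = [("2024-05-01", "GROCER", 2350), ("2024-05-02", "GAS", 4000)]
extra, unmatched = reconcile(stmt, mine)
print(extra)      # [('2024-05-02', 'GAS', 4100)]
print(unmatched)  # [('2024-05-02', 'GAS', 4000)]
```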
phkahler 5 days ago | parent
>> I keep trying to get it to review my personal credit card statements. I have my own budget tracking app that I made, and sometimes there's discrepancies. Resolving this by hand is annoying, and an LM should be able to do it: scrape the PDF, compare the records to mine, find the delta.

This is a perfect example of what people don't understand (or on HN keep forgetting). LLMs do NOT follow instructions; they predict the next word in text and spit it out. The process is somewhat random, and certainly does not include an interpreter (executive function?) to execute instructions - even natural language instructions.