simonw 4 days ago

It's useful to build up an intuition for what kind of questions LLMs can answer and what kind of questions they can't.

Once you've done that, your success rate goes way up.

Aachen 4 days ago | parent | next [-]

While it's useful not to bother when you know it's unlikely to give good results, it also feels a bit like a cop-out to suggest that the user shouldn't be asking it certain (unspecified) things in the first place. If this is the only solution, we should just crowdsource the topics or types of questions it gets wrong more than 50% of the time, so not everyone has to reinvent the wheel.

theshrike79 3 days ago | parent | next [-]

If you ask an LLM to count the r's in "strawberry sherbert", it's completely hit and miss.

But have it create a script or program, in any language you want, to do the same, and I'm 99% sure it'll get it right on the first try.
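
Something like this (Python just as an example; any language works, and the details are only a sketch):

    # Count the letter "r" in the phrase -- the sort of small, throwaway
    # script an LLM can usually write correctly on the first try.
    phrase = "strawberry sherbert"
    count = phrase.lower().count("r")
    print(f'The phrase "{phrase}" contains {count} r\'s')  # prints 5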

People use LLMs like graphing calculators, but they're not. You can have one MAKE a calculator, though, and it'll get it right.

mierz00 4 days ago | parent | prev [-]

It’s not that simple.

I’m making a tool to analyse financial transactions for accountants and identify things like misallocated expenses. Initially I was getting an LLM to try to analyse hundreds of transactions in one go. It was correct roughly 40-50% of the time, inconsistent, and it hallucinated frequently.

I changed the method to a simple yes/no question and to analysing each transaction individually. Now it is correct about 85% of the time and very consistent.

Same model, essentially the same question, but a different way of asking it.
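
A rough sketch of the shape of that per-transaction yes/no approach (using the OpenAI Python client; the model name, prompt wording and transaction fields are placeholders, not the real ones):

    # One simple yes/no question per transaction, instead of analysing
    # hundreds of transactions in a single prompt.
    from openai import OpenAI

    client = OpenAI()

    def looks_misallocated(tx: dict) -> bool:
        # Placeholder prompt and fields -- the actual wording isn't specified here.
        prompt = (
            "Answer only 'yes' or 'no'. Is this expense likely allocated "
            "to the wrong account?\n"
            f"Description: {tx['description']}\n"
            f"Account: {tx['account']}\n"
            f"Amount: {tx['amount']}"
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content.strip().lower().startswith("yes")

    transactions = [
        {"description": "Office chair", "account": "Travel", "amount": 349.00},
        {"description": "Taxi to airport", "account": "Travel", "amount": 42.50},
    ]
    flagged = [tx for tx in transactions if looks_misallocated(tx)]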

Aachen 3 days ago | parent [-]

I don't see why that issue couldn't be an entry on the "not to do" or "not optimal usage" list.

mierz00 2 hours ago | parent [-]

Edit: I see your point. That’s valid.

I’m just not so sure it’s black and white. At least in my experience it hasn’t been.

rplnt 4 days ago | parent | prev | next [-]

Oftentimes I ask simple factual questions that I don't know the answer to. This is something LLMs should excel at, yet they usually fail, at least on the first try. I guess I subconsciously skip questions that are extremely easy to google (if you ignore the worst AI in existence) or that can be answered by opening the [insert keyword] Wikipedia article. You don't need AI for those.

simonw 4 days ago | parent [-]

Amusingly enough, my rule of thumb for whether an LLM is likely to be able to answer a question is "could somebody who just read the relevant Wikipedia page answer this?"

That changed this year, though, with o3 (and now GPT-5) getting really good at using Bing for search: https://simonwillison.net/2025/Apr/21/ai-assisted-search/

apwell23 4 days ago | parent | prev [-]

> It's useful to build up an intuition for what kind of questions LLMs can answer and what kind of questions they can't.

Can you put your intuition into words so we can learn from you?

simonw 4 days ago | parent [-]

I can't. That's my single biggest frustration about using LLMs: so much of what they can and cannot do comes down to intuition you need to build up over time, and I can't figure out how to express that intuition in a way that can quickly transfer to other people.