jmyeet 3 hours ago

I love stories like this because there are still allegedly tech-savvy people who will insist that AIs don't lie, don't hallucinate and rarely if ever make errors.

At the end of the day, LLMs are a statistical approximation of, or projection from, their training data.

A good example of this is how LLMs struggle with multiplication, particularly multiplication of large numbers. It's not just that they make mistakes; it's the nature of the errors.

Tell ChatGPT to multiply 129348723423 and 2987892342424 and it'll probably get it wrong, because nowhere on Reddit is that exact question for it to copy. But what's interesting is that it'll tend to get the first and last digits correct (more often than not), while the middle is just noise.
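
If you want to check that claim yourself, the exact product is easy to get from Python's arbitrary-precision integers. Here's a minimal sketch that diffs a pasted model answer against the ground truth, digit by digit (the digit_diff helper is mine, and the answer it checks is whatever transcript you paste in, not data from this thread):

    # Ground truth from Python's arbitrary-precision integers.
    a = 129348723423
    b = 2987892342424
    truth = str(a * b)

    def digit_diff(truth: str, answer: str) -> list[int]:
        """Positions where the answer's digits differ from the truth."""
        width = max(len(truth), len(answer))
        t, s = truth.zfill(width), answer.zfill(width)
        return [i for i in range(width) if t[i] != s[i]]

    answer = input("model's answer: ").strip()  # paste a real transcript here
    print("ground truth:", truth)
    print("mismatched digit positions:", digit_diff(truth, answer))

If the "correct ends, noisy middle" pattern holds, the mismatched positions should cluster toward the center of the digit string.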

Someone will probably say "this is a solved problem" because somebody, somewhere has added this capability to a given LLM, but I think these kinds of edge cases will keep exposing the fundamental limits of transformers, just like the famous "how many r's in strawberry?" example that did the rounds.
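
The strawberry failure, for what it's worth, is usually attributed to tokenization: the model sees subword tokens, not individual letters. A quick sketch with the tiktoken library shows the split (the exact tokens depend on the encoding; something like "str"/"aw"/"berry" is typical, but I'm not asserting that for any particular model):

    # pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")  # GPT-4-era encoding
    tokens = enc.encode("strawberry")
    # The model operates on opaque token IDs, not letters, which is
    # why character-counting questions are surprisingly hard for it.
    print([enc.decode([t]) for t in tokens])
    print("actual r count:", "strawberry".count("r"))  # 3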

All this comes up when you tell LLMs to write legal briefs. They make up precedents wholesale, because they've learned what a precedent looks like and generate something similar. Lawyers have been caught submitting fake precedents in court filings because of this.

simianwords 3 hours ago | parent | next [-]

> Tell ChatGPT to multiply 129348723423 and 2987892342424 and it'll probably get it wrong because nowhere on Reddit is that exact question for it to copy. But what's interesting is it'll tend to get the first and large digits correct (more often than not) but the middle is just noise.

People have no idea how capable LLMs are and confidently write these kinds of things.

jmyeet 2 hours ago | parent [-]

This is a known problem and an active area of research [1][2][3][4].

[1]: https://arxiv.org/html/2505.15623v1

[2]: https://medium.com/@adnanmasood/why-large-language-models-st...

[3]: https://www.reachcapital.com/resources/thought-leadership/wh...

[4]: https://mathoverflow.net/questions/502120/examples-for-the-u...

simianwords 2 hours ago | parent [-]

The research doesn't capture the fact that LLMs can easily do these multiplications. I mean, it literally won gold at the IMO and the Putnam.

Take 10,000 such multiplications. I'm sure not even a single one would be incorrect with GPT 5.2 (thinking). Want a wager?
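
For what it's worth, a wager like that could be settled mechanically. Here's a sketch using the OpenAI Python client; the model name is taken from my comment above as an assumption (not a verified model ID), the prompt wording is mine, and a real run would want retries and more robust answer extraction:

    # pip install openai
    import random
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    MODEL = "gpt-5.2"  # assumed name, per the comment above

    wrong = 0
    trials = 100  # scale up to 10,000 for the actual wager
    for _ in range(trials):
        a = random.randrange(10**11, 10**12)   # 12-digit operand
        b = random.randrange(10**12, 10**13)   # 13-digit operand
        resp = client.chat.completions.create(
            model=MODEL,
            messages=[{"role": "user",
                       "content": f"Compute {a} * {b}. Reply with only the digits."}],
        )
        text = resp.choices[0].message.content or ""
        digits = "".join(ch for ch in text if ch.isdigit())
        if digits != str(a * b):
            wrong += 1
    print(f"{wrong}/{trials} incorrect")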

ceejayoz 2 hours ago | parent | prev [-]

> Tell ChatGPT to multiply 129348723423 and 2987892342424 and it'll probably get it wrong because nowhere on Reddit is that exact question for it to copy.

ChatGPT appears to get this correct.