nfw2 a day ago

Latest reasoning models don't claim 2 + 2 = 55, and it's hard to find them making any sort of obviously false claim, or refusing to admit a mistake if you point one out.

taormina a day ago | parent | next [-]

I can’t go a full conversation without obviously false claims. They will insist you are correct and that your correction is completely correct, despite that also being wrong.

nfw2 21 hours ago | parent [-]

Ironically, the start of this thread was bemoaning the use of anecdotal evidence.

citizenpaul 6 hours ago | parent [-]

Also, I specifically mentioned bikeshedding, yet the reply bikesheds my simple example while ignoring the big picture: LLMs still regularly generate blatantly and easily noticed false information as answers.

citizenpaul 6 hours ago | parent | prev [-]

It was clearly a simplified example; like I said, endless bikeshedding.

Here is a real one. I was using the much-lauded new Gemini 3? last week and wanted it to do something in a slightly specific way, for reasons. I told it specifically and added it to the instructions: DO NOT USE FUNCTION ABC.

It immediately used FUNCTION ABC. I asked it to read its instructions back to me, and it confirmed what I had put there. So I asked it again to change it to another function. It told me that FUNCTION ABC was not in the code, even though it was clearly right there in the code.

I did a bit more prodding, and it adamantly insisted that the code it generated did not exist, again and again and again. Yes, I tried reversing the instruction to USE FUNCTION XYZ. It still wanted to use ABC.