Imustaskforhelp 13 hours ago

I really found this fascinating. I had thought that these "how many e's are in the word strawberry" problems had been fixed, but as this video shows, perhaps the exact strawberry question simply made it into the training data, so even a slight variation, like asking about "seventeen", makes the model fumble. I had assumed this was a solved issue, but it isn't, which was fascinating enough that I had to test it myself. I found that the AI still hallucinates and got the same result for the most part.

CamperBob2 12 hours ago | parent

The Qwen 3.6 27B 8-bit quant has no problem with it. I'd guess that most thinking models won't fail this kind of test anymore, while some base or instruct models that are not post-trained for reasoning will still fail it.

I also can't reproduce it in ChatGPT 5.3 Instant with auto-thinking disabled. Solved problem, as far as I'm concerned. Maybe this particular case was a bug in the voice model, or just some BS the YouTuber made up for clicks. (Notice that we never actually see the answer in text form.) Mission accomplished, I guess.

Imustaskforhelp 11 hours ago | parent

For what it's worth, I tried this myself in ChatGPT before uploading, and it told me there are three e's, which is what prompted me to upload it in the first place. So there's my anecdotal evidence.

Actually, let me replicate it; here you go: https://chatgpt.com/share/69f7a27a-2634-83e8-bffa-520bd2ad47...

What I'm saying is that these models are still incredibly finicky. I can sometimes get the right answer too, don't get me wrong, but it's fundamentally unpredictable and feels like guesswork at times. For the person in the original video it first said there are no e's, then said 4, then 5; for me it said 3, though sometimes it would say 4.
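For reference, the ground truth here is trivial to check outside of any model; a quick Python snippet (just an illustration, not something either of us ran in the thread):

```python
# Count how many times "e" appears in "seventeen".
word = "seventeen"
count = word.count("e")
print(f"{word!r} contains {count} e's")  # s-e-v-e-n-t-e-e-n -> 4
```

So 4 is the correct answer, and 0, 3, and 5 are all hallucinations.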

So my point is that calling it a solved problem doesn't seem accurate when I can replicate the failure in my own testing, and the first time I tried it in my ChatGPT it also said 3.

Edit: here is another ChatGPT link, separate from the first one I shared, where it says 3 again: https://chatgpt.com/share/69f7a3a6-aa1c-83e8-b622-52cb2a9b10...

And I tried another time too, so here is yet another one: https://chatgpt.com/share/69f7a3ee-07c8-83e8-ba43-65800d8907...

(Do note that all the links are different; even though they share similar leading characters, the full links/chats are different.)

CamperBob2 10 hours ago | parent

Weird. What model is selected? I can't get anything but 4 out of GPT 5.3 Instant, which should be the weakest available in the current generation. Try this one: https://chatgpt.com/s/t_69f7a8657368819185b2830297216b2b

Edit due to rate limiting: No, I totally believe you; it's just not as consistent as you might expect. It did start returning three on occasion when I tried it again, maybe one time in five. Pretty crazy that a free 27B Chinese model outperforms GPT 5.x.

In general, I wouldn't normally try a prompt like this without turning on thinking mode. Notice how Qwen painstakingly spells it out: https://i.imgur.com/FMKXB1M.png

Imustaskforhelp 10 hours ago | parent

https://chatgpt.com/share/69f7b127-24b0-83e8-ae82-2700c3d7be...

I am also using the GPT 5.3 model. In this conversation I even asked which model it is, then asked how many e's are in seventeen, and it still responded 3.

It's still GPT 5.3 that I'm talking to. This shouldn't matter, but mine is a free-tier account, if that helps you replicate it; I've replicated it multiple times by now. Perhaps your account is on a premium tier, and even the weakest GPT 5.3 model available to pro users is still better than the free-user model. That would be a bigger finding if true, and it does feel conspiratorial, but I can't otherwise understand why you're unable to replicate it. Feel free to message me by email if you're still skeptical and want more proof; I'm more than happy to oblige, but the fact is that OpenAI models still tell me there are 3 e's.

I would be curious to help you replicate it, so let me know.

Edit: 2 minutes after making this post I realized that I had accidentally asked about strawberry, since that's what I remembered xD. It seems I can't edit the original chat, but here's an image: https://files.catbox.moe/j3d87e.png

Here is another chat where I ask it about seventeen: https://chatgpt.com/share/69f7b324-a0b8-83e8-8fd9-47a101782c...

So my suspicion is that it works for strawberry because it has been trained on that question, while it hasn't been trained on seventeen; that could be a factor. Yes, it might be a bit conspiratorial, so I'm going to play around with this a bit more.

Here is another conversation where it answers correctly for strawberry but not for seventeen: https://chatgpt.com/share/69f7b38c-d154-83e8-953e-edb39e9c8a...

Ooh, this is another interesting conversation; this one actually failed the strawberry test too, saying it has one e, so it failed both tests: https://chatgpt.com/share/69f7b3bb-5c4c-83e8-ba00-a67c1db91c...

Here is another chat where, in fast mode (note that I hadn't selected fast mode), it said there are 2 e's in strawberry: https://chatgpt.com/share/69f7b432-605c-83e8-ac0c-0f89ea17e1...

Yet every other time I asked about the e's in strawberry it said 3; but after asking about seventeen, it sometimes hiccups and gives wrong answers even for strawberry.

It gives the wrong answer for the number of e's in strawberry after I ask about seventeen, which is really interesting behaviour. I didn't intend to write this long a message; my first comment was just about an error, but I think I've discovered something deeper, and I might write a blog post about it. My suspicions keep growing: I thought the strawberry test had been solved by AI, yet that seems not to be the case.

Edit 2: I just realized that among those questions I asked how many e's are in strawberry, and it STILL answered 3. This is just so much worse for them, wow.