If you're asking simple riddles, you shouldn't be paying for SOTA frontier models with long context.

This is a silly test for the big coding models.

This is like saying "all calculators are the same, nobody needs a TI-89!" and then adding 1+2 on a pocket calculator to prove your point.

▲

Balinares 3 hours ago | parent | next [-]

I find it's a great test, actually. There are lots of "should I take the car" decisions in putting together software that's supposed to do things, and with poor judgement in how the things should be done, you typically end up with the software equivalent of a Rube-Goldberg machine that harnesses elephants to your car and uses mice to scare the elephants toward the car wash while you walk. After all, it's a short distance, isn't it?

▲

grey-area 5 hours ago | parent | prev [-]

No it’s like having a calculator which is unable to perform simple arithmetic, but lots of people think it is amazing and sentient and want to talk about that instead of why it can’t add 2 + 2.

	▲	viraptor an hour ago \| parent [-]
		We know why it's not going to do precise math and why you can have better experience asking for an app solving the math problem you want. There's no point talking about it - it's documented in many places for people who are actually interested.