Every incremental improvement in AI ability gets its own corresponding denial phrase. Some people today still think AI can't generate the correct number of fingers.

halJordan an hour ago | parent [-]

I love to hate it when someone unironically thinks asking an LLM how many letters are in a word is a good test.

	Jerrrrrrrry 7 minutes ago | parent [-]
		It is a good test now, for reasoning models. It was a terrible test for pure tokenized models, because the logit that carries the carry digit during summation has a decent chance of getting lost. SOTA models should reason their way to a function that returns the count of a given character, evaluate that function against tests, and use its result for the output.
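
A rough sketch of that "write a function, test it, then answer from it" workflow, not how any particular model actually does it. The names `count_char` and `answer` are hypothetical, and the case-insensitive matching is an assumption made here for illustration.

```python
def count_char(word: str, char: str) -> int:
    """Count occurrences of a single character in word (case-insensitive, by assumption)."""
    if len(char) != 1:
        raise ValueError("char must be a single character")
    target = char.lower()
    return sum(1 for c in word.lower() if c == target)


# Step 2: evaluate the function against a few known cases before trusting it.
assert count_char("strawberry", "r") == 3
assert count_char("Mississippi", "s") == 4
assert count_char("banana", "z") == 0


def answer(word: str, char: str) -> str:
    """Step 3: build the final output from the tested function, not from a guess."""
    n = count_char(word, char)
    return f'"{word}" contains {n} occurrence(s) of "{char}".'


print(answer("strawberry", "r"))
```

The point of the test step is that the count comes from executing code over the actual characters rather than from whatever the tokenizer happened to expose.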