GorbachevyChase 3 hours ago
I don’t have much confidence in the premise. Where was the human control? I think most Python programmers, when tasked with “now do it in brainfuck,” would fail. There is not much meaningful overlap in how to express intent and solutions to problems. The ridiculous syntax is the joke. But more importantly, I don’t have to solve any problems with languages that are elaborate practical jokes, so I’m not worried about the implications of an LLM’s ability to be useful.
culi 2 hours ago
The point here is to test for "genuine reasoning," or something approaching it. If a model is truly reasoning, it should be competent even in a new language you just made up (provided the language itself is competently designed).
dvt 3 hours ago
> I don’t have to solve any problems with languages that are elaborate practical jokes

This is just needlessly dismissive. Esolangs are (and have been) an area of active CS research for decades. I know I'm a bit of an esolang nerd, and while some are jokes, most focus on specific paradigms (e.g. Piet is visual, bf is a Turing tarpit, etc.).

> I think most Python programmers when tasked with “now do it in brainfuck” would fail

This is untrue. Given internet-level awareness and infinite time, virtually all developers should be able to go from Python to brainfuck (trivially, I might add). Did you even look at the test sets? It's all pretty basic stuff (palindromes, array traversal, etc.; we aren't using pandas here). I mean, sure, it would take forever and be mega annoying, but manipulating a head and some tape is hardly difficult.
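To make the "head and some tape" point concrete: brainfuck's entire execution model is one head moving over a tape of byte cells, with eight single-character commands. A minimal interpreter (my own illustrative sketch, not anything from the article; `run_bf` and the 30,000-cell tape size are conventional choices) fits in a few dozen lines of Python:

```python
def run_bf(code: str, inp: str = "") -> str:
    """Minimal brainfuck interpreter: one head over a tape of byte cells."""
    # Precompute matching bracket positions so loops are O(1) jumps.
    jumps, stack = {}, []
    for i, c in enumerate(code):
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            jumps[i], jumps[j] = j, i

    tape = [0] * 30000        # conventional tape size
    head = pc = pos = 0       # data head, program counter, input cursor
    out = []
    while pc < len(code):
        c = code[pc]
        if c == ">":
            head += 1
        elif c == "<":
            head -= 1
        elif c == "+":
            tape[head] = (tape[head] + 1) % 256   # byte cells wrap
        elif c == "-":
            tape[head] = (tape[head] - 1) % 256
        elif c == ".":
            out.append(chr(tape[head]))
        elif c == ",":
            tape[head] = ord(inp[pos]) if pos < len(inp) else 0
            pos += 1
        elif c == "[" and tape[head] == 0:
            pc = jumps[pc]    # skip loop body when cell is zero
        elif c == "]" and tape[head] != 0:
            pc = jumps[pc]    # jump back to matching [ while cell is nonzero
        pc += 1
    return "".join(out)

# 6 * 8 = 48 ('0'), plus one more increment gives 49 ('1').
print(run_bf("++++++[>++++++++<-]>+."))   # prints "1"
```

Going from Python to this target is tedious arithmetic and loop bookkeeping, not deep insight, which is the point being made above.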