m3kw9 3 days ago

Does learning this still matter now?

lazarus01 3 days ago | parent | next [-]

Yes, the current technology cannot replace an engineer.

The easiest way to understand why is by understanding natural language. A natural language like English is very messy and doesn't follow formal rules. It's also not specific enough to provide instructions to a computer, which is why code was created.

The AI is incredibly dumb when it comes to complex tasks with long-range context. It needs an engineer who understands how to write and execute code to give it precise instructions, or it is useless.

Natural language processing is so complex that work on it started around the end of World War II, and we are only now seeing innovation in AI that can mimic humans and do certain things faster than we can. But thinking is not one of them.

CamperBob2 3 days ago | parent [-]

LOL. Figuring out how to solve IMO-level math problems without "thinking" would be even more impressive than thinking itself. Now there's a parrot I'd buy.

lazarus01 3 days ago | parent [-]

It isn't thinking; it's RL with reward hacking.

It's like taking a student who wins a gold in IMO math but can't solve easier math problems, because they did not study those types of problems, whereas a human who is good at IMO math generalizes to all math problems.

It's just memorizing a trajectory toward a specific goal. That's what RL is.
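
As a toy sketch of what I mean (a made-up gridworld, nothing to do with an actual LLM training pipeline): tabular Q-learning on a short corridor. The policy it learns amounts to a memorized path to the one goal it was rewarded for; move the goal and the same policy walks right past it.

    import random

    N_STATES = 9          # states 0..8 in a 1-D corridor
    START, GOAL = 4, 8    # train with the goal at the right end
    ACTIONS = (-1, +1)    # step left, step right
    ALPHA, GAMMA, EPS = 0.5, 0.9, 0.2

    def train(goal, episodes=500):
        q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
        for _ in range(episodes):
            s = START
            while s != goal:
                a = random.choice(ACTIONS) if random.random() < EPS \
                    else max(ACTIONS, key=lambda a: q[(s, a)])
                s2 = min(max(s + a, 0), N_STATES - 1)
                r = 1.0 if s2 == goal else 0.0   # reward only at the goal
                q[(s, a)] += ALPHA * (r + GAMMA * max(q[(s2, b)] for b in ACTIONS) - q[(s, a)])
                s = s2
        return q

    def greedy_path(q, goal, limit=12):
        s, path = START, [START]
        while s != goal and len(path) < limit:
            a = max(ACTIONS, key=lambda a: q[(s, a)])   # purely greedy policy
            s = min(max(s + a, 0), N_STATES - 1)
            path.append(s)
        return path

    q = train(GOAL)
    print(greedy_path(q, GOAL))  # e.g. [4, 5, 6, 7, 8] -- the memorized trajectory
    print(greedy_path(q, 0))     # same policy never turns around for a moved goal

The analogy is loose, of course, but that is the sense in which I mean "memorizing a trajectory": the learned behavior is specific to the goal it was rewarded for.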

CamperBob2 3 days ago | parent [-]

> It's like taking a student who wins a gold in IMO math, but can't solve easier math problems

I've tried to think of specific follow-up questions that will help me understand your point of view, but other than "Cite some examples of easier problems that a successful IMO-level model will fail at," I've got nothing. Overfitting is always a risk, but if you can overfit to problems you haven't seen before, that's the fault of the test administrators for reusing old problem forms or otherwise not including enough variety.

GPT itself suggests[1] that problems involving heavy arithmetic would qualify, and I can see that being the case if the model isn't allowed to use tools. However, arithmetic doesn't require much in the way of reasoning, and in any case the best reasoning models are now quite decent at unaided arithmetic. Same for the tried-and-true 'strawberry' example GPT cites, involving introspection of its own tokens. Reasoning models are much better at that than base models. Unit conversions were another weakness in the past that no longer seems to crop up much.
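
For what it's worth, the 'strawberry' count and a unit conversion are each a one-liner once tools are allowed, which is part of why I don't find them very revealing about reasoning (illustrative only, not tied to any particular model's tool API):

    print("strawberry".count("r"))        # 3 -- the classic token-introspection trap
    print(f"{100 * 0.621371:.1f} miles")  # 62.1 -- 100 km in miles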

So what would some present-day examples be, where models that can perform complex CoT tasks fail on simpler ones in ways that reveal that they aren't really "thinking?"

1: https://chatgpt.com/share/695be256-6024-800b-bbde-fd1a44f281...

lazarus01 2 days ago | parent [-]

In response to your direct question: https://gail.wharton.upenn.edu/research-and-insights/tech-re...

“This indicates that while CoT can improve performance on difficult questions, it can also introduce variability that causes errors on “easy” questions the model would otherwise answer correctly.”

As for the strawberry example: there are 25,000 people employed globally who repair broken responses and create training data, a big whack-a-mole effort to remediate embarrassing errors.

CamperBob2 2 days ago | parent [-]

(Shrug) Ancient models are ancient. Please provide specific examples that back up your point, not obsolete .PDFs to comb through.

shwaj 3 days ago | parent | prev [-]

Matter to who? If you want to deeply understand how this technology works, this is still relevant. If you want to vibe code, maybe not.