godelski 3 hours ago
It's strange to read, because they bring up many things a lot of people have been critiquing for years.
I'm glad the conversation is changing, but it's been a bit frustrating that when these issues were brought up, people blindly pointed to benchmarks. That made doing this type of research difficult (enough to push many people out of the field). Then it feels weird to say "harder than we thought" because, well... truthfully, they even state why this result should be expected.
And that's only a fraction of the story. Online algorithms aren't enough. You still need a fundamental structure to codify and compress information, to determine what needs to be updated (i.e., what is low confidence), to actively seek out new information to update that confidence, to make hypotheses, and so much more.

So I hope the conversation keeps moving in a positive direction, but I hope we don't get trapped in an "RL will solve everything" trap. RL is definitely a necessary component and will no doubt result in improvements, but it also isn't enough.

It's really hard to do deep introspection into how you think. It's like trying to measure your measuring stick with your measuring stick. It's so easy to get caught up in oversimplification, and it seems like the brain wants to avoid the hard work. To quote Feynman: "The first principle is that you must not fool yourself, and you are the easiest person to fool." It's even easier when things are exciting. It's easy because you have evidence for your beliefs (like I said, RL will make improvements). It's easy because you're smart, and smart enough to fool yourself.

So I hope we can learn a bigger lesson: learning isn't easy, and scale is not enough. I really do think we'll get to AGI, but it's going to be a long, bumpy road if we keep putting all our eggs in one basket and hoping for simple solutions.
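The "determine what is low confidence, then actively seek information to update it" idea above can be sketched in a few lines. This is purely illustrative, a toy belief store I made up for this comment, not any real system's architecture: beliefs with low confidence move more toward new evidence, and the next query targets whatever is least certain.

```python
class OnlineBeliefStore:
    """Toy sketch: per-item estimates plus a confidence in [0, 1].

    Illustrative only; all names are hypothetical.
    """

    def __init__(self):
        self.beliefs = {}  # key -> (estimate, confidence)

    def update(self, key, observation, weight=0.5):
        est, conf = self.beliefs.get(key, (observation, 0.0))
        # Low-confidence beliefs move further toward new evidence.
        lr = weight * (1.0 - conf)
        new_est = est + lr * (observation - est)
        # Confidence grows toward 1.0 as evidence accumulates.
        new_conf = conf + (1.0 - conf) * weight
        self.beliefs[key] = (new_est, new_conf)

    def next_query(self):
        # Actively seek information where confidence is lowest.
        return min(self.beliefs, key=lambda k: self.beliefs[k][1])


store = OnlineBeliefStore()
store.update("fact_a", 1.0)   # seen once -> still low confidence
store.update("fact_b", 0.2)
store.update("fact_b", 0.3)   # seen twice -> higher confidence
print(store.next_query())     # fact_a, since it has less evidence
```

The point of the sketch is only that updating and information-seeking are driven by an explicit uncertainty signal, which is a structural ingredient, not something a plain scale-up provides for free.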
winddude 2 hours ago | parent
People have been bringing that up since long before AI: schooling often tests memorization and regurgitation of facts. Looking up facts is also a large part of what the internet is used for, so it's something that's in demand, and I believe a large portion of OpenAI/Claude prompts have a big overlap with Google queries [sorry, no source].

I haven't looked at the benchmark details they used, and it may depend on the domain, but empirically coding agents seem to improve drastically on unseen or updated libs when given the latest documentation. So I think it's a matter of the training sets and how they've been optimized with code documentation. The interim step, until a better architecture is found, is probably more / better training data.