raincole 5 days ago

It really isn't. Do you expect SOTA models to answer any already-answered question on the internet with 100% accuracy? Congrats, you just compressed the whole internet (at least a few zettabytes) into a model (a few TB at most?).

OtherShrezzing 5 days ago

The linked ticket isn’t suggesting the commit is in the training data. It’s demonstrating that models run ‘git log’, find the exact code to fix the issue against which they’ll be scored, and then they implement that code as-is.

The test environment contains the answers to the questions.

imiric 4 days ago

Well, we're dealing with (near) superintelligence here, according to the companies that created the models. Not only would I expect them to regurgitate the answers they were trained on, given that their training data includes practically the entire internet, but I would also expect them to answer questions they weren't trained on. Maybe not with 100% accuracy, but certainly with much higher accuracy than they manage now.

It's perfectly reasonable to expect a level of performance concordant with the marketing of these tools. Claiming this is superintelligence while also excusing its poor performance is dishonest, and it amounts to false advertising.

Tanjreeve 4 days ago

Why does this matter if these models are a superintelligence with reasoning, etc., and don't need the answers sucked off the internet?