Remix.run Logo
SchemaLoad 5 hours ago

Once the model has seen the questions and answers in the training stage, the questions are worthless. Only a test using previously unseen questions has merit.

lambda 5 hours ago | parent [-]

They aren't training new models for this. This is an agent harness for Opus 4.6.

measurablefunc 5 hours ago | parent [-]

All traffic is monitored, all signal sources are eventually incorporated into the training set in one way or another. The person you're responding to is correct, even a single API call to any AI provider is sufficient to discount future results from the same provider.

raincole 4 hours ago | parent | next [-]

You live in a conspiracy world. Those AI providers don't update the models that fast. You can try ask them solve ARC-AGI-3 without harness and see them struggle as yesterday yourself.

measurablefunc 3 hours ago | parent [-]

Which part is the conspiracy? Be as concrete as possible.

stale2002 5 hours ago | parent | prev [-]

ok! So if someone uses an existing, checkpointed, open source model then the answer is yes the results are valid and it doesn't matter that the tests are public.

measurablefunc 5 hours ago | parent [-]

Yes, assuming the checkpoint was before the announcement & public availability of the test set.