lewdwig a day ago

Well-designed benchmarks have a public sample set and a private testing set. Models are free to train on the public set, but they can't game the benchmark or overfit the samples that way because they're only assessed on performance against examples they haven't seen.

Not all benchmarks are well-designed.
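The split described above can be sketched in a few lines. This is a hypothetical illustration, not any real benchmark's harness; the names (`accuracy`, `PUBLIC_SET`, `PRIVATE_SET`, `memorizer`) are made up for the example:

```python
# Hypothetical sketch of a public/private benchmark split.
# A model that merely memorizes the public set scores perfectly on it
# but gains nothing on the held-out private set.

def accuracy(model, examples):
    """Fraction of (question, answer) pairs the model gets right."""
    return sum(model(q) == a for q, a in examples) / len(examples)

PUBLIC_SET = [("2+2", "4"), ("3+3", "6")]     # published; models may train on it
PRIVATE_SET = [("5+7", "12"), ("9+4", "13")]  # held out; scoring uses only this

# A "model" that has simply memorized the public samples.
PUBLIC_TABLE = dict(PUBLIC_SET)

def memorizer(q):
    return PUBLIC_TABLE.get(q, "?")

print(accuracy(memorizer, PUBLIC_SET))   # 1.0 — overfit the public set
print(accuracy(memorizer, PRIVATE_SET))  # 0.0 — no help on unseen examples
```

The point of the held-out set is exactly this gap: public-set performance is gameable, private-set performance is not (as long as the private set truly stays private).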

thousand_nights a day ago | parent [-]

but as soon as you test on your private testing set, you're sending it to their servers, so they have access to it

so effectively you can only guarantee a single use stays private

minimaxir a day ago | parent [-]

Claude does not train on API I/O.

> By default, we will not use your inputs or outputs from our commercial products to train our models.

> If you explicitly report feedback or bugs to us (for example via our feedback mechanisms as noted below), or otherwise explicitly opt in to our model training, then we may use the materials provided to train our models.

https://privacy.anthropic.com/en/articles/7996868-is-my-data...

behindsight a day ago | parent [-]

Relying on their own policy does not mean they will adhere to it. We have already seen "rogue" employees at other companies conveniently violate their policies. Some notable examples were in the news within the past month (e.g. xAI).

Don't forget the previous scandals in which Amazon and Apple each had to pay millions in settlements for eavesdropping with their voice assistants.

Privacy with a system that phones an external server should not be expected, regardless of whatever public policy they proclaim.

Hence why GP said:

> so effectively you can only guarantee a single use stays private