Dylan16807 3 hours ago

I care that it's within the ballpark I explained in considerable detail. I don't care where inside the ballpark it is.

You gave an exaggerated upper limit, so extreme there's no ambiguity, of "entire repo".

I gave my own exaggerated upper limit, so extreme there's no ambiguity. And mine has examples of it actually happening. Incidents so extreme they're clear violations.

Maybe an analogy will help: The point at which a collection of sand grains becomes a heap is ambiguous. But when we have documented incidents involving a kilogram or more of sand in a conical shape, we can skip refining the threshold and simply declare that yes, heaps are real. Likewise, incidents of major LLMs copying code, in a way that is full-on memorization and not just recreating things via chance and general code knowledge, are real.

You're the only person I've ever seen imply that true copying incidents are a statistical illusion, akin to a random die. Normally the debate is over how frequent and impactful they are, who is going to be held responsible, and what to do about them.