Dylan16807 3 hours ago
I care that it's within the ballpark I spent considerable detail explaining. I don't care where inside the ballpark it is.

You gave an exaggerated upper limit, "entire repo", so extreme there's no ambiguity. I gave my own exaggerated upper limit, also so extreme there's no ambiguity. And mine has examples of it actually happening: incidents so extreme they're clear violations.

Maybe an analogy will help: The point at which a collection of sand grains becomes a heap is ambiguous. But when we have documented incidents involving a kilogram or more of sand in a conical shape, we can skip refining the threshold and simply declare that yes, heaps are real.

Incidents of major LLMs copying code, in a way that is full-on memorization and not just recreating things via chance and general code knowledge, are real.

You're the only person I've seen ever imply that true copying incidents are a statistical illusion, akin to a random die roll. Normally the debate is over how frequent and impactful they are, who is going to be held responsible, and what to do about them.