Ndotkess a day ago

What is your approach to measuring accuracy?

johnsillings a day ago | parent | next [-]

I'm sure Henry will chime in here, but there's more info in the technical announcement: https://www.span.app/introducing-span-detect-1

"span-detect-1 was evaluated by an independent team within Span. The team’s objective was to create an eval that’s free from training data contamination and reflecting realistic human and AI authored code patterns. The focus was on 3 sources: real world human, AI code authored by Devin crawled from public GitHub repositories, and AI samples that we synthesized for “brownfield” edits by leading LLMs. In the end, evaluation was performed with ~45K balanced datasets for TypeScript and Python each, and an 11K sample set for TSX."
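For anyone wondering what "accuracy" on a balanced human/AI set boils down to, here is a rough sketch. It is not Span's eval code; the function name `evaluate` and the `"human"` / `"ai"` label strings are made up for illustration, and it only assumes you have one true label and one predicted label per sample.

```python
from collections import Counter


def evaluate(labels, predictions):
    """Score a binary human-vs-AI detector.

    labels, predictions: equal-length sequences of "human" or "ai"
    (hypothetical label names, chosen for this sketch).
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")

    # Count each (true label, predicted label) pair; "ai" is the positive class.
    counts = Counter(zip(labels, predictions))
    tp = counts[("ai", "ai")]        # AI code flagged as AI
    tn = counts[("human", "human")]  # human code left alone
    fp = counts[("human", "ai")]     # human code wrongly flagged as AI
    fn = counts[("ai", "human")]     # AI code missed
    total = tp + tn + fp + fn

    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

On a balanced set, plain accuracy is a fair summary because always guessing one class scores only 50%. The false positive rate is still worth reporting separately, since flagging human-written code as AI is usually the costlier mistake.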

henryl a day ago | parent | prev [-]

More details about how we eval'ed here:

https://www.span.app/introducing-span-detect-1