awoimbee 2 hours ago

The benchmarks aren't great; they're super specific to sem's output. Why would I ask Claude how many "entities" were modified by a commit, and do I need a tool specifically for this request? Note that an "entity" is a sem-specific concept...

rohanucla an hour ago

Thanks for pointing that out. I agree with you here: my testing process was quite specific to sem's output. I would also love any suggestions on how you would design the whole testing process for this kind of tool.

I can also share my thought process: I was more interested in figuring out the model's inherent search results and understanding without sem.