emp17344 2 hours ago

Seems like it literally popped up yesterday with the express purpose of building hype for this release.

swyx an hour ago | parent | next [-]

Team member here - we had been working on frontiercode for ~6-7 months. The timing just lined up.

osti an hour ago | parent | prev | next [-]

And there's a notable absence of the DeepSWE benchmark, where they do badly, yet somehow a benchmark that was published yesterday is in this announcement.

vanuatu 2 hours ago | parent | prev | next [-]

I doubt it. Cog wants coding agents to be better because it directly improves their product.

They aren't married to a particular lab; most of their usage is their in-house model, I believe.

anthonypasq 2 hours ago | parent | prev [-]

What incentive does Cognition have for doing this? Seems like complete nonsense speculation on your part.

bel8 2 hours ago | parent [-]

With billions/trillions of dollars floating around, is it hard to imagine benchmarks could be biased?

I think it's safe to assume everything AI related is heavily biased until proven otherwise. Just like in pharma.

camdenreslink an hour ago | parent | next [-]

People game benchmarks for fake internet points to get their favorite web framework to the top of the list. I'm pretty sure they will do it for billions of dollars.

anthonypasq 34 minutes ago | parent | prev [-]

You didn't answer my question. Why would Cognition be biased towards making Anthropic look good?