Seems like it literally popped up yesterday with the express purpose of building hype for this release.

swyx an hour ago | parent | next [-]

team member here - we had been working on frontiercode for ~6-7 months. timing just lined up

osti an hour ago | parent | prev | next [-]

And a notable absence of the DeepSWE benchmark, where they do badly, yet somehow a benchmark that was published yesterday is in this announcement.

vanuatu 2 hours ago | parent | prev | next [-]

i doubt it, cog wants coding agents to be better because it directly improves their product

they aren't married to a particular lab, most of their usage is their in-house model i believe

what incentive does Cognition have for doing this? seems like complete nonsense speculation on your part.

With billions/trillions of dollars floating around, is it hard to imagine benchmarks could be biased?

I think it's safe to assume everything AI related is heavily biased until proven otherwise. Just like in pharma.

camdenreslink an hour ago | parent | next [-]

People game benchmarks for fake internet points to get their favorite web framework to the top of the list. I'm pretty sure they will do it for billions of dollars.

anthonypasq 34 minutes ago | parent | prev [-]

you didn't answer my question. Why would Cognition be biased towards making Anthropic look good?