Remix clone Hacker News

Testing against unspecified other "leading" models allows for shenanigans:

> Qodo tested GPT‑4.1 head-to-head against other leading models [...] they found that GPT‑4.1 produced the better suggestion in 55% of cases

The post seems to be up now, and it appears to compare GPT‑4.1 slightly favorably to Claude 3.7.

	croemer 6 days ago
		Right, now it's up, and the comparison against Claude 3.7 is better than I feared based on the wording. Though why does the OpenAI announcement talk of a comparison against multiple leading models when the Qodo blog post only tests against Claude 3.7...