
jasonjmcghee 4 hours ago

Curious if anyone else had the same reaction as me

This model is specifically trained on this task and significantly[1] underperforms opus.

Opus costs about 6x more.

Which seems... totally worth it based on the task at hand.

[1]: based on the total spread of tested models


beernet 4 hours ago | parent | next [-]

Agreed. The idea is nice and honorable. At the same time, if AI has proven one thing, it's that quality usually reigns over control and trust (except in some sensitive sectors and applications). Of course it's less capital-intensive, so it makes sense for a comparatively small EU startup to focus on that niche. It likely won't move the top-line needle much, though, for the reasons stated.


segmondy 2 hours ago | parent | next [-]

Ha, keep putting your prompts and workflows into cloud models. They are not okay with being a platform, they intend to cannibalize all businesses. Quality doesn't always reign over control and trust. Your data and original ideas are your edge and moat.


miohtama 4 hours ago | parent | prev | next [-]

Alignment tax eats directly into model quality, by double-digit percentages.


hermanzegerman 3 hours ago | parent | prev [-]

The EU could help them a lot if it started enforcing its laws, so that no US company can process European data, since the Americans are not willing to budge on the CLOUD Act.

That would also help to reduce our dependency on American Hyperscalers, which is much needed given how untrustworthy the US is right now. (And also hostile towards Europe as their new security strategy lays out)

bcye 2 hours ago | parent [-]

Unfortunately, this would be a rather nuclear option, given the continent's insane reliance on technology that breaks its unenforced laws.


DarkNova6 4 hours ago | parent | prev | next [-]

I'm never sure how much faith one can put into such benchmarks but in any case the optics seem to shift once you have pass@2 and pass@3.

Still, the more interesting comparison would be against something such as Codex.

