Remix clone Hacker News


	▲	faxmeyourcode 33 minutes ago
		I had a hunch that opus 4.7 hedged more than other models - and it turns out it's true

		model               total_claims  hedged_count  hedged_pct
		claude-opus-4-7     1000          451           45.1
		sonar-pro           1000          391           39.1
		gpt-5.4             1000          277           27.7
		gemini-3-retrieval  1000          129           12.9
		gemini-3-pro        1000          60            6.0

		datasette query here https://lite.datasette.io/?csv=https%3A%2F%2Fstatic.simonwil...
	▲	kostaj 27 minutes ago | parent
		This is in line with my observations and tests as well. It's also supported by the distribution of verdicts across the four buckets: Gemini uses the middle buckets (Mostly True and Misleading) much less often, 6% combined for Gemini w/o search, while Opus uses them the most, 45% combined. Looks like Gemini is calibrated to be confident and Opus to be careful.
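
For anyone who wants to sanity-check the percentages quoted in this thread, here is a minimal sketch in Python. It assumes that `hedged_pct` is simply `hedged_count / total_claims * 100`, rounded to one decimal, and that "hedged" means a verdict in one of the two middle buckets; neither assumption is confirmed by the query itself. The `rows` list and the `hedged_pct` helper are illustrative names, with the counts copied from the table in the parent comment.

```python
# Counts copied from the table in the parent comment:
# (model, total_claims, hedged_count)
rows = [
    ("claude-opus-4-7", 1000, 451),
    ("sonar-pro", 1000, 391),
    ("gpt-5.4", 1000, 277),
    ("gemini-3-retrieval", 1000, 129),
    ("gemini-3-pro", 1000, 60),
]


def hedged_pct(hedged_count: int, total_claims: int) -> float:
    """Share of claims with a hedged verdict, as a percentage (assumed formula)."""
    return round(100 * hedged_count / total_claims, 1)


# Percentage per model, recomputed from the raw counts.
pct_by_model = {model: hedged_pct(hedged, total) for model, total, hedged in rows}

# Models ordered from most to least hedging.
ranked = sorted(pct_by_model, key=pct_by_model.get, reverse=True)

for model in ranked:
    print(f"{model:<20} {pct_by_model[model]:>5.1f}")
```

Under those assumptions the recomputed values match the `hedged_pct` column, which suggests the 45% and 6% figures above are the same numbers seen from the verdict-bucket side.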