Remix clone Hacker News


	▲	Supermancho 8 hours ago
		> I found my interactions with Fable to be extremely impressive; it made other models, including GPT 5.5 and Opus 4.8, feel small and dumb.

		> Anthropic models have consistently been top-scoring in BullshitBench[0]

		eyeroll I find that Anthropic models feel big and dumber. https://www.endorlabs.com/research/ai-code-security-benchmar... puts Fable 5th, which seems about right to me. I'm interested in code utility and correctness, even if the majority of AI use is not focused on that.
	▲	airstrike 6 hours ago
		I think this just proves anyone can pick a benchmark that supports their point, so maybe we shouldn't treat them as evidence at all.