Remix clone Hacker News

	▲	jabedude 2 hours ago
		But that's removing a component that's critical for the test. As users and benchmark consumers, we care that the service as provided by Anthropic/OpenAI/Google stays consistent over time, given the same model/prompt/context.