Remix clone Hacker News


	▲	embedding-shape 2 hours ago
		> I guess the goal is to test the models and not the harness
		Less important than the harness are the system/user prompts themselves (which are, of course, put in the harness), and these are effectively what this study seems to be testing. With a better prompt, I'm sure the models would look more alike, as the biggest/best models have more or less identical, strong prompt adherence in my experience.