voidhorse 3 days ago

Just yesterday I was thinking how useful a tool like this would be. Tweak a specific section of a prompt, run it some very large N times, and check whether the results trend toward a golden result or at least approximate a "correct length". Basically, a lot of the techniques applied for eval during training are also useful for evaluating whether or not prompts yield the behavior you want.
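
Roughly what I have in mind, as a minimal sketch rather than a real tool. Everything named here is hypothetical: `generate` stands in for whatever model call you use, `fake_model` is a toy replacement so the snippet runs offline, and plain string similarity from `difflib` is only one possible stand-in for "close to the golden result".

```python
import random
import statistics
from difflib import SequenceMatcher
from typing import Callable


def evaluate_prompt(
    generate: Callable[[str], str],  # hypothetical: wraps your model call
    prompt: str,
    golden: str,
    n: int = 100,
    length_tolerance: float = 0.2,
) -> dict:
    """Run `prompt` n times and summarize how close the outputs are to `golden`."""
    similarities = []
    length_hits = 0
    for _ in range(n):
        output = generate(prompt)
        # Character-level similarity in [0, 1]; 1.0 means identical strings.
        similarities.append(SequenceMatcher(None, output, golden).ratio())
        # Count outputs whose length is within the tolerance of the golden length.
        if abs(len(output) - len(golden)) <= length_tolerance * len(golden):
            length_hits += 1
    return {
        "mean_similarity": statistics.mean(similarities),
        "length_hit_rate": length_hits / n,
    }


# Toy stand-in for a model: picks one of a few canned answers at random.
_rng = random.Random(0)
_CANNED = [
    "The capital of France is Paris.",
    "Paris.",
    "I am not sure, but it might be Paris or Lyon, depending on context.",
]


def fake_model(prompt: str) -> str:
    return _rng.choice(_CANNED)


report = evaluate_prompt(
    fake_model,
    prompt="What is the capital of France? Answer in one sentence.",
    golden="The capital of France is Paris.",
    n=200,
)
print(report)
```

Run it once per prompt variant and compare the two reports. For anything serious you would probably swap the string similarity for a task-specific scorer.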