vanuatu an hour ago

i suspect this is highly dependent on what you're working on

in my experience, if you give the models a way to self-verify correctness, they succeed basically 100% of the time
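
to make "a way to self-verify" concrete, here's a rough sketch of the kind of loop i mean. everything in it is hypothetical: `generate` stands in for whatever model call you use, `verify` for whatever check you can run (tests, a compiler, a type checker), and `solve_with_verification` is just a name i made up. it's a sketch under those assumptions, not any particular tool's api.

```python
from typing import Callable, Optional


def solve_with_verification(
    generate: Callable[[str], str],          # hypothetical: prompt -> candidate answer
    verify: Callable[[str], Optional[str]],  # hypothetical: None if ok, else an error message
    task: str,
    max_attempts: int = 5,
) -> Optional[str]:
    """Retry until the verifier accepts a candidate or attempts run out."""
    prompt = task
    for _ in range(max_attempts):
        candidate = generate(prompt)
        error = verify(candidate)
        if error is None:
            return candidate
        # Feed the failure back so the next attempt can correct it.
        prompt = (
            f"{task}\n\nprevious attempt:\n{candidate}"
            f"\n\nverifier said:\n{error}"
        )
    return None


# Stand-in "model": wrong on the first try, right once it sees feedback.
def fake_model(prompt: str) -> str:
    return "4" if "verifier said" in prompt else "5"


# Stand-in verifier: returns None on success, an error message otherwise.
def check(answer: str) -> Optional[str]:
    return None if answer == "4" else f"expected 2 + 2, got {answer}"


result = solve_with_verification(fake_model, check, "what is 2 + 2?")
```

the point is that the verifier's error goes back into the next prompt, so the model gets something concrete to fix instead of guessing again. how well this works still depends on how good a check you can actually write for your task.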