Remix clone Hacker News

	fellowniusmonk 5 days ago
		I have a very complex set of logic puzzles that I use as my own tests. Those puzzles, plus trying to get an agent to develop a certain type of implementation (one that is published, so the model has been trained on it to some limited extent), really stress-test models. 5.2 is a complete failure due to overfitting. It is really, really bad, in an unrecoverable-infinite-loop way. It helps to have existing working code that you know a model can't have been trained on. 5.2 doesn't actually evaluate the working code; it just assumes the code is wrong and starts trying to rewrite it as a different type of . Even when linked to the explanation and the git repo of the reference implementation, it still persists in trying to force a different **. This is the worst model since pre-o3. Just terrible.