Remix clone Hacker News


	Aerroon 5 hours ago
		I think the workflows can be really interesting to read about. The other week I read a Reddit post about how someone got Qwen3.5 35B-A3B to go from 22.2% to 37.8% on the 45 hard problems of SWE-bench Verified (Opus 4.6 gets 40%). Essentially, all they did was tell the LLM to test and verify that each change is correct, with a prompt like the following: >"You just edited X. Before moving on, verify the change is correct: write a short inline python -c or a /tmp test script that exercises the changed code path, run it with bash, and confirm the output is as expected." Whether this is true, I don't know, but I think talking about this kind of stuff is cool!
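		For anyone who wants to try the idea, here is a rough sketch of what that verification step could look like in an agent harness. This is my own guess, not the setup from the Reddit post: the names `verification_prompt` and `run_inline_check` are hypothetical, and the only thing taken from the post is the prompt wording quoted above.

```python
import subprocess
import sys

# Prompt wording from the quoted post; {path} stands in for the "X" placeholder.
VERIFY_TEMPLATE = (
    "You just edited {path}. Before moving on, verify the change is correct: "
    "write a short inline python -c or a /tmp test script that exercises the "
    "changed code path, run it with bash, and confirm the output is as expected."
)


def verification_prompt(path: str) -> str:
    """Build the follow-up message to send to the model after it edits `path`.

    Hypothetical helper: a real harness would append this to the conversation.
    """
    return VERIFY_TEMPLATE.format(path=path)


def run_inline_check(snippet: str, expected: str, timeout: float = 30.0) -> bool:
    """Run a short snippet the same way `python -c` would and compare its stdout.

    Returns True only if the process exits cleanly and prints `expected`.
    """
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0 and result.stdout.strip() == expected.strip()


if __name__ == "__main__":
    print(verification_prompt("utils/parser.py"))
    # Stand-in for a check the model might write against its own edit.
    print(run_inline_check("print(sum([1, 2, 3]))", "6"))
```

		The point is just that the model gets nudged after every edit and that the check actually executes code instead of the model eyeballing its own diff. How the original poster wired this into their agent loop, I can't say.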