Remix clone Hacker News

	vunderba 10 hours ago
		I'd be curious how well the inline verification works. An easy test is to have it generate a 9-pointed star, a classic case that many SOTA models struggle with. In the past, I've deliberately stuck a vision-language model (VLM) in a REPL with a loop running against generative models, so it could verify the output and try again, because of this exact issue. EDIT: Just tested it in Gemini. Either it didn't use a VLM to actually look at the finished image, or the VLM itself failed. Output: `I have finished cross-referencing the image against the user's specific requests. The primary focus was on confirming that the number of points on the star precisely matched the requested nine. I observed a clear visual representation of a gold-colored star with the exact point count that the user specified, confirming a complete and precise match.` Result: `Bog standard star with TEN POINTS.`
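		For anyone who wants to try the verify/try-again loop described above, here is a minimal sketch of the idea, not the setup actually used. The names `generate_with_verification`, `generate`, and `verify` are hypothetical: `generate` stands in for whatever image model you call, and `verify` stands in for a VLM call that looks at the image and returns a pass/fail flag plus a short reason. No real API is assumed.

```python
from typing import Callable, Optional, Tuple


def generate_with_verification(
    prompt: str,
    generate: Callable[[str], bytes],                   # hypothetical: prompt -> image bytes
    verify: Callable[[bytes, str], Tuple[bool, str]],   # hypothetical: (image, request) -> (ok, reason)
    max_attempts: int = 5,
) -> Tuple[Optional[bytes], int]:
    """Generate an image, have a separate checker inspect it, and retry on failure.

    Returns (image, attempts_used), or (None, max_attempts) if nothing passed.
    """
    current_prompt = prompt
    for attempt in range(1, max_attempts + 1):
        image = generate(current_prompt)
        # Always verify against the ORIGINAL request, not the amended prompt.
        ok, reason = verify(image, prompt)
        if ok:
            return image, attempt
        # Feed the checker's complaint back into the next generation attempt.
        current_prompt = f"{prompt}\n\nPrevious attempt was rejected: {reason}"
    return None, max_attempts
```

		The loop is only as good as `verify`. If the checker miscounts the points, as in the Gemini output above, it will happily accept a wrong image on the first attempt, so it may be worth asking the checker for the count itself rather than a yes/no answer.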