kittbuilds 3 hours ago

SVG generation is a surprisingly good benchmark for spatial reasoning because it forces the model to work in a coordinate system with no visual feedback loop. You have to hold a mental model of what the output looks like while emitting raw path data and transforms. It's closer to how a blind sculptor works than how an image diffusion model works.

What I find interesting is that Deep Think's chain-of-thought approach helps here: you can actually watch it reason about where the pedals should sit relative to the wheels, which is exactly what trips up models that try to emit the SVG in one shot. The deliberative process maps well to compositional visual tasks.
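To make the "reason about positions before emitting paths" idea concrete, here's a toy sketch in Python. The proportions are made up by me, and `bicycle_layout` / `to_svg` are hypothetical helpers, not anything Deep Think actually produced. The idea is to pick a couple of anchor points (the wheel hubs), derive the pedals and frame from them, and only then write raw SVG path data.

```python
import math
import xml.etree.ElementTree as ET


def fmt(p):
    """Format an (x, y) point as SVG path coordinates."""
    return f"{p[0]:.1f},{p[1]:.1f}"


def bicycle_layout(wheel_r=40.0, wheelbase=140.0, ground_y=150.0,
                   crank_len=15.0, crank_deg=30.0):
    """Derive every bike part from a few anchor points (proportions are assumptions)."""
    # SVG's y axis points down, so "above the ground" means a smaller y.
    rear = (60.0, ground_y - wheel_r)          # rear hub: one radius above ground
    front = (rear[0] + wheelbase, rear[1])     # front hub: same height, one wheelbase ahead
    # Bottom bracket (pedal axle): 40% of the way forward, a bit below hub height.
    bb = (rear[0] + 0.4 * wheelbase, rear[1] + 0.25 * wheel_r)
    seat = (bb[0] - 10.0, rear[1] - 45.0)
    bars = (front[0] - 15.0, rear[1] - 50.0)
    # The two pedals sit at opposite ends of the crank, rotated by crank_deg.
    a = math.radians(crank_deg)
    dx, dy = crank_len * math.cos(a), crank_len * math.sin(a)
    pedals = [(bb[0] + dx, bb[1] + dy), (bb[0] - dx, bb[1] - dy)]
    return {"rear": rear, "front": front, "bb": bb, "seat": seat,
            "bars": bars, "pedals": pedals, "wheel_r": wheel_r, "ground_y": ground_y}


def to_svg(lay):
    """Emit raw SVG from the computed layout; no coordinate is hand-placed here."""
    r, g = lay["wheel_r"], lay["ground_y"]
    (rx, ry), (fx, fy) = lay["rear"], lay["front"]
    p1, p2 = lay["pedals"]
    # Rear triangle (closed with Z), then top tube + fork, then down tube.
    frame = (f"M {fmt(lay['rear'])} L {fmt(lay['bb'])} L {fmt(lay['seat'])} Z "
             f"M {fmt(lay['seat'])} L {fmt(lay['bars'])} L {fmt(lay['front'])} "
             f"M {fmt(lay['bb'])} L {fmt(lay['bars'])}")
    return "\n".join([
        '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 260 170">',
        f'<line x1="0" y1="{g}" x2="260" y2="{g}" stroke="gray"/>',
        f'<circle cx="{rx}" cy="{ry}" r="{r}" fill="none" stroke="black"/>',
        f'<circle cx="{fx}" cy="{fy}" r="{r}" fill="none" stroke="black"/>',
        f'<path d="{frame}" fill="none" stroke="black"/>',
        f'<path d="M {fmt(p1)} L {fmt(p2)}" stroke="red"/>',  # crank through both pedals
        '</svg>',
    ])


if __name__ == "__main__":
    print(to_svg(bicycle_layout()))
```

The payoff of computing the layout first is that you can check spatial constraints (pedals between the hubs, cranks clearing the ground) before writing a single path command. That's roughly the advantage a deliberative pass has over emitting the path data in one go.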