I have wondered whether these tests will reach a point where online models cheat by generating a line-art raster reference and then deciding behind the scenes how to vectorize it in the most minimalist way (e.g. using strokes and shape elements rather than naively using path outlines for all forms).
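
To make the strokes-versus-outlines distinction concrete, here is a rough sketch (not anything a model is known to do internally). It emits the same thick line segment two ways: as a single stroked `<line>` element, and as a filled `<path>` tracing the segment's outline. The function names `stroked_line`, `outlined_line`, and `wrap` are made up for illustration.

```python
import math
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def stroked_line(x1, y1, x2, y2, width):
    """Minimalist form: one <line>, thickness supplied by stroke-width."""
    return (f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" '
            f'stroke="black" stroke-width="{width}"/>')


def outlined_line(x1, y1, x2, y2, width):
    """Naive form: the same segment traced as a filled four-corner path."""
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    # Normal vector to the segment, scaled to half the stroke width.
    nx, ny = -dy / length * width / 2, dx / length * width / 2
    corners = [
        (x1 + nx, y1 + ny),
        (x2 + nx, y2 + ny),
        (x2 - nx, y2 - ny),
        (x1 - nx, y1 - ny),
    ]
    d = "M" + " L".join(f"{x:.2f},{y:.2f}" for x, y in corners) + " Z"
    return f'<path d="{d}" fill="black"/>'


def wrap(body):
    """Wrap a fragment in a minimal SVG document (hypothetical 100x100 canvas)."""
    return f'<svg xmlns="{SVG_NS}" viewBox="0 0 100 100">{body}</svg>'


if __name__ == "__main__":
    print(wrap(stroked_line(10, 10, 90, 10, 4)))
    print(wrap(outlined_line(10, 10, 90, 10, 4)))
```

Both documents should render as the same bar (assuming the default butt line caps), but the stroked version keeps the "intent" of a line, which is the kind of economy the comment is describing.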

simonw 2 hours ago | parent | next [-]

This Deep Think one was so good that I did get suspicious that maybe it was at least rendering the SVG to an image and then "looking" at the image and tweaking it over a few iterations.

But the reasoning trace doesn't hint at that and looks legit to me: https://gist.github.com/simonw/7e317ebb5cf8e75b2fcec4d0694a8...

I also asked Deep Think what tools it has access to and it has Python and Bash but no internet access, and as far as I can tell that environment doesn't have any libraries or tools installed that can render an SVG to an image format that it could view.
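
For anyone wanting to run a similar check in a sandbox they control, a minimal sketch might look like the following. This is not what was run against Deep Think; the lists of module and command names are assumptions about common SVG rasterizers, and `find_svg_renderers` is a hypothetical helper.

```python
import importlib.util
import shutil

# Assumed candidates: Python packages and CLI tools commonly used to
# rasterize SVG. The lists are illustrative, not exhaustive.
CANDIDATE_MODULES = ["cairosvg", "svglib", "wand"]
CANDIDATE_COMMANDS = ["inkscape", "rsvg-convert", "magick", "convert"]


def find_svg_renderers(modules=CANDIDATE_MODULES, commands=CANDIDATE_COMMANDS):
    """Report which candidate modules are importable and which commands are on PATH."""
    found_modules = [m for m in modules if importlib.util.find_spec(m) is not None]
    found_commands = [c for c in commands if shutil.which(c) is not None]
    return {"modules": found_modules, "commands": found_commands}


if __name__ == "__main__":
    print(find_svg_renderers())
```

An empty result only shows that none of the listed candidates are present; it can't rule out some other rendering route.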

taberiand 4 hours ago | parent | prev [-]

Is that cheating, or is that just working smarter not harder?

Springtime 4 hours ago | parent [-]

The interesting aspect of the ongoing tests, I feel, is seeing how models can plan out an image directly using SVG primitives solely through reasoning (code-to-code). If they have a reference then it's a different type of challenge (optimizing for a trace).