Would a way to take screenshots help? It seems to work for browser testing.

I’ve been doing game development and it starts to hallucinate more rapidly when it doesn’t understand things like the direction it placing things or which way the camera is oriented

Gemini models are a little bit better about spatial reasoning, but we’re still not there yet because these models were not designed to do spatial reasoning they were designed to process text

In my development, I also use the ascii matrix technique.

▲

kleene_op 8 hours ago | parent | next [-]

Spatial awareness was also a huge limitation to Claude playing pokemon.

It really seems to me that the first AI company getting to implement "spatial awareness" vector tokens and integrating them neatly with the other conventional text, image and sound tokens will be reaping huge rewards. Some are already partnering with robot companies, it's only a matter of time before one of those gets there.

	▲	nszceta 7 hours ago \| parent [-]
		This is also my experience with attempting to use Claude and GLM-4.7 with OpenSCAD. Horrible spatial reasoning abilities.

▲

hypercube33 7 hours ago | parent | prev | next [-]

I disagree. With opus I'll screenshot an app and draw all over it like a child with me paint and paste it into the chat - it seems to reasonably understand what I'm asking with my chicken scratch and dimensions.

As far as 3d I don't have experience however it could be quite awful at that

	▲	vunderba 4 minutes ago \| parent [-]
		Yeah at least for 2D, Opus 4.5 seems decent. It can struggle with finer details, so sometimes I’ll grab a highlighter tool in Photoshop and mark the points of interest.

▲

miohtama 8 hours ago | parent | prev [-]

They would need a spatial reason or layout specific tool, to translate to English and back

	▲	falcor84 7 hours ago \| parent [-]
		I wonder if they could integrate a secondary "world model" trained/fine-tuned on Rollercoaster Tycoon to just do the layout reasoning, and have the main agent offload tasks to it.