Very much depends on what you want to do.

The fact that a language model can "reason" (in the LLM-slang sense of the term) about 3D space is an interesting property.

If you give a model a text description of a scene and ask it to make a robot perform a peg-in-hole task, modern models can solve it fairly easily using movement primitives. I implemented this on a UR robot arm back in 2023.
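
To make "movement primitives" concrete, here is a minimal sketch of the idea, not my actual UR setup: the model emits plain-text lines, and a small interpreter checks them against a whitelist before dispatching. `SimArm`, the primitive names, and the coordinates are all hypothetical stand-ins; a real arm would need its own driver behind the same interface.

```python
from dataclasses import dataclass, field


@dataclass
class SimArm:
    """Toy stand-in for a robot arm: tracks tool position and gripper state only."""
    pos: tuple = (0.0, 0.0, 0.3)
    holding: bool = False
    log: list = field(default_factory=list)

    def move_to(self, x, y, z):
        self.pos = (x, y, z)
        self.log.append(("move_to", x, y, z))

    def grasp(self):
        self.holding = True
        self.log.append(("grasp",))

    def release(self):
        self.holding = False
        self.log.append(("release",))


# Whitelist: primitive name -> number of numeric arguments it takes.
PRIMITIVES = {"move_to": 3, "grasp": 0, "release": 0}


def run_plan(arm, plan_text):
    """Parse lines like 'move_to 0.4 0.1 0.05' and dispatch them to the arm."""
    for line in plan_text.strip().splitlines():
        name, *args = line.split()
        if name not in PRIMITIVES or len(args) != PRIMITIVES[name]:
            raise ValueError(f"rejected primitive: {line!r}")
        getattr(arm, name)(*map(float, args))
    return arm


# Hypothetical model output for: "peg at (0.4, 0.1), hole at (0.6, -0.2)".
plan = """
move_to 0.4 0.1 0.20
move_to 0.4 0.1 0.05
grasp
move_to 0.4 0.1 0.20
move_to 0.6 -0.2 0.20
move_to 0.6 -0.2 0.02
release
"""
arm = run_plan(SimArm(), plan)
```

The whitelist is the important part: anything the model writes that is not a known primitive with the right argument count gets rejected instead of executed.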

The next logical step is to have the model output directly in action space instead of outputting text (code representing movement primitives). This is what vision-language-action models like pi0 are doing.
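
For anyone wondering what an action "token" can even look like: one simple and widely used scheme is uniform binning of each action dimension. This is a generic sketch and not a description of how pi0 works internally; the range, the bin count, and the function names are assumptions picked for illustration.

```python
def encode(action, low=-1.0, high=1.0, bins=256):
    """Map each continuous action value to an integer bin index (a 'token')."""
    tokens = []
    for a in action:
        a = min(max(a, low), high)                 # clamp into the valid range
        idx = int((a - low) / (high - low) * bins)
        tokens.append(min(idx, bins - 1))          # a == high lands in the last bin
    return tokens


def decode(tokens, low=-1.0, high=1.0, bins=256):
    """Map bin indices back to the centre value of each bin."""
    width = (high - low) / bins
    return [low + (t + 0.5) * width for t in tokens]


# Hypothetical normalized action, e.g. three joint velocity commands.
action = [0.3, -0.77, 0.999]
tokens = encode(action)
recovered = decode(tokens)
```

The round trip is lossy: each value comes back as the centre of its bin, so the error is bounded by half a bin width.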

volkercraig 5 hours ago | parent [-]

I mean, semantically, language evolved as a way of interpreting the material world. So assuming you can describe a problem in language, and that a solution exists that is also describable in language, I'm sure a big enough LLM could do it. But you can also calculate highly detailed orbital maps with epicycles if you just keep adding more of them. You just don't, because it's a waste of time and there's a simpler way.

The latter part is interesting. I'm not sure how the performance of one of those would be once they are working well, but my naive gut feeling is that splitting the language part and the driving part into two delegates is cleaner, safer, faster and more predictable.

	convolvatron 5 hours ago | parent [-]
		Note that the control systems you were talking about before (i.e. PID) would probably map pretty directly onto a tiny network and, exactly because of that limitation, be far less likely to contain 'hallucinations'. Object avoidance and path planning are likely similar. Since this is a limited and continuous domain, it's a far better one for neural training than natural language. I guess this notion that a language model should be used for 3D motion control is a real indicator of the level of thought going into some of these applications.
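
To illustrate the PID point: a textbook discrete PID law is a weighted sum of the error, its running integral, and its finite-difference derivative, which is exactly what a single linear neuron computes over those three inputs. A rough sketch, assuming the integral and derivative are computed outside the neuron (a recurrent network would have to learn that bookkeeping itself); the gains and error values are made up.

```python
class PID:
    """Textbook discrete PID controller."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def features(self, error):
        """Update internal state and return [error, integral, derivative]."""
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return [error, self.integral, derivative]

    def step(self, error):
        e, i, d = self.features(error)
        return self.kp * e + self.ki * i + self.kd * d


def linear_neuron(weights, inputs):
    """One neuron, no bias, no activation: a plain dot product."""
    return sum(w * x for w, x in zip(weights, inputs))


gains = (2.0, 0.5, 0.1)    # illustrative gains, not tuned for any real plant
dt = 0.01
errors = [1.0, 0.8, 0.5, 0.2, -0.1]

pid = PID(*gains, dt)
tracker = PID(*gains, dt)  # second instance, used only to produce the features

pid_out = [pid.step(e) for e in errors]
net_out = [linear_neuron(gains, tracker.features(e)) for e in errors]
max_diff = max(abs(a - b) for a, b in zip(pid_out, net_out))
```

With the weights set to (kp, ki, kd) the two outputs agree up to floating-point rounding, and there are only three parameters to inspect, which is why there is so little room for anything resembling a hallucination.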