A Lovecraft reference, nice. I'm wondering whether a smaller model would suffice as well.

https://knowyourmeme.com/memes/shoggoth-with-smiley-face-art... https://www.nytimes.com/2023/05/30/technology/shoggoth-meme-...

troyvit 17 hours ago | parent | prev [-]

Yeah I came here to say the same thing. It seems like it would simplify things. They do say:

"I initially considered training a single end-to-end VLA model. [...] A cable-driven soft robot is different: the same tip position can correspond to many cable length combinations. This unpredictability makes demonstration-based approaches difficult to scale.[...] Instead, I went with a cascaded design: specialized vision feeding lightweight controllers, leaving room to expand into more advanced learned behaviors later."

I still think circling back to smaller models would be awesome. With some upgrades you might get a locally hosted model on there, but I'd be sure to keep that inside a pentagram so it doesn't summon a Great One.
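
To make the quoted point about "many cable length combinations" concrete, here is a toy sketch. It is not from the article: it assumes a single planar, constant-curvature segment with two opposing cables, where a cable can only pull and any extra paid-out length is just slack. The constants and the `tip_position` helper are made up for illustration.

```python
import math

# Hypothetical dimensions, chosen only for illustration.
BACKBONE_LEN = 0.20   # metres, length of the flexible backbone
CABLE_OFFSET = 0.01   # metres, distance of each cable from the backbone


def tip_position(len_a, len_b):
    """Tip (x, y) of a planar constant-curvature segment with two opposing cables.

    Toy assumption: a cable only acts when it is shorter than the backbone;
    a longer cable is slack and has no effect on the shape.
    """
    pull_a = max(0.0, BACKBONE_LEN - len_a)  # how far cable A is pulled in
    pull_b = max(0.0, BACKBONE_LEN - len_b)  # how far cable B is pulled in
    theta = (pull_a - pull_b) / CABLE_OFFSET  # net bend angle in radians

    if abs(theta) < 1e-9:
        return (0.0, BACKBONE_LEN)  # straight segment

    radius = BACKBONE_LEN / theta  # arc radius of the bent backbone
    return (radius * (1.0 - math.cos(theta)), radius * math.sin(theta))


if __name__ == "__main__":
    # Same pull on cable A, different amounts of slack on cable B.
    print(tip_position(0.19, 0.21))
    print(tip_position(0.19, 0.25))
```

Both calls give the same tip even though the cable lengths differ, so a demonstration that only records tip position can't tell you which cable lengths produced it. A real tentacle with friction, stretch, and more cables would be far messier than this.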

		joshuabaker2 16 hours ago | parent [-]
		I was surprised it pinged gpt-4o. I was expecting it to use something like https://github.com/apple/ml-fastvlm (obviously cost may have been a factor there), but I can see how the direction he chose would leave room for more complex behaviours in the future, e.g. adding additional tentacles for movement and so on.