sho_hn 8 hours ago

It does all start to feel like we'd get fairly close to convincingly emulating a lot of human, or at least animal, behavior on top of the existing generative stack by using brain-like orchestration patterns ... if only inference were fast enough to do much more of it.

The gauge-reading example here is great, but in reality of course having the system synthesize that Python script, run the CV tasks, come back with the answer etc. is currently quite slow.
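For context, the heart of a synthesized gauge-reading script is usually just a linear map from the detected needle angle to a value on the dial; a minimal sketch (the needle-angle detection itself, e.g. via OpenCV, is elided, and the angle range here is hypothetical):

```python
def gauge_value(needle_deg, min_deg=-45.0, max_deg=225.0,
                min_val=0.0, max_val=100.0):
    """Linearly map a detected needle angle to a gauge reading."""
    frac = (needle_deg - min_deg) / (max_deg - min_deg)
    return min_val + frac * (max_val - min_val)

# A needle halfway through its sweep reads mid-scale.
print(gauge_value(90.0))  # 50.0
```

The slow part isn't this arithmetic, of course; it's the round trip of the model writing the script, running the CV step, and reading its own output back.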

Once things go much faster, you can also start to use image generation to have models extrapolate possible futures from photos they take, then describe those futures back to themselves and make decisions based on that, loops like this. I think the assumption is that our brains do similar things unconsciously, before we integrate them into our conscious conception of mind.
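The loop described above can be sketched as generate-describe-score-pick. All three model calls below are hypothetical stubs standing in for an image generator, a captioner, and a judge; only the loop structure is the point:

```python
def imagine_futures(photo, n=3):
    """Stub for an image-generation model extrapolating possible futures."""
    return [f"{photo} after outcome {i}" for i in range(n)]

def describe(image):
    """Stub for a captioning model describing an image back to the agent."""
    return f"description of: {image}"

def score(description, goal):
    """Stub for the model judging a described future against a goal."""
    return len(description)  # placeholder heuristic

def decide(photo, goal):
    # Generate candidate futures, describe each, pick the best-scoring one.
    futures = imagine_futures(photo)
    described = [(f, describe(f)) for f in futures]
    return max(described, key=lambda fd: score(fd[1], goal))[0]
```

With today's inference speeds each iteration of this loop is multiple seconds of generation; that's why throughput, not capability, is the bottleneck for running it continuously.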

I'm really curious what things we could build if we had 100x or 1000x inference throughput.

moonu 8 hours ago | parent | next [-]

Idk if you've seen this already, but Taalas does this interesting thing where they embed the model directly onto the chip, which leads to super-fast speeds (https://chatjimmy.ai). The model they're using is an old, small Llama model, though, so the quality is pretty bad. But they say it can scale, and if that's really true, that'd be pretty insane and would unlock the inference you're talking about.

lachlan_gray 7 hours ago | parent [-]

Robotics/control systems is exactly what came to mind when I saw this release! What struck me is the possibility of lookahead search in real time, a bit like AlphaZero's MCTS.
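To make the lookahead idea concrete: the simplest version is the rollout (simulation) phase of MCTS, stripped of the tree statistics. This is a toy sketch on a made-up one-dimensional domain, not anything from the release; the "value" of an action is estimated by averaging random playouts after taking it:

```python
import random

# Toy domain: the state is a number, actions nudge it up or down,
# and the reward is simply the final state value.
ACTIONS = [lambda s: s + 1, lambda s: s - 1]

def rollout_value(state, action, n_rollouts=200, depth=5):
    """Estimate an action's value by averaging random rollouts, as in the
    simulation phase of MCTS (tree/UCB bookkeeping omitted for brevity)."""
    total = 0.0
    for _ in range(n_rollouts):
        s = action(state)                   # take the candidate action once
        for _ in range(depth - 1):
            s = random.choice(ACTIONS)(s)   # then play out randomly
        total += s                          # toy reward: the state itself
    return total / n_rollouts

def best_action(state):
    # Pick the action whose simulated futures look best on average.
    return max(ACTIONS, key=lambda a: rollout_value(state, a))
```

Doing even this much search in real time on a robot means hundreds of forward simulations per decision, which is exactly where the 100x-1000x inference throughput upthread would matter.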

Kostic 7 hours ago | parent | prev | next [-]

Taalas showed that you can make LLMs faster by turning them into ASICs and get 10k+ tokens per second. It's only a matter of time now.

timmg 6 hours ago | parent [-]

Actually pretty interesting to think: in a few years you might buy a Raspberry Pi-style computer board with an extra chip carrying one of these embodiment models, and you could slap it in a rover or something.

tootie 6 hours ago | parent | prev | next [-]

Is emulating human behavior really a valuable end goal, though? Humans exist as the evolutionary endpoint of exhaustion-hunting large prey and organic tool-making. We've built loads of industrial and residential automation tools in the last 100 years, and none of them are humanoid. I'd imagine a household robot butler would be more like R2-D2 with lots and lots of arms.

hootz 5 hours ago | parent | next [-]

It is when the world was built to interface with us. We can't use robots for everything unless they emulate us; otherwise we'd have to adapt everything for non-humanlike robots.

tootie an hour ago | parent [-]

We build our living spaces against the constraints of the human form, but that still doesn't imply the human form is optimal for anything. There's no reason a robot traveling over smooth surfaces should have legs instead of wheels or treads. There's no reason to have a head. Some kind of arm is a common design feature, but there's certainly no reason to have two. No reason to be symmetrical. A domestic robot may be constrained in terms of scale (i.e. it needs to see things at counter height) but not in shape.

hgoel 26 minutes ago | parent [-]

Really, the requirements are for the robot to move in predictable ways (if something looks like an arm, it ought to move like an arm, etc), and to have enough strength to be useful for difficult/tiring tasks while somehow also not being dangerous if something does go wrong.

Glemllksdf 4 hours ago | parent | prev [-]

Every single behavior? For sure not. But beyond that, we are the result of a very, very long evolution, and there is nothing else around us as smart and as adaptable.

Planning ahead through simulation, for example, seems to be a very good tool in neural-network-based architectures.

LetsGetTechnicl 6 hours ago | parent | prev [-]

What if we put slop images into slop machines and got slop^2 back out