lawlessone 5 days ago

Would be great if this had the kind of money that's being thrown at LLMs.

ACCount36 5 days ago | parent

"If?" This thing has a goddamn LLM at its core.

That's true for most advanced robotics projects these days. Every time you see an advanced robot designed to perform complex real-world tasks, you bet your ass there's an LLM in it, used for high-level decision-making.

gitremote 4 days ago | parent | next

It's only "ChatGPT-like AI" in that it uses transformers. It's not an LLM. It's not trained on the Internet.

ninetyninenine 5 days ago | parent | prev

No, surgery is not token-based. It's a different aspect of intelligence.

While, technically speaking, the entire universe can be serialized into tokens, that's not the most efficient way to tackle every problem. Surgery is about 3D space, manipulating tools, and performing actions. It's better suited to standard ML models... for example, I don't think Waymo self-driving cars use LLMs.

Tadpole9181 5 days ago | parent | next

The AI on display, Surgical Robot Transformer[1], is based on the work of Action Chunking with Transformers[2]. These are both transformer models, which means they are fundamentally token-based. The whitepapers go into more detail on how tokenization occurs: it's not text like an LLM's; the tokens are patches of video/sensor data and sequences of actions. A rough sketch of what that looks like follows the links below.

Why wouldn't you look this up before stating it so confidently? The link is at the top of this very page.

EDIT: I looked it up because I was curious. For your chosen example, Waymo, they also use (token-based) transformer models for their state tracking.[3]

[1]: https://surgical-robot-transformer.github.io/

[2]: https://tonyzhaozh.github.io/aloha/

[3]: https://waymo.com/research/stt-stateful-tracking-with-transf...
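To make "token-based but not text" concrete, here's a minimal sketch of an action-chunking-style policy in PyTorch. This is not the actual SRT/ACT code, and every size and name is made up for illustration; it just shows the shape of the idea. Camera patches and the robot's joint state go in as tokens, and a chunk of future actions comes out:

    # Illustrative only; not the actual SRT/ACT implementation.
    # Camera frames become patch tokens, the robot's joint state becomes
    # one more token, and the decoder emits a "chunk" of future actions.
    import torch
    import torch.nn as nn

    class ActionChunkingPolicy(nn.Module):
        def __init__(self, patch=16, d_model=256, chunk=8, action_dim=7):
            super().__init__()
            self.patch = patch
            # Each 16x16 RGB patch is flattened and projected to one token.
            self.patch_embed = nn.Linear(3 * patch * patch, d_model)
            # Proprioception (joint positions) becomes a single token.
            self.state_embed = nn.Linear(action_dim, d_model)
            # One learned query per future action in the predicted chunk.
            self.action_queries = nn.Parameter(torch.zeros(chunk, d_model))
            self.transformer = nn.Transformer(
                d_model=d_model, nhead=8,
                num_encoder_layers=2, num_decoder_layers=2,
                batch_first=True)
            self.action_head = nn.Linear(d_model, action_dim)

        def forward(self, image, joint_state):
            b, c, _, _ = image.shape
            p = self.patch
            # Tokenize the image: (B, C, H, W) -> (B, num_patches, C*p*p).
            patches = image.unfold(2, p, p).unfold(3, p, p)
            patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)
            tokens = torch.cat([self.patch_embed(patches),
                                self.state_embed(joint_state)[:, None]], dim=1)
            queries = self.action_queries.expand(b, -1, -1)
            out = self.transformer(tokens, queries)
            return self.action_head(out)  # (B, chunk, action_dim)

    policy = ActionChunkingPolicy()
    img = torch.randn(1, 3, 224, 224)   # one camera frame
    state = torch.randn(1, 7)           # current joint positions
    actions = policy(img, state)        # next 8 actions; no text anywhere
    print(actions.shape)                # torch.Size([1, 8, 7])

Swap the patch embedding for text embeddings and the action head for a vocabulary softmax and you'd have an LLM. The architecture is the same; only the tokens differ.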

ninetyninenine 5 days ago | parent

>Why wouldn't you look this up before stating it so confidently? The link is at the top of this very page.

Hallucinations.

lucubratory 5 days ago | parent | prev

Current Waymos do use the transformer architecture; they're still predicting tokens.