Great article. I'll be following along. Would like to learn more about the robotics space.

- I've heard the advantage of ROS besides the architecture is the ecosystem (driver integrations, etc). Is that not an issue because the arm supports a Python SDK OOTB?

- Any issues you've been running into with this setup?

- How do you determine if a session recording is good enough for training? Is 50/100 samples really all you need?

mplappert 2 days ago

Glad you like it!

Re your questions:

- The driver situation turned out totally fine; I intentionally picked HW with good Python SDK support, so that was very painless.

- The static camera (the C920) is not super great; it drops frames and sometimes cuts out. We’ll see how that goes, but it’s probably the first thing I want to swap right now. Another issue is the reach of the arm when forcing the wrist to be axis-parallel with the table; you cannot get very far away. The chess setup demo in the video gives an example: I can just reach the row of pawns, and beyond that it’s out of reach.

- I don’t know yet! The 50-100 figure comes from the ACT and Diffusion Policy papers, but it depends on the type of task. For fine-tuning, my sense is that you only need a few hours’ worth of demos to get good results with pi0.5 etc. A big reason I’m doing this project is that I want to try all of this myself, so the next posts will definitely talk about that.

	b89kim a day ago
		I can confirm that 50-100 demonstrations are enough for fine-tuning pi0/pi0.5. I did research with ALOHA and a humanoid. It works from 20~40 episodes (5~10 min), but the success rate would be 70~80%. The pi0 tech paper suggests using over 1~4 hours of data. I could get a 95% success rate for pick & place with 1 hour of humanoid data. Anyway, the hours required for a good success rate depend on the generality of the data. Long-horizon tasks over 5 min don't work as in the paper because PI removed the high-level (subtask) reasoning part in the released pi0.5.
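
For anyone trying to reconcile "50-100 demonstrations" with "1~4 hours of data", here is a rough back-of-the-envelope sketch. It assumes every episode is about as long as the ones implied by the 20~40 episodes in 5~10 min figure above, which will not hold for every task; the function names are made up for illustration.

```python
def seconds_per_episode(episodes: int, minutes: float) -> float:
    """Average episode length implied by an episode count and a total duration."""
    return minutes * 60 / episodes


def episodes_to_minutes(episodes: int, sec_per_ep: float) -> float:
    """Total recording time in minutes for a given number of episodes."""
    return episodes * sec_per_ep / 60


def hours_to_episodes(hours: float, sec_per_ep: float) -> float:
    """Number of episodes that fit into a given number of hours of data."""
    return hours * 3600 / sec_per_ep


# Assumption: the 20 episodes / 5 min figure from the comment is representative.
sec_per_ep = seconds_per_episode(20, 5)

for n in (50, 100):
    print(f"{n} episodes is roughly {episodes_to_minutes(n, sec_per_ep)} min of data")

for h in (1, 4):
    print(f"{h} h of data is roughly {hours_to_episodes(h, sec_per_ep)} episodes")
```

Under that assumption the two numbers describe quite different dataset sizes, so it matters whether a paper counts demonstrations or hours. Longer tasks shift the conversion accordingly.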