Remix.run Logo
esafak 5 days ago

https://arxiv.org/abs/2505.10251

https://h-surgical-robot-transformer.github.io/

Approach:

[Our] policy is composed of a high-level language policy and a low-level policy for generating robot trajectories. The high-level policy outputs both a task instruction and a corrective instruction, along with a correction flag. Task instructions describe the primary objective to be executed, while corrective instructions provide fine-grained guidance for recovering from suboptimal states. Examples include "move the left gripper closer to me" or "move the right gripper away from me." The low-level policy takes as input only one of the two instructions, determined by the correction flag. When the flag is set to true, the system uses the corrective instruction; otherwise, it relies on the task instruction.

To support this training framework, we collect two types of demonstrations. The first consists of standard demonstrations captured during normal task execution. The second consists of corrective demonstrations, in which the data collector intentionally places the robot in failure states, such as missing a grasp or misaligning the grippers, and then demonstrates how to recover and complete the task successfully. These two types of data are organized into separate folders: one for regular demonstrations and another for recovery demonstrations. During training, the correction flag is set to false when using regular data and true when using recovery data, allowing the policy to learn context-appropriate behaviors based on the state of the system.