Optimization of Piper Robotic Arm Motion Control via lerobot Transplantation

Author: VA11Hall
Link: https://zhuanlan.zhihu.com/p/1946636125415401016
Source: Zhihu

I. Introduction

We have successfully transplanted lerobot to the Piper robotic arm, enabling smooth execution of task workflows including remote control, data collection, training, and inference. The current goal is to optimize the Piper’s operational performance—such as improving success rates and motion stability. Key optimization measures will focus on two aspects: enhancing the quality and scale of datasets, and further refining motion control algorithms. For the former, we plan to conduct experiments on reducing light interference, deploying cameras in more reasonable positions (e.g., on the arm itself), and improving the consistency of teaching actions during data collection. For the latter, we will directly modify the code to enhance motion control.

This article introduces a code-based approach to optimizing Piper’s motion control, inspired by the following Bilibili video: LeRobot ACT Algorithm Introduction and Tuning.

The video author not only provides optimization ideas and demonstration of results but also shares the source code. This article analyzes and explains the ideas and corresponding code implementations from the video, and presents the results of transplanting this code to the Piper robotic arm for practical testing.

II. Limitations of Motion Control in lerobot’s Official Code

Robots trained with lerobot often exhibit severe jitter during inference and validation. This is because lerobot relies on imitation learning—during data collection, human demonstrators inevitably introduce unnecessary jitter into the dataset due to unfamiliarity with the master arm. Additionally, even for similar grasping tasks, demonstrators may adopt different action strategies. Given the current limitations of small dataset sizes and immature network architectures, these factors lead to unstable motion control (there are also numerous other contributing factors).

For a given pre-trained model, developers can directly improve data collection quality to provide the model with high-quality task demonstrations—analogous to “compensating for a less capable student with a more competent teacher.” Furthermore, developers can embed critical knowledge that the robot struggles to learn into the code through explicit programming.

To reduce jitter during robotic arm movement without compromising the model’s generalization ability, two classic motion control optimization strategies can be adopted: motion filtering and interpolation.
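To make the interpolation strategy concrete, here is a minimal, illustrative helper operating on plain joint-angle lists. It is not part of lerobot; the function name and signature are ours:

```python
def lerp_actions(a, b, n_steps):
    """Generate n_steps intermediate actions between action a and action b,
    endpoints excluded. Each action is a list of joint angles, so the robot
    can step smoothly from a to b instead of jumping."""
    return [
        [ai + (bi - ai) * k / (n_steps + 1) for ai, bi in zip(a, b)]
        for k in range(1, n_steps + 1)
    ]
```

For example, `lerp_actions([0.0], [1.0], 3)` yields `[[0.25], [0.5], [0.75]]`, evenly bridging the gap between the two endpoint actions.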

III. Interpolation and Filtering of Action Sequences Generated by ACT

The default model used in lerobot workflows is ACT, with the relevant code located in the policies directory. The lerobot project incorporates the original ACT code and implements wrapper functions around it for robot control.

Using VS Code’s indexing feature, we can directly locate the select_action function in lerobot’s ACT-related code:

```python
def select_action(self, batch: dict[str, Tensor]) -> Tensor:
    """Select a single action given environment observations.

    This method wraps `select_actions` in order to return one action at a time for execution in the
    environment. It works by managing the actions in a queue and only calling `select_actions` when the
    queue is empty.
    """
    self.eval()  # keeping the policy in eval mode as it could be set to train mode while queue is consumed

    if self.config.temporal_ensemble_coeff is not None:
        actions = self.predict_action_chunk(batch)
        action = self.temporal_ensembler.update(actions)
        return action

    # Action queue logic for n_action_steps > 1. When the action_queue is depleted, populate it by
    # querying the policy.
    if len(self._action_queue) == 0:
        actions = self.predict_action_chunk(batch)[:, : self.config.n_action_steps]

        # `self.model.forward` returns a (batch_size, n_action_steps, action_dim) tensor, but the queue
        # effectively has shape (n_action_steps, batch_size, *), hence the transpose.
        self._action_queue.extend(actions.transpose(0, 1))
    return self._action_queue.popleft()
```

The core logic here is: if the action queue is empty, the model predicts and generates a new sequence of actions. An unavoidable limitation of this logic is that the end of one action cluster (a sequence of consecutive actions) and the start of the next generated cluster often lack continuity. This causes the robotic arm to exhibit sudden jumps during inference (more severe than jitter, closer to convulsions).

To address this, linear interpolation can be used to generate a series of intermediate actions, smoothing the transition between discontinuous action clusters. Subsequently, applying mean filtering to the entire action sequence can further mitigate jitter.

P.S.: While writing this, I suddenly wondered if slower demonstration actions during data collection would result in more stable operation.

Based on the above ideas, the select_action function was modified as follows:

```python
def select_action(self, batch: dict[str, Tensor]) -> Tensor:
    """Select a single action given environment observations.

    This method wraps `select_actions` in order to return one action at a time for execution in the
    environment. It works by managing the actions in a queue and only calling `select_actions` when the
    queue is empty.
    """
    self.eval()  # keeping the policy in eval mode as it could be set to train mode while queue is consumed

    if self.config.temporal_ensemble_coeff is not None:
        actions = self.predict_action_chunk(batch)
        action = self.temporal_ensembler.update(actions)
        return action

    # vkrobot: Model prediction generates a sequence of n_action_steps, which is stored in the queue.
    # The robotic arm is controlled based on the actions in the sequence.
    if len(self._action_queue) == 1:
        self.last_action = self._action_queue[0].cpu().tolist()[0]

    # Action queue logic for n_action_steps > 1. When the action_queue is depleted, populate it by
    # querying the policy.
    if len(self._action_queue) == 0:
        actions = self.predict_action_chunk(batch)[:, : self.config.n_action_steps]

        # `self.model.forward` returns a (batch_size, n_action_steps, action_dim) tensor, but the queue
        # effectively has shape (n_action_steps, batch_size, *), hence the transpose.
        # vkrobot: Linear interpolation for jump points
        self.begin_mutation_filter(actions)
        self._action_queue.extend(actions.transpose(0, 1))
        # vkrobot: Mean filtering
        self.actions_mean_filtering()
    return self._action_queue.popleft()
```

Key modifications include:

```python
if len(self._action_queue) == 1:
```

When only one action remains in the queue (indicating the end of the previously predicted action cluster), this action is recorded. For clarification: an “action” here refers to a set of joint angles.

Thus, when generating the next prediction, linear interpolation can be used to smooth the transition from the last action of the previous cluster to the first action of the new cluster. Additionally, mean filtering is applied to all newly generated action sequences:

```python
self.begin_mutation_filter(actions)
self._action_queue.extend(actions.transpose(0, 1))
# vkrobot: Mean filtering
self.actions_mean_filtering()
```

The interpolation and filtering functions need to be implemented separately, as they are not included in the original lerobot code.
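The video author's implementations of these two helpers are not reproduced here, so the following is one possible sketch, matching only the call sites above; the interpolation count, window size, and internal details are assumptions:

```python
from collections import deque

import torch


class SmoothingMixin:
    """Hypothetical implementations of the two helpers called from the
    modified select_action; the video author's actual code may differ."""

    def begin_mutation_filter(self, actions: torch.Tensor, n_interp: int = 10):
        # actions: (batch, n_action_steps, action_dim). Blend the first
        # n_interp predicted steps from the last executed action of the
        # previous cluster toward the new trajectory, removing the jump.
        if getattr(self, "last_action", None) is None:
            return
        last = torch.tensor(self.last_action, dtype=actions.dtype,
                            device=actions.device)
        for k in range(min(n_interp, actions.shape[1])):
            alpha = (k + 1) / (n_interp + 1)  # 0 -> last_action, 1 -> prediction
            actions[:, k] = (1 - alpha) * last + alpha * actions[:, k]

    def actions_mean_filtering(self, window: int = 5):
        # Centered moving average over the queued per-step actions
        # (each queue entry has shape (batch, action_dim)). Note this sketch
        # rebuilds the deque and drops any maxlen the original queue had.
        steps = list(self._action_queue)
        half = window // 2
        smoothed = []
        for t in range(len(steps)):
            lo, hi = max(0, t - half), min(len(steps), t + half + 1)
            smoothed.append(torch.stack(steps[lo:hi]).mean(dim=0))
        self._action_queue = deque(smoothed)
```

With `n_interp=3`, a new cluster whose first step is far from `last_action` starts at 25% of the way there, then 50%, then 75%, before handing over fully to the model's predictions.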

IV. Adding Smooth Loss to the Loss Function

The video author also proposes another method to reduce jitter: incorporating smooth loss into the total loss function. This is a common technique in machine learning—an ingenious idea, though its practical effectiveness may vary depending on the scenario.

```python
# vkrobot: mean-filtering (smoothness) loss
kernel_size = 11
padding = kernel_size // 2
x = actions_hat.transpose(1, 2)
# 6 is the action dimension assumed here; the grouped conv1d applies an
# independent moving average to each joint's predicted trajectory
weight = torch.ones(6, 1, kernel_size, device=actions_hat.device) / kernel_size
filtered_x = F.conv1d(x, weight, padding=padding, groups=6)
filtered_tensor = filtered_x.transpose(1, 2)
mean_loss = torch.abs(actions_hat - filtered_tensor).mean()
loss += mean_loss
loss_dict["mean_loss"] = mean_loss.item()
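To see why this term penalizes jitter, the snippet above can be wrapped into a standalone function (with the action dimension read from the tensor rather than hard-coded to 6) and evaluated on a smooth versus a jittery trajectory. The function name and test signals are ours, for illustration only:

```python
import torch
import torch.nn.functional as F


def mean_filter_loss(actions_hat, kernel_size=11):
    """L1 distance between a predicted action chunk and its per-joint
    moving average; shapes follow the snippet: (batch, chunk, action_dim)."""
    dim = actions_hat.shape[-1]
    padding = kernel_size // 2
    x = actions_hat.transpose(1, 2)  # (batch, action_dim, chunk)
    weight = torch.ones(dim, 1, kernel_size, device=actions_hat.device) / kernel_size
    filtered = F.conv1d(x, weight, padding=padding, groups=dim)
    return torch.abs(actions_hat - filtered.transpose(1, 2)).mean()


# A smooth ramp is nearly unchanged by the moving average (apart from the
# zero-padded boundaries), while per-step jitter survives filtering and is
# penalized heavily.
smooth = torch.linspace(0, 1, 100).view(1, 100, 1)
sign = torch.ones(100)
sign[1::2] = -1.0
jittery = smooth + 0.1 * sign.view(1, 100, 1)
```

Here `mean_filter_loss(jittery)` comes out substantially larger than `mean_filter_loss(smooth)`, so gradient descent pushes the predicted chunks toward low-frequency, smooth trajectories.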

V. Other Optimization Attempts

The video also mentions modifying model inference parameters to improve grasping success rates. We tested this method on the Piper: setting the model to infer 100 steps and execute the first 50 steps resulted in the robot entering a hesitant state, failing to proceed. Adjusting to 70 steps also led to similar issues. Thus, parameter modification may require scenario-specific tuning.
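In lerobot's ACT configuration, the "infer 100 steps, execute the first 50" experiment corresponds to the `chunk_size` and `n_action_steps` fields consumed by `select_action` above. A sketch of that setting (the import path may vary across lerobot versions):

```python
# The model predicts chunk_size steps per query; select_action executes only
# the first n_action_steps of them before re-querying the policy.
from lerobot.common.policies.act.configuration_act import ACTConfig  # path may differ by version

config = ACTConfig(chunk_size=100, n_action_steps=50)
```

As noted above, these values required scenario-specific tuning on the Piper; neither 50 nor 70 executed steps worked out of the box in our tests.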

Additionally, the video suggests introducing mean filtering during data collection—a method that should be effective. We plan to test this in future research focused on data collection optimization.
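A sketch of what collection-time mean filtering could look like, assuming the recorded teaching trajectory is a list of per-timestep joint-angle lists; this helper is illustrative, not lerobot API:

```python
def smooth_frames(frames, window=5):
    """Apply a centered moving average to each joint of a recorded
    trajectory before it is written to the dataset, damping demonstrator
    jitter at the source. frames: list of per-timestep joint-angle lists."""
    half = window // 2
    n = len(frames)
    smoothed = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        # zip(*...) groups values by joint across the window
        smoothed.append([sum(col) / len(col) for col in zip(*frames[lo:hi])])
    return smoothed
```

For example, `smooth_frames([[0.0], [1.0], [2.0], [3.0], [4.0]], window=3)` returns `[[0.5], [1.0], [2.0], [3.0], [3.5]]`: the interior of a steady motion is preserved while isolated spikes would be averaged out.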

After integrating interpolation and filtering, we ran the previously trained model. A comparison of the operational performance before and after optimization can be viewed in the following video: Piper lerobot Transplantation: Motion Control Optimization Demo.

Overall, the Piper robotic arm’s motion during inference has become significantly smoother, with a moderate improvement in grasping success rates.