AgileX PiPER: Teleoperation with Hand Gesture Recognition

Hi everyone,
we’d like to share a small project that uses the AgileX PiPER robotic arm together with a depth camera to achieve simple teleoperation.
The core idea is to use hand gesture recognition to control the end-effector position of PiPER (Cartesian position only, not orientation).
This is a lightweight, low-cost demo that may help inspire gesture-based human-robot interaction projects.


:small_blue_diamond: Abstract

This repository implements a low-cost, simple teleoperation function using a depth camera (position control only).

Tags: Gesture Recognition, Coordinate Mapping, AgileX PiPER

Code Repository: AgileX-College GitHub

Function Demonstration: [demo video in the original post]


:gear: Pre-use Preparation

Hardware

  • Orbbec Petrel (aligned depth + RGB, 640×400@30fps)
  • (Optional) Intel RealSense D435 (aligned depth + RGB, 640×480@30fps)
  • AgileX Robotics PiPER robotic arm

Software Environment

  1. Install the required ROS packages:
sudo apt install -y ros-noetic-sensor-msgs ros-noetic-image-transport ros-noetic-cv-bridge ros-noetic-vision-msgs ros-noetic-image-geometry ros-noetic-pcl-conversions ros-noetic-pcl-ros ros-noetic-message-filters
  2. Install the gesture recognition library:
pip3 install mediapipe
  3. Clone & build (a quick environment check follows below):
cd your_ws/src
git clone https://github.com/agilexrobotics/AgileX-College.git
cd ..
catkin_make
source devel/setup.bash
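
To verify the installation, the following import check should run cleanly (a minimal sanity check, assuming a standard ROS Noetic / Python 3 setup):

#!/usr/bin/env python3
# Environment check: these imports cover everything the teleop node needs.
import cv2
import mediapipe as mp
import rospy
from cv_bridge import CvBridge

print("OpenCV:", cv2.__version__)
print("MediaPipe:", mp.__version__)
print("cv_bridge import OK")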

:play_button: Operation Steps

1. Start the Robotic Arm

Follow the PiPER driver deployment steps:

Insert CAN module → find bus → activate:

./find_all_can_port.sh
./can_activate.sh
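
If the arm does not respond after activation, it can help to confirm there is traffic on the bus. Below is a minimal sketch assuming python-can is installed (pip3 install python-can) and the bus was activated as can0; candump can0 from can-utils works equally well:

#!/usr/bin/env python3
# Prints the first frame seen on can0, or a warning if the bus is silent.
import can

bus = can.interface.Bus(channel="can0", bustype="socketcan")
msg = bus.recv(timeout=2.0)
print(msg if msg is not None else "no CAN traffic on can0 - check cabling/activation")
bus.shutdown()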

Launch driver:

roslaunch piper start_single_piper.launch

Start IK library (pinocchio):

python piper_pinocchio.py

Start camera (example: D435):

roslaunch realsense2_camera rs_aligned_depth.launch 

Start gesture detection:

rosrun handpose_det handpose_det.py

2. Start Teleoperation

Define a custom Home position (different from the driver’s built-in Home); in the teleoperation node these are member variables:

self.endPosX = -0.344
self.endPosY = 0.0
self.endPosZ = 0.110

Move the arm there by publishing to /pos_cmd (a Python equivalent follows below):

rostopic pub /pos_cmd piper_msgs/PosCmd "{
x: -0.344, y: 0.0, z: 0.110,
roll: 0.0, pitch: 0.0, yaw: 0.0,
gripper: 0.0, mode1: 1, mode2: 0
}"

Run the detection node (if it is not already running):

rosrun handpose_det handpose_det.py

RViz shows:

  • Red dots = hand landmarks (finger joints)
  • Green dot = palm center


3. Define Teleoperation Base Frame

  • Gesture OPEN_HAND → change to FIST → hold for 3 s
  • The node then publishes the /base_frame_origin pose
  • The hand’s subsequent relative motion is mapped to PiPER motion (a sketch of this trigger logic follows below)
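
The repository handles this trigger inside handpose_det.py; the sketch below shows the general idea only. classify_gesture() is a hypothetical open/closed-hand classifier, and the thresholds are illustrative (the repo’s actual implementation may differ):

import numpy as np
import rospy
from geometry_msgs.msg import PoseStamped

def classify_gesture(landmarks):
    # A finger counts as extended when its tip is farther from the wrist
    # than its PIP joint (MediaPipe hand landmark indices).
    wrist = np.array([landmarks[0].x, landmarks[0].y])
    dist = lambda i: np.linalg.norm(np.array([landmarks[i].x, landmarks[i].y]) - wrist)
    extended = sum(dist(tip) > dist(pip) for tip, pip in [(8, 6), (12, 10), (16, 14), (20, 18)])
    return "OPEN_HAND" if extended >= 3 else "FIST"

class BaseFrameTrigger:
    HOLD_SEC = 3.0  # how long the FIST must be held

    def __init__(self):
        self.pub = rospy.Publisher("/base_frame_origin", PoseStamped, queue_size=1)
        self.prev = None        # gesture seen on the previous frame
        self.fist_since = None  # when the OPEN_HAND -> FIST transition happened

    def update(self, landmarks, palm_center):
        gesture = classify_gesture(landmarks)
        if gesture == "FIST" and self.prev == "OPEN_HAND":
            self.fist_since = rospy.Time.now()   # arm the timer on the transition
        elif gesture != "FIST":
            self.fist_since = None               # hand opened again: reset
        if self.fist_since and (rospy.Time.now() - self.fist_since).to_sec() >= self.HOLD_SEC:
            origin = PoseStamped()
            origin.header.stamp = rospy.Time.now()
            origin.header.frame_id = "camera_link"
            origin.pose.position.x, origin.pose.position.y, origin.pose.position.z = palm_center
            origin.pose.orientation.w = 1.0
            self.pub.publish(origin)             # teleoperation base frame is set
            self.fist_since = None               # fire once per hold
        self.prev = gesture

In the image callback this would be called once per frame, e.g. trigger.update(hand_landmarks.landmark, palm_center).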


:laptop: Full Code

#!/usr/bin/env python3
# _*_ coding:utf-8 _*_

import rospy
import cv2
import mediapipe as mp
import numpy as np
from sensor_msgs.msg import Image
from cv_bridge import CvBridge
from visualization_msgs.msg import MarkerArray, Marker
from geometry_msgs.msg import PoseStamped
from piper_msgs.msg import PosCmd

class HandPoseTeleop:
    def __init__(self):
        rospy.init_node('handpose_teleop', anonymous=True)
        self.bridge = CvBridge()
        self.image_sub = rospy.Subscriber("/camera/color/image_raw", Image, self.image_callback)
        self.marker_pub = rospy.Publisher("/hand_markers", MarkerArray, queue_size=1)
        self.base_pub = rospy.Publisher("/base_frame_origin", PoseStamped, queue_size=1)
        self.cmd_pub = rospy.Publisher("/pos_cmd", PosCmd, queue_size=1)

        # Custom home point
        self.endPosX = -0.344
        self.endPosY = 0.0
        self.endPosZ = 0.110

        # Mediapipe
        self.mp_hands = mp.solutions.hands
        self.hands = self.mp_hands.Hands(max_num_hands=1,  # one operator hand
                                         min_detection_confidence=0.7,
                                         min_tracking_confidence=0.5)
        self.mp_draw = mp.solutions.drawing_utils

        rospy.loginfo("HandPoseTeleop initialized.")

    def image_callback(self, msg):
        try:
            cv_image = self.bridge.imgmsg_to_cv2(msg, "bgr8")
        except Exception as e:
            rospy.logwarn("cv_bridge conversion failed: %s", e)
            return
        img_rgb = cv2.cvtColor(cv_image, cv2.COLOR_BGR2RGB)
        results = self.hands.process(img_rgb)

        marker_array = MarkerArray()
        if results.multi_hand_landmarks:
            for hand_landmarks in results.multi_hand_landmarks:
                # Publish one red sphere marker per landmark
                for i, lm in enumerate(hand_landmarks.landmark):
                    marker = Marker()
                    marker.header.frame_id = "camera_link"
                    marker.header.stamp = rospy.Time.now()
                    marker.id = i  # unique id; without it RViz keeps only one marker
                    marker.type = Marker.SPHERE
                    marker.action = Marker.ADD
                    marker.scale.x = marker.scale.y = marker.scale.z = 0.01
                    marker.color.a = 1.0
                    marker.color.r = 1.0
                    marker.pose.position.x = lm.x
                    marker.pose.position.y = lm.y
                    marker.pose.position.z = lm.z
                    marker.pose.orientation.w = 1.0
                    marker_array.markers.append(marker)

                # Compute palm center (average of landmarks)
                palm_center = np.mean([[lm.x, lm.y, lm.z] for lm in hand_landmarks.landmark], axis=0)
                marker = Marker()
                marker.header.frame_id = "camera_link"
                marker.header.stamp = rospy.Time.now()
                marker.id = len(hand_landmarks.landmark)  # next id after the landmarks
                marker.type = Marker.SPHERE
                marker.action = Marker.ADD
                marker.scale.x = marker.scale.y = marker.scale.z = 0.02
                marker.color.a = 1.0
                marker.color.g = 1.0
                marker.pose.position.x = palm_center[0]
                marker.pose.position.y = palm_center[1]
                marker.pose.position.z = palm_center[2]
                marker.pose.orientation.w = 1.0
                marker_array.markers.append(marker)

                # Send position command. MediaPipe coordinates are normalized
                # (0..1); here they are added directly to the metre-scale home
                # point as a simplification (see the mapping note after the code).
                pos_cmd = PosCmd()
                pos_cmd.x = self.endPosX + palm_center[0]
                pos_cmd.y = self.endPosY + palm_center[1]
                pos_cmd.z = self.endPosZ + palm_center[2]
                pos_cmd.roll = 0.0
                pos_cmd.pitch = 0.0
                pos_cmd.yaw = 0.0
                pos_cmd.gripper = 0.0
                pos_cmd.mode1 = 1
                pos_cmd.mode2 = 0
                self.cmd_pub.publish(pos_cmd)

        self.marker_pub.publish(marker_array)

if __name__ == '__main__':
    try:
        teleop = HandPoseTeleop()
        rospy.spin()
    except rospy.ROSInterruptException:
        pass
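
One caveat in the callback above: MediaPipe landmarks are normalized image coordinates (0–1), and the demo adds them directly to a metre-scale home point. For a real setup you would likely center, scale, and clamp that offset first; a sketch with illustrative, untuned values:

import numpy as np

def map_palm_to_workspace(palm_center, home, gain=0.2, limit=0.15):
    # Center the 0..1 range around zero, scale to metres, and clamp so the
    # commanded target stays within a small box around the home point.
    offset = (np.asarray(palm_center) - 0.5) * gain
    offset = np.clip(offset, -limit, limit)
    return np.asarray(home) + offset

The returned x, y, z would then replace the raw sums assigned to pos_cmd in image_callback.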

:white_check_mark: Summary

This demo shows how the AgileX PiPER robotic arm can be controlled via hand gesture recognition and a depth camera, achieving real-time position teleoperation.

We’d love to hear your feedback! :rocket:

  • How would you extend this (e.g. orientation, multi-modal input)?
  • Anyone interested in integrating this with VR/AR or haptics?

:backhand_index_pointing_right: Full repo & details: AgileX-College on GitHub


AgileX Robotics
:globe_with_meridians: https://global.agilex.ai/


Love the speed of the handling!
