Retargeting Matters: General Motion Retargeting for Humanoid Motion Tracking
TL;DR Summary
This work proposes General Motion Retargeting (GMR) to reduce artifacts in humanoid motion tracking. GMR improves retargeting quality and policy robustness without extensive reward tuning, outperforming existing open-source methods in fidelity and success rate.
Abstract
Humanoid motion tracking policies are central to building teleoperation pipelines and hierarchical controllers, yet they face a fundamental challenge: the embodiment gap between humans and humanoid robots. Current approaches address this gap by retargeting human motion data to humanoid embodiments and then training reinforcement learning (RL) policies to imitate these reference trajectories. However, artifacts introduced during retargeting, such as foot sliding, self-penetration, and physically infeasible motion, are often left in the reference trajectories for the RL policy to correct. While prior work has demonstrated motion tracking abilities, it often requires extensive reward engineering and domain randomization to succeed. In this paper, we systematically evaluate how retargeting quality affects policy performance when excessive reward tuning is suppressed. To address issues that we identify with existing retargeting methods, we propose a new retargeting method, General Motion Retargeting (GMR). We evaluate GMR alongside two open-source retargeters, PHC and ProtoMotions, as well as with a high-quality closed-source dataset from Unitree. Using BeyondMimic for policy training, we isolate retargeting effects without reward tuning. Our experiments on a diverse subset of the LAFAN1 dataset reveal that while most motions can be tracked, artifacts in retargeted data significantly reduce policy robustness, particularly for dynamic or long sequences. GMR consistently outperforms existing open-source methods in both tracking performance and faithfulness to the source motion, achieving perceptual fidelity and policy success rates close to the closed-source baseline. Website: https://jaraujo98.github.io/retargeting_matters. Code: https://github.com/YanjieZe/GMR.
In-depth Reading
1. Bibliographic Information
1.1. Title
Retargeting Matters: General Motion Retargeting for Humanoid Motion Tracking
1.2. Authors
João Pedro Araújo†, Yanjie Ze†, Pei Xu†, Jiajun Wu*, C. Karen Liu*. Affiliation: Stanford University.
1.3. Journal/Conference
This paper is published as a preprint on arXiv (arXiv:2510.02252v1). It has not yet been peer-reviewed for a specific journal or conference, but the authors are affiliated with Stanford University, a highly reputable institution in computer science, robotics, and artificial intelligence, suggesting strong academic backing and potential for significant impact in the field.
1.4. Publication Year
2025
1.5. Abstract
This paper addresses the embodiment gap between humans and humanoid robots in motion tracking for teleoperation and hierarchical control. Current approaches retarget human motion data to robots, then train Reinforcement Learning (RL) policies. However, artifacts like foot sliding, self-penetration, and physically infeasible motion often remain in these retargeted reference trajectories, forcing RL policies to correct them, often requiring extensive reward engineering and domain randomization. The authors systematically evaluate how retargeting quality impacts policy performance when excessive reward tuning is suppressed. To mitigate issues in existing methods, they propose General Motion Retargeting (GMR). They compare GMR with two open-source retargeters (PHC and ProtoMotions) and a high-quality closed-source dataset from Unitree. Using BeyondMimic for policy training, they isolate retargeting effects without reward tuning. Experiments on a diverse subset of the LAFAN1 dataset show that while most motions can be tracked, artifacts significantly reduce policy robustness, especially for dynamic or long sequences. GMR consistently outperforms open-source methods in tracking performance and faithfulness to source motion, achieving perceptual fidelity and policy success rates comparable to the closed-source baseline.
1.6. Original Source Link
https://arxiv.org/abs/2510.02252v1 (Preprint)
1.7. PDF Link
https://arxiv.org/pdf/2510.02252v1.pdf
2. Executive Summary
2.1. Background & Motivation
The paper tackles a fundamental challenge in humanoid robotics: building effective teleoperation pipelines and hierarchical controllers that rely on humanoid motion tracking. The core problem lies in the embodiment gap—significant morphological, kinematic, and dynamic differences—between humans and humanoid robots. Existing methods address this by kinematically retargeting human motion data to robotic embodiments, then training Reinforcement Learning (RL) policies to imitate these reference trajectories.
The primary motivation for this research stems from a critical oversight in current practices: the presence of artifacts in retargeted data. These artifacts, such as foot sliding, ground penetration, self-penetration, and physically infeasible motion, are often passed directly to RL policies. While prior work has shown that policies can be trained on such data, it typically demands extensive reward engineering and domain randomization to compensate for these imperfections. The authors hypothesize that without these extensive engineering efforts, the quality of the retargeted motion plays a much more significant role in policy performance and robustness. There is a clear gap in understanding the direct impact of retargeting quality on RL policy learning when reward tuning is suppressed.
2.2. Main Contributions / Findings
The paper makes several key contributions:
- Systematic Evaluation of Retargeting Quality: It conducts a rigorous and systematic evaluation of how retargeting quality affects RL policy performance in motion tracking tasks, specifically when excessive reward tuning and domain randomization are suppressed. This isolates the impact of the retargeting process itself.
- Proposal of General Motion Retargeting (GMR): The paper introduces a new retargeting method, GMR, designed to address the specific shortcomings (deviation from source motion, foot sliding, ground penetrations, self-intersections) identified in existing open-source retargeters like PHC and ProtoMotions. GMR employs a novel non-uniform local scaling procedure followed by a two-stage optimization.
- Identification of Critical Retargeting Artifacts: The research identifies specific types of artifacts (physically inconsistent height, self-intersections, and sudden jumps in joint values) that significantly reduce policy robustness and can make certain motions unlearnable without substantial reward engineering.
- Demonstration of GMR's Superior Performance: Through extensive experiments on a diverse subset of the LAFAN1 dataset, GMR consistently outperforms existing open-source methods in both tracking performance (lower errors) and faithfulness to the source motion (confirmed by a user study). GMR achieves perceptual fidelity and policy success rates that are close to a high-quality, closed-source baseline dataset from Unitree.
- Emphasis on Initial Frame Stability: The paper highlights the importance of the initial frame of the reference motion for policy success, recommending that both start and end poses be stable for safe policy deployment.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To fully grasp the contributions of this paper, a foundational understanding of several key concepts is necessary, particularly in robotics, computer graphics, and machine learning.
- Humanoid Motion Tracking: This refers to the task of enabling a humanoid robot to replicate or imitate a given human motion. The goal is for the robot's movements (e.g., walking, running, dancing) to closely match those of a human while respecting the robot's physical constraints. This is crucial for applications like teleoperation (controlling a robot remotely) and hierarchical control (breaking down complex tasks into simpler, manageable sub-tasks).
- Embodiment Gap: This term describes the inherent differences between human bodies and humanoid robot bodies, including variations in bone length, joint range of motion, kinematic structure (how joints are connected), body shape, mass distribution, and actuation mechanisms (how joints are moved). Overcoming this gap is central to successfully transferring human motions to robots.
- Kinematic Retargeting: A data processing technique used to adapt a motion from a source character (e.g., a human) to a target character (e.g., a humanoid robot) that may have a different skeleton or morphology. The process maps the joint positions, orientations, or end-effector trajectories of the source to the target while respecting the target's physical constraints and maintaining the perceptual characteristics of the original motion.
- Reinforcement Learning (RL): A paradigm of machine learning where an agent learns to make decisions by performing actions in an environment to maximize a cumulative reward signal.
  - Policy: The policy in RL is the agent's strategy, mapping observed states of the environment to actions to be taken. In motion tracking, the policy learns to control the robot's joints to follow a reference motion.
  - Reward Engineering (or Reward Shaping): The process of designing the reward function that guides an RL agent's learning. Well-designed rewards are crucial for successful learning, but complex tasks often require extensive, hand-tuned reward functions.
  - Domain Randomization: A technique used in RL to train policies in simulation that generalize better to the real world. It involves randomizing various physical parameters (e.g., friction, mass, sensor noise) within the simulation during training, making the policy robust to variations and uncertainties in the target environment.
- SMPL (Skinned Multi-Person Linear Model): A widely used statistical 3D human body model in computer graphics. It represents human body shapes and poses using a low-dimensional parameter space: it takes shape parameters ($\beta$) and pose parameters ($\theta$) as input, outputs the 3D locations of the vertices of a posed human body mesh, and a joint regressor is then used to derive 3D joint positions from the mesh vertices. (A minimal usage sketch appears after this list.)
- Inverse Kinematics (IK): In robotics and computer graphics, IK is the mathematical process of calculating the joint parameters (e.g., angles) of an articulated body (like a robot arm or human skeleton) that achieve a desired position and orientation for a specified end-effector (e.g., a hand or foot). It is the inverse problem of forward kinematics.
- Forward Kinematics (FK): The calculation of the position and orientation of the end-effector from the given joint parameters (angles) and link lengths of an articulated body. (A toy FK/IK example appears after this list.)
- LAFAN1 Dataset: A publicly available motion capture dataset consisting of a variety of human locomotion and expressive motions. It is often used as a source of reference motions in character animation and robotics research.
- Unitree G1 Robot: A specific model of humanoid robot, often used as a target embodiment in research for its dynamic capabilities.
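To make the SMPL interface concrete, here is a minimal sketch using the open-source smplx package; the model path is a placeholder for locally downloaded SMPL model files, and exact tensor shapes may vary by package version:

```python
import torch
import smplx

# Load a neutral SMPL body model (assumes model files were downloaded locally).
model = smplx.create("path/to/models", model_type="smpl", gender="neutral")

betas = torch.zeros(1, 10)         # shape parameters (beta)
body_pose = torch.zeros(1, 69)     # pose parameters (theta): 23 joints x 3 axis-angle
global_orient = torch.zeros(1, 3)  # root orientation

out = model(betas=betas, body_pose=body_pose, global_orient=global_orient)
vertices = out.vertices  # (1, 6890, 3) posed mesh vertices
joints = out.joints      # 3D joint positions regressed from the mesh
```

And to ground the FK/IK definitions, a toy planar two-link example (link lengths, step size, and iteration counts are arbitrary illustration values, not from the paper):

```python
import numpy as np

def fk_2link(theta, l1=0.4, l2=0.3):
    """Forward kinematics of a planar 2-link arm: joint angles -> end-effector position."""
    x = l1 * np.cos(theta[0]) + l2 * np.cos(theta[0] + theta[1])
    y = l1 * np.sin(theta[0]) + l2 * np.sin(theta[0] + theta[1])
    return np.array([x, y])

def ik_2link(target, iters=200, step=0.5, eps=1e-6):
    """Inverse kinematics via iterative Jacobian pseudo-inverse updates."""
    theta = np.zeros(2)
    for _ in range(iters):
        err = target - fk_2link(theta)
        J = np.zeros((2, 2))  # numerical Jacobian of FK w.r.t. joint angles
        for i in range(2):
            d = np.zeros(2); d[i] = eps
            J[:, i] = (fk_2link(theta + d) - fk_2link(theta)) / eps
        theta = theta + step * np.linalg.pinv(J) @ err
    return theta

theta = ik_2link(np.array([0.5, 0.2]))
print(fk_2link(theta))  # approximately [0.5, 0.2]
```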
3.2. Previous Works
The paper contextualizes its work within a broad landscape of motion retargeting and humanoid control research.
- Motion Retargeting in Computer Graphics:
  - Classic Methods [16, 17, 18, 19]: These often employ optimization-based approaches and rely on heuristically defined kinematic constraints to map motions. They are typically concerned with visual fidelity for character animation.
  - Data-driven Approaches [20, 21, 22, 23, 24, 25]: With the rise of deep learning, methods have emerged that require paired data for supervised learning, semantic labels for unsupervised training, or language models for visual evaluation. However, the paper notes that the difficulty of acquiring paired data for real robots limits their direct application in humanoid control.
- Motion Retargeting in Robotics:
  - Naïve Approaches [3, 5]: These directly copy joint rotations from human to robot, which often leads to severe artifacts like floating, foot penetration, sliding, and end-effector drift due to topological and morphological differences. Additional processing is needed to convert from the spherical joints common in human skeleton models to the revolute joints common in robots.
  - Whole-Body Geometric Retargeting (WBGR) [35, 1]: These approaches use Inverse Kinematics (IK) to address joint space misalignment.
    - Vanilla WBGR: Ignores size differences and matches orientations of key links.
    - HumanMimic [36]: Solves IK for Cartesian position matching of key points, using manually defined scaling coefficients.
  - SMPL-based Retargeting (e.g., PHC [2, 14, 11, 38]): These methods leverage the SMPL body model. PHC fits the robot shape to an SMPL model, then calculates target 3D joint coordinates using the human pose parameters. An optimization then solves for robot root translation, orientation, and joint angles to minimize the position error between the posed robot and the SMPL-derived targets.
    - Critique: PHC uses gradient descent and forward kinematics, which can be time-consuming. Crucially, it does not take contact states into account during retargeting, leading to floating, foot sliding, and penetrations. SMPL is also designed for human bodies and may not represent robots with large morphological discrepancies well.
  - Differential IK Solvers (e.g., ProtoMotions [40], KungfuBot [8]): These methods scale the Cartesian joint positions of the source motion and then calculate generalized velocities that, when integrated, reduce position and orientation errors.
    - ProtoMotions [40]: Uses global axis-aligned scaling factors and Mink [41] (a differential IK solver) to minimize a weighted sum of position and orientation errors between key bodies.
    - KungfuBot [8]: Uses the ProtoMotions approach but with scaling disabled.
  - Learning-based Retargeting for Humanoids [32, 33, 34]: Some works explore learning-based methods but often focus on simpler arm/upper-body motions or lack data for whole-body tasks.
- Humanoid Control with RL:
  - Many recent works [2, 3, 4, 5, 7, 8, 6, 10, 15] use RL-based approaches to learn policies for humanoid control by imitating reference motions.
  - BeyondMimic [15]: This is the policy training pipeline used in the current paper for evaluation. It is highlighted as a method that works for various reference motions without extensive reward tuning or domain randomization, making it suitable for isolating retargeting effects.
  - Other influential works: VideoMimic [10] (visual imitation), HuB [7] (extreme balance), ASAP [11] (sim-to-real transfer of agile skills).
3.3. Technological Evolution
The field of humanoid motion control has evolved from simple direct mapping and heuristic-based retargeting to sophisticated optimization-based and data-driven approaches. Early methods struggled with the embodiment gap, leading to unrealistic motion artifacts. The introduction of parametric body models like SMPL significantly improved the ability to represent human motion and apply it to robots, albeit with remaining challenges in contact consistency and physical feasibility.
The advent of Reinforcement Learning has enabled robots to learn complex, dynamic skills from these retargeted motions. However, this often came at the cost of extensive reward engineering and domain randomization to make policies robust enough to handle the artifacts inherent in retargeted data.
This paper represents a crucial step in this evolution by shifting focus back to the quality of the retargeting process itself. Instead of relying solely on RL to correct for poor retargeting, it seeks to improve the source data, thereby simplifying the RL training task and leading to more robust policies. GMR contributes to this by providing a more robust retargeting solution that explicitly addresses scaling and kinematic constraints to minimize artifacts.
3.4. Differentiation Analysis
Compared to the main methods in related work, the core differences and innovations of the GMR approach are:
- Addressing Scaling Artifacts: GMR's primary differentiation lies in its non-uniform local scaling procedure (Step 3). Unlike PHC, which relies on SMPL for a global scaling that can introduce distortions when fitting non-humanoid robots, or ProtoMotions, which uses global axis-aligned scaling factors that might not account for local body part proportions, GMR allows custom local scale factors for different key bodies. This is crucial for accurately translating human proportions to a robot without introducing foot sliding or self-penetration. The emphasis on a uniform scaling factor for the root translation is a key insight for avoiding foot sliding.
- Two-Stage Optimization for Robustness: GMR employs a two-stage optimization process (Steps 4 and 5) to solve the IK problem. The first stage focuses on end-effectors and body orientations to get a good initial guess, while the second stage fine-tunes the pose by considering the positions of all key bodies. This sequential approach helps avoid local optimization minima and provides a more stable solution than single-stage optimization. PHC uses gradient descent over all frames, which can be time-consuming and potentially less robust to local minima for complex poses. ProtoMotions uses differential IK but does not employ the two-stage initialization and fine-tuning scheme that GMR does.
- Explicit Artifact Mitigation: GMR is explicitly designed to fix glaring artifacts like deviation from the source motion, foot sliding, ground penetrations, and self-intersections, which are often a consequence of the scaling and IK methods used in PHC and ProtoMotions. The paper directly evaluates and highlights these artifacts in the competitor methods.
- Independent Policy Training Environment: The use of BeyondMimic for policy training is a methodological strength. Since BeyondMimic is developed independently and does not require reward tuning or extensive domain randomization, it provides a fair and unbiased platform to evaluate the direct impact of retargeting quality, which is often masked by RL engineering efforts in other works.

In summary, GMR differentiates itself by providing a more nuanced and robust kinematic retargeting solution that prioritizes artifact prevention through intelligent scaling and multi-stage optimization, leading to higher quality reference motions that are easier for RL policies to learn from, even without heavy reward engineering.
4. Methodology
The core of this paper lies in its proposed General Motion Retargeting (GMR) pipeline, designed to produce high-quality, artifact-free retargeted motions for humanoid robots. The methodology section also details how existing retargeting methods are applied and how policies are trained and evaluated.
4.1. Principles
The fundamental principle behind GMR is to generate reference motions that are as physically feasible and perceptually faithful as possible, thereby simplifying the task for Reinforcement Learning (RL) policies. The authors identify that many artifacts in retargeted motions (like foot sliding, ground penetration, self-intersection, physically infeasible motion) stem from inadequate handling of scaling and kinematic constraints. GMR addresses this through two main strategies:
- Non-Uniform Local Scaling: Instead of global scaling, GMR employs a flexible, non-uniform scaling procedure that accounts for specific differences in body part proportions between human and robot, especially for the root translation, to maintain contact and avoid large-scale distortions.
- Two-Stage Optimization for IK: It uses a robust, two-stage Inverse Kinematics (IK) optimization approach that first focuses on achieving the crucial end-effector and orientation targets to provide a stable initial guess, then fine-tunes the entire robot pose with more comprehensive position and rotation constraints. This helps in avoiding local minima and producing more stable and accurate poses.
4.2. Core Methodology In-depth (Layer by Layer)
The General Motion Retargeting (GMR) pipeline consists of five sequential steps to transform source human motion into a target robot motion.
4.2.1. Step 1: Human-Robot Key Body Matching
The initial step involves establishing a correspondence between the relevant body parts (key bodies) of the source human skeleton and the target humanoid skeleton. This mapping, denoted as $\mathcal{M}$, is user-defined and typically includes the torso, head, legs, feet, arms, and hands. This mapping is crucial as it informs the subsequent Inverse Kinematics (IK) optimization problems, specifying which human body parts should be tracked by which robot body parts. The user can also assign weights for the position and orientation tracking errors of these matched key bodies, allowing for prioritization (e.g., feet and hands might have higher weights); a hypothetical mapping is sketched below.
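One plausible way to encode such a mapping in code is a simple table. Every joint/link name and weight below is illustrative only, not taken from the paper or from any specific robot description:

```python
# Hypothetical human-joint -> robot-link mapping with per-body tracking weights.
KEY_BODY_MAP = {
    "Hips":      {"robot": "pelvis",      "w_pos": 1.0, "w_rot": 1.0},
    "Head":      {"robot": "head_link",   "w_pos": 0.5, "w_rot": 1.0},
    "LeftFoot":  {"robot": "left_ankle",  "w_pos": 5.0, "w_rot": 2.0},
    "RightFoot": {"robot": "right_ankle", "w_pos": 5.0, "w_rot": 2.0},
    "LeftHand":  {"robot": "left_wrist",  "w_pos": 5.0, "w_rot": 1.0},
    "RightHand": {"robot": "right_wrist", "w_pos": 5.0, "w_rot": 1.0},
}
# End-effectors (hands and feet) form the subset used in the first IK stage.
END_EFFECTORS = {"LeftFoot", "RightFoot", "LeftHand", "RightHand"}
```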
4.2.2. Step 2: Human-Robot Cartesian Space Rest Pose Alignment
This step aims to align the rest poses of the human and robot in Cartesian space before motion application. The orientations of the human bodies are offset to match the orientations of the corresponding robot bodies when both are in their respective rest poses. In some cases, a local offset to the position of a body might also be added. This pre-alignment helps to mitigate artifacts that arise from initial pose discrepancies, such as the toed-in artifact mentioned in prior work [2], ensuring a more natural starting configuration for the retargeted motion.
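Reading the description literally, the alignment amounts to a fixed per-body rotation offset computed once from the two rest poses. A minimal sketch under that assumption (function names and the left-multiplication convention are mine):

```python
import numpy as np

def rest_pose_offset(R_human_rest, R_robot_rest):
    """Fixed offset that maps the human body's rest orientation onto the
    corresponding robot body's rest orientation (3x3 rotation matrices)."""
    return R_robot_rest @ R_human_rest.T

# During retargeting, each human body orientation is then pre-multiplied:
#   R_human_aligned(t) = rest_pose_offset(...) @ R_human(t)
```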
4.2.3. Step 3: Human Data Non-Uniform Local Scaling
This step is identified as critical for avoiding many artifacts found in other retargeting methods. The GMR approach to scaling is unique in its flexibility and local specificity.
First, a general scaling factor is calculated based on the height $h$ of the source human skeleton. This general factor is then used to adjust a set of custom local scale factors $s_b$ defined for each key body $b$. This allows for non-uniform scaling, accommodating the fact that the human upper body might scale differently from the lower body when mapped to a robot.
The target body positions in Cartesian space, $\mathbf{p}_b^{\mathrm{target}}$, are computed using the following formula:
$
\mathbf{p}_b^{\mathrm{target}} = \frac{h}{h_{\mathrm{ref}}} s_b \left( \mathbf{p}_j^{\mathrm{source}} - \mathbf{p}_{\mathrm{root}}^{\mathrm{source}} \right) + \frac{h}{h_{\mathrm{ref}}} s_{\mathrm{root}} \mathbf{p}_{\mathrm{root}}^{\mathrm{source}}
$
Where:
- $\mathbf{p}_b^{\mathrm{target}}$ is the target Cartesian position for body $b$ on the robot.
- $h$ is the current height of the human source skeleton.
- $h_{\mathrm{ref}}$ is a reference height used when defining the scaling factors (a baseline height relative to which the $s_b$ are set).
- $s_b$ is the custom local scaling factor for body $b$.
- $\mathbf{p}_j^{\mathrm{source}}$ is the Cartesian position of the human joint $j$ corresponding to body $b$.
- $\mathbf{p}_{\mathrm{root}}^{\mathrm{source}}$ is the Cartesian position of the human root joint.
- $s_{\mathrm{root}}$ is the scaling factor applied specifically to the root position.
When the body in question is the root, the scaling equation simplifies to:
$
\mathbf{p}_{\mathrm{root}}^{\mathrm{target}} = \frac{h}{h_{\mathrm{ref}}} s_{\mathrm{root}} \mathbf{p}_{\mathrm{root}}^{\mathrm{source}}
$
The authors emphasize that scaling the root translation by a uniform scaling factor is crucial to avoid introducing foot sliding artifacts: the root's translation should be scaled uniformly, preserving its relative motion pattern. (A code sketch of this scaling step follows.)
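The scaling formula maps directly onto a few lines of array code. This is a minimal sketch of Step 3 under the equation above; the function and argument names are mine, not the paper's:

```python
import numpy as np

def scale_targets(p_source, p_root, s_local, s_root, h, h_ref):
    """Non-uniform local scaling of human key-body positions (GMR Step 3 sketch).

    p_source: (N, 3) source key-body positions for one frame
    p_root:   (3,)   source root position
    s_local:  (N,)   per-body local scale factors s_b
    s_root:   float  uniform root scale factor
    h, h_ref: source skeleton height and reference height
    """
    g = h / h_ref  # general scaling factor from skeleton height
    # Scale body offsets relative to the root locally, and the root uniformly,
    # so the root's motion pattern (and foot contact timing) is preserved.
    return g * s_local[:, None] * (p_source - p_root) + g * s_root * p_root
```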
4.2.4. Step 4: Solving Robot IK with Rotation Constraints
This is the first stage of a two-stage optimization process to find the robot's generalized coordinates $\mathbf{q}$, which include the root translation, root rotation, and joint values. This stage is designed to avoid local optimization minima by focusing on the critical aspects first. Given a target pose, the following optimization problem is solved:
$
\begin{array}{rl} \operatorname*{min}_{\mathbf{q}} & \sum_{(i,j) \in \mathcal{M}} (w_1)_{i,j}^R \| R_i^h \ominus R_j(\mathbf{q}) \|_2^2 + \sum_{(i,j) \in \mathcal{M}_{\mathrm{ee}}} (w_1)_{i,j}^p \| \mathbf{p}_i^{\mathrm{target}} - \mathbf{p}_j(\mathbf{q}) \|_2^2 \\ \mathrm{subject~to} & \mathbf{q}^- \leq \mathbf{q} \leq \mathbf{q}^+ \end{array}
$
Where:
- $\mathbf{q}$ represents the robot's generalized coordinates (root translation, root rotation, and joint angles).
- $R_i^h \in SO(3)$ is the orientation of human body $i$. $SO(3)$ is the Special Orthogonal Group of 3×3 rotation matrices, which describes 3D rotations.
- $\mathbf{p}_j(\mathbf{q})$ and $R_j(\mathbf{q})$ are the Cartesian position and orientation of robot body $j$, respectively, computed through forward kinematics given $\mathbf{q}$.
- $R_i^h \ominus R_j(\mathbf{q})$ is the exponential map representation of the orientation difference between $R_i^h$ and $R_j(\mathbf{q})$. The exponential map converts a rotation matrix difference into a 3-vector in the Lie algebra $\mathfrak{so}(3)$, effectively representing the magnitude and axis of the rotational error. The $\| \cdot \|_2^2$ then measures the squared Euclidean norm of this error.
- $\mathcal{M}$ is the full set of human-robot key body matches defined in Step 1.
- $\mathcal{M}_{\mathrm{ee}}$ is a subset of $\mathcal{M}$ containing only the end-effectors (hands and feet), which are often prioritized for accurate tracking.
- $(w_1)_{i,j}^p$ and $(w_1)_{i,j}^R$ are the weights for the position and rotation errors, respectively, in this first optimization stage. These weights allow for prioritizing certain body parts or types of errors.
- $\mathbf{q}^-$ and $\mathbf{q}^+$ are the minimum and maximum joint limits of the robot, enforcing physical constraints.

The root position and orientation components of $\mathbf{q}$ are initialized using the scaled root position (from Step 3) and the yaw component of the human root key body orientation.
This problem is solved using Mink [41], a differential IK solver. Instead of directly finding $\mathbf{q}$, Mink computes generalized velocities $\dot{\mathbf{q}}$ that, when integrated, reduce the cost function. The optimization solved by Mink is:
$
\begin{array}{rl} \operatorname*{min}_{\dot{\mathbf{q}}} & \| e(\mathbf{q}) + J(\mathbf{q}) \dot{\mathbf{q}} \|_W^2 \\ \mathrm{subject~to} & \mathbf{q}^- \le \mathbf{q} + \dot{\mathbf{q}} \Delta t \le \mathbf{q}^+ \end{array}
$
Where:
- $e(\mathbf{q})$ is the loss function from the previous equation (Eq. 4 in the paper), expressed as the vector of errors to be minimized.
- $J(\mathbf{q})$ is the Jacobian matrix of the loss function with respect to $\mathbf{q}$. The Jacobian describes how the errors change with respect to infinitesimal changes in the generalized coordinates.
- $W$ is a weight matrix induced by the individual weights $(w_1)^p$ and $(w_1)^R$, controlling the relative importance of different error components.
- $\Delta t$ is a parameter specific to the differential IK solver and does not necessarily correspond to the actual time difference between motion frames; it controls the integration step size.

The solver runs until convergence (change in the value function below a threshold, set to 0.001) or until a maximum number of iterations (10) is reached. (A sketch of one such velocity update appears below.)
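For intuition, here is a minimal damped least-squares sketch of the kind of velocity update such a solver performs. This is not Mink's actual API: the real solver handles the joint limits as constraints in a quadratic program rather than by clamping, and all names below are mine:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def so3_error(R_target, R_current):
    """The ominus operator: rotation error as a 3-vector in so(3),
    i.e. log(R_target @ R_current.T)."""
    return R.from_matrix(R_target @ R_current.T).as_rotvec()

def diff_ik_step(q, e, J, W, q_min, q_max, dt=0.1, damping=1e-6):
    """One damped least-squares step for min_qdot ||e(q) + J(q) qdot||_W^2,
    followed by integration and (simplified) joint-limit clamping."""
    JW = J.T @ W
    H = JW @ J + damping * np.eye(len(q))  # normal equations + damping
    qdot = -np.linalg.solve(H, JW @ e)     # descent direction on the error
    return np.clip(q + qdot * dt, q_min, q_max)
```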
4.2.5. Step 5: Fine Tuning using Rotation & Translation Constraints
The solution obtained from Step 4 is used as the initial guess for this final optimization stage. This stage aims to fine-tune the robot's pose by considering position and orientation constraints for all key bodies, not just end-effectors, with a potentially different set of weights. The optimization problem is:
$
\begin{array}{rl} \operatorname*{min}_{\mathbf{q}} & \sum_{(i,j) \in \mathcal{M}} (w_2)_{i,j}^R \| R_i^h \ominus R_j(\mathbf{q}) \|_2^2 + (w_2)_{i,j}^p \| \mathbf{p}_i^{\mathrm{target}} - \mathbf{p}_j(\mathbf{q}^r) \|_2^2 \\ \mathrm{subject~to} & \mathbf{q}^- \leq \mathbf{q} \leq \mathbf{q}^+ \end{array}
$
Where:
- The notation is the same as in Step 4, but $(w_2)^R$ and $(w_2)^p$ are a different set of weights from the first stage, and the position term now runs over all key bodies in $\mathcal{M}$. This allows for a more detailed and holistic constraint satisfaction in the fine-tuning phase.
- The term $\mathbf{p}_j(\mathbf{q}^r)$ implicitly refers to the robot body positions obtained through forward kinematics from the generalized coordinates. The superscript in $\mathbf{q}^r$ might be a typo in the paper and should likely read $\mathbf{q}$, to be consistent with the argument of $R_j(\mathbf{q})$.

The termination conditions (convergence or maximum iterations) are the same as in Step 4. (The overall two-stage structure is sketched below.)
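Putting Steps 4 and 5 together, the per-frame control flow is simple to express; `solve_ik` here is a hypothetical stand-in for the underlying differential IK solver, not a real function from GMR or Mink:

```python
def retarget_frame(solve_ik, q_init, frame_targets, w1, w2):
    """Two-stage IK structure of GMR (Steps 4 and 5), with the actual solver
    passed in as solve_ik(q0, targets, weights, bodies) -- a hypothetical API.

    Stage 1: rotations of all matched bodies plus positions of end-effectors
    only, to obtain a stable initial guess. Stage 2: warm-started fine-tuning
    with positions and rotations of all key bodies under a second weight set.
    """
    q_stage1 = solve_ik(q_init, frame_targets, w1, bodies="end_effectors")
    return solve_ik(q_stage1, frame_targets, w2, bodies="all")
```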
4.2.6. Application to Motion Sequences
The described GMR method is applied sequentially to each frame of a motion sequence. The retargeting result from the previous frame is used as the initial guess for the optimization in Step 4 of the current frame; this temporal coherence helps ensure smooth and continuous motion. After a full motion has been retargeted, a post-processing step is performed: forward kinematics is used to compute the height of all robot bodies over time, and the minimum height across all bodies and frames is then subtracted from the global translation to correct for any residual height artifacts (e.g., floating or ground penetration). A sketch of this post-processing step follows.
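The height correction reduces to a single array operation. A minimal sketch under the description above (array layout and names are assumptions):

```python
import numpy as np

def ground_align(root_translations, body_heights):
    """Subtract the minimum body height (over all bodies and frames, obtained
    via forward kinematics) from the global translation, removing residual
    floating or ground penetration.

    root_translations: (T, 3) robot root positions over the motion
    body_heights:      (T, N) z-coordinates of all robot bodies per frame
    """
    offset = body_heights.min()  # lowest point reached by any body, any frame
    out = root_translations.copy()
    out[:, 2] -= offset          # shift the whole motion onto the floor
    return out
```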
4.3. Retargeting Methods for Comparison
The paper evaluates GMR against several other retargeting methods:
- PHC [38, 2, 14, 11]: This method is designed for motions in SMPL format. It first fits an SMPL shape to the robot skeleton. Then, for each frame, it uses the human SMPL pose parameters to calculate target 3D joint coordinates for the robot. An optimization (using gradient descent with Adam for the root pose and Adadelta for the joint angles) minimizes the position error between the forward kinematics of the robot and these SMPL-derived targets. Joint limits are enforced by clamping. A post-processing step adjusts the root translation based on the lowest body height. The authors note PHC's issues with contact states and computational time.
- ProtoMotions [40]: This package includes an optimization-based retargeting algorithm. It scales the source motion using custom scaling factors for each world frame axis. It then employs Mink [41] (a differential IK solver) to minimize joint position and orientation errors between the scaled human and robot key bodies. It also includes a post-processing step to set the lowest height to match the lowest height in the source motion.
- Unitree (Closed-Source): This refers to a high-quality, pre-retargeted dataset from Unitree, likely generated by a proprietary method that is not publicly available. It serves as a high-quality baseline to gauge the performance of the open-source methods.
4.4. Data Processing
- Source Data: A diverse sample from the LAFAN1 dataset [42] is chosen, including simple, dynamic, and complex motions (e.g., walking, dancing, martial arts). Motions with complex environmental interactions (e.g., crawling) are excluded, except for a cartwheel, where the feet or hands are always in contact.
- Target Robot: Unitree G1 robot.
- Format Conversion: LAFAN1 data is in BVH format. GMR is directly compatible with BVH, whereas PHC and ProtoMotions require SMPL or SMPL-X format. The authors convert BVH to SMPL(-X) by:
  - Fitting SMPL(-X) shape parameters ($\beta$) to the BVH skeleton by minimizing joint position error, with regularization to prevent distortions (a sketch of such a fit appears after this list).
  - Copying the matching joint 3D rotations from the LAFAN1 skeleton (which has a kinematic structure similar to SMPL(-X)).
  - Calculating the root translation as the offset that minimizes the position error between the posed LAFAN1 skeleton and the posed SMPL(-X) skeleton.
  SMPL-X [43] is preferred over SMPL for ProtoMotions due to its better fit to LAFAN1.
- Post-processing for PHC: The authors found that PHC's built-in foot penetration fix sometimes led to severe floating. They applied a custom fix: forward kinematics is used to compute the minimum body height per frame, and the entire motion is then offset by the mean minimum body height. The other methods did not require this.
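As a rough illustration of the shape-fitting sub-step, here is a generic optimization loop; `smpl_joints_fn` is a hypothetical stand-in for a function returning the body model's rest-pose joint positions for given betas, and the hyperparameters are arbitrary:

```python
import torch

def fit_shape(smpl_joints_fn, target_joints, n_betas=10, iters=500, reg=1e-3):
    """Fit shape parameters to a BVH skeleton by minimizing joint position
    error, with an L2 regularizer on betas to prevent distorted shapes."""
    betas = torch.zeros(n_betas, requires_grad=True)
    opt = torch.optim.Adam([betas], lr=0.01)
    for _ in range(iters):
        opt.zero_grad()
        loss = ((smpl_joints_fn(betas) - target_joints) ** 2).sum()
        loss = loss + reg * (betas ** 2).sum()  # keep betas near the mean shape
        loss.backward()
        opt.step()
    return betas.detach()
```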
4.5. Motion Tracking Evaluation
- Policy Training: BeyondMimic [15] is used to train individual motion imitation policies for each retargeted motion. This choice is deliberate: because BeyondMimic is designed to work without reward tuning or extensive domain randomization, it is ideal for isolating the impact of retargeting quality. Training is performed in IsaacSim.
- Robustness Evaluation: Policies are evaluated under various conditions to test their robustness:
  - sim: 100 rollouts in IsaacSim without domain randomization.
  - sim-dr: 4096 rollouts in IsaacSim with domain randomization enabled (simulating noisy sensors, state estimation errors, and model parameter errors).
  - sim2sim: 100 rollouts using a ROS node running MuJoCo, mimicking a real-world deployment scenario. This setup introduces the timing and synchronization conditions of ROS and noise from a state estimation module, but without simulator parameter tuning or privileged information.
- Rollout Termination: Each rollout continues until the robot falls (the anchor body height or orientation deviates beyond a threshold) or the reference motion ends. (An illustrative termination check is sketched below.)
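The termination rule can be read as a simple predicate. The threshold values below are made-up illustration values, not the paper's:

```python
import numpy as np

def fell(anchor_height, ref_height, anchor_quat, ref_quat,
         height_thresh=0.3, angle_thresh=0.8):
    """Illustrative rollout-termination check: the episode ends as a failure
    if the anchor body's height or orientation deviates from the reference
    beyond a threshold."""
    height_dev = abs(anchor_height - ref_height)
    # Angular distance between two unit quaternions.
    dot = np.clip(abs(np.dot(anchor_quat, ref_quat)), 0.0, 1.0)
    angle_dev = 2.0 * np.arccos(dot)
    return height_dev > height_thresh or angle_dev > angle_thresh
```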
4.6. User Study Evaluation
To assess perceptual faithfulness (Q3), a user study is conducted.
- Methodology: Participants are shown a 5-second clip of a reference motion (rendered from the SMPL-X fit of the LAFAN1 data) and two retargeted clips. One retargeted clip is always GMR, and the other is either Unitree, PHC, or ProtoMotions.
- Blinding & Randomization: Users are not told which method produced which video, and the order of presentation is randomized.
- Task: Users select which of the two retargeted videos they believe is closer to the reference motion, with an option for "can't find a difference."
- Coverage: 15 motions are randomly sampled. Each of the 3 competitor methods is compared against GMR for every motion, resulting in 45 questions per user.
- Participants: 20 users.
5. Experimental Setup
5.1. Datasets
The primary dataset used for source human motion is the LAFAN1 dataset [42].
- Source: LAFAN1 is a motion capture dataset.
- Scale & Characteristics: The authors selected a diverse subset of 21 sequences from LAFAN1. These motions vary significantly in length (from 5 seconds to 2 minutes) and difficulty, including:
  - Simple locomotion: walking, turning.
  - Dynamic and complex motions: martial arts, kicks, dancing, running, hopping, jumping.
- Domain: Human motions.
- Format: BVH format.
- Exclusions: Motions with complex interaction with the environment (e.g., crawling, getting up from the floor) are generally excluded. An exception is made for a cartwheel motion, as the robot maintains contact with either its feet or hands throughout.
- Target Robot: All motions are retargeted to a Unitree G1 robot. This specific humanoid robot serves as the target embodiment for all experiments.

An example of a data sample would be one of the diverse motions themselves, such as a walk, run, dance, jump, or kick sequence, as shown qualitatively in the user study interface (Fig. 1 of the original paper). Each sequence comprises a series of poses over time for the human skeleton.
These datasets were chosen because LAFAN1 provides a rich variety of human movements, allowing for a comprehensive test of retargeting capabilities across different motion complexities. The Unitree G1 is a commonly used humanoid robot platform in research, making the results relevant to the robotics community.
5.2. Evaluation Metrics
The evaluation focuses on both the policy's ability to maintain balance and its tracking performance.
- Success Rate:
  - Conceptual Definition: This metric quantifies the percentage of rollouts (simulation runs) in which the policy successfully completes the reference motion from start to end without the robot falling. A rollout is considered a failure if the robot's anchor body height or orientation deviates from the reference by more than a predefined threshold, leading to episode termination.
  - Mathematical Formula: Not explicitly provided as a formula in the paper, but conceptually it is:
    $ \text{Success Rate} = \frac{\text{Number of Successful Rollouts}}{\text{Total Number of Rollouts}} \times 100\% $
  - Symbol Explanation:
    - Number of Successful Rollouts: The count of simulation runs where the robot completed the motion without falling.
    - Total Number of Rollouts: The total number of simulation runs performed for a given policy and evaluation setting.
- Average position error of body parts in global coordinates ($E_{\mathrm{g\text{-}mpbpe}}$):
  - Conceptual Definition: This metric measures the average Euclidean distance between the global 3D positions of corresponding body parts (joints or links) of the robot and the retargeted reference motion across all time steps where the policy is active. It indicates how accurately the robot's overall spatial configuration matches the reference.
  - Mathematical Formula (assuming a set of $N_b$ key body parts, averaged over the active frames):
    $ E_{\mathrm{g\text{-}mpbpe}} = \frac{1}{T \cdot N_b} \sum_{t=1}^{T} \sum_{k=1}^{N_b} \| \mathbf{p}_{k,t}^{\text{robot}} - \mathbf{p}_{k,t}^{\text{reference}} \|_2 $
  - Symbol Explanation:
    - $E_{\mathrm{g\text{-}mpbpe}}$: Average position error of body parts in global coordinates, reported in millimeters (mm).
    - $T$: Total number of frames (time steps) during which the policy is actively tracking the motion.
    - $N_b$: Number of key body parts considered for error calculation.
    - $\mathbf{p}_{k,t}^{\text{robot}}$: Global 3D position vector of robot body part $k$ at time step $t$.
    - $\mathbf{p}_{k,t}^{\text{reference}}$: Global 3D position vector of the corresponding reference body part at time step $t$.
    - $\| \cdot \|_2$: The Euclidean (L2) norm, representing the 3D distance.
- Average position error of body parts relative to the root position ($E_{\mathrm{mpbpe}}$):
  - Conceptual Definition: This metric measures the average Euclidean distance between the relative 3D positions (relative to the root) of corresponding body parts of the robot and the retargeted reference motion. It assesses how well the robot's internal pose or body shape matches the reference, irrespective of its global position.
  - Mathematical Formula:
    $ E_{\mathrm{mpbpe}} = \frac{1}{T \cdot N_b} \sum_{t=1}^{T} \sum_{k=1}^{N_b} \| (\mathbf{p}_{k,t}^{\text{robot}} - \mathbf{p}_{\text{root},t}^{\text{robot}}) - (\mathbf{p}_{k,t}^{\text{reference}} - \mathbf{p}_{\text{root},t}^{\text{reference}}) \|_2 $
  - Symbol Explanation:
    - $E_{\mathrm{mpbpe}}$: Average position error of body parts relative to the root position, reported in millimeters (mm).
    - $T$: Total number of frames where the policy is active.
    - $N_b$: Number of key body parts considered.
    - $\mathbf{p}_{k,t}^{\text{robot}}$, $\mathbf{p}_{\text{root},t}^{\text{robot}}$: Global 3D position vectors of robot body part $k$ and of the robot's root at time step $t$.
    - $\mathbf{p}_{k,t}^{\text{reference}}$, $\mathbf{p}_{\text{root},t}^{\text{reference}}$: Global 3D position vectors of reference body part $k$ and of the reference's root at time step $t$.
    - $\| \cdot \|_2$: The Euclidean (L2) norm.
- Average angular error of joint rotations ($E_{\mathrm{mpjpe}}$):
  - Conceptual Definition: This metric quantifies the average angular difference between the joint rotations of the robot and the retargeted reference motion across all active frames. It directly measures the accuracy of the robot's joint configurations relative to the target.
  - Mathematical Formula (assuming $N_j$ joints, with the angular difference measured as the geodesic distance between orientations; for unit quaternions $q_1$, $q_2$, this distance is $2 \arccos(|\langle q_1, q_2 \rangle|)$):
    $ E_{\mathrm{mpjpe}} = \frac{1}{T \cdot N_j} \sum_{t=1}^{T} \sum_{k=1}^{N_j} \text{angular\_distance}(\mathbf{R}_{k,t}^{\text{robot}}, \mathbf{R}_{k,t}^{\text{reference}}) $
  - Symbol Explanation:
    - $E_{\mathrm{mpjpe}}$: Average angular error of joint rotations, reported in radians (rad).
    - $T$: Total number of frames where the policy is active.
    - $N_j$: Number of joints considered for error calculation.
    - $\mathbf{R}_{k,t}^{\text{robot}}$, $\mathbf{R}_{k,t}^{\text{reference}}$: Rotations (e.g., quaternions or rotation matrices) of robot joint $k$ and of the corresponding reference joint at time step $t$.
    - $\text{angular\_distance}(\cdot, \cdot)$: A function that calculates the shortest angular distance between two 3D rotations, typically the geodesic distance on $SO(3)$ or in quaternion space.
- User Study - Perceptual Faithfulness: This is a subjective metric evaluated through a user study. Participants judge which retargeted motion looks more similar or faithful to the original human reference motion. This assesses the aesthetic quality and naturalness of the retargeted motion from a human perception standpoint.

A short code sketch of the success rate and the two position metrics follows.
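A minimal sketch of the success rate and the position metrics, assuming arrays of global body positions (the array layout and names are mine, not the paper's):

```python
import numpy as np

def tracking_errors(p_robot, p_ref, root_idx=0):
    """Position-based tracking metrics over one episode.

    p_robot, p_ref: (T, N, 3) global body positions of robot and reference.
    Returns (E_g-mpbpe, E_mpbpe) in the same units as the inputs.
    """
    # Global error: mean 3D distance over all frames and body parts.
    g_mpbpe = np.linalg.norm(p_robot - p_ref, axis=-1).mean()
    # Root-relative error: subtract each skeleton's own root first.
    rel_robot = p_robot - p_robot[:, root_idx:root_idx + 1]
    rel_ref = p_ref - p_ref[:, root_idx:root_idx + 1]
    mpbpe = np.linalg.norm(rel_robot - rel_ref, axis=-1).mean()
    return g_mpbpe, mpbpe

def success_rate(outcomes):
    """outcomes: boolean array, True if the rollout finished without a fall."""
    return 100.0 * np.mean(outcomes)
```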
5.3. Baselines
The proposed GMR method is evaluated against three other retargeting methods to provide a comprehensive comparison:
- PHC [38]: A widely used open-source retargeting method that relies on SMPL for scaling and gradient descent for Inverse Kinematics (IK). It serves as a representative of current SMPL-based retargeting approaches in robotics research.
- ProtoMotions [40]: Another open-source method that utilizes global axis-aligned scaling and a differential IK solver (Mink) for position and orientation matching. It represents a different approach to IK-based retargeting.
- Unitree (Closed-Source): A high-quality retargeted motion dataset provided by Unitree, generated using a proprietary, closed-source method. It acts as a strong upper-bound baseline, representing what can be achieved with potentially more sophisticated or hand-tuned industrial solutions.

These baselines are chosen because they represent different paradigms in motion retargeting (SMPL-based, differential IK, and industrial best practice) and are either widely adopted in the community (PHC, ProtoMotions) or provide a benchmark for high-quality results (Unitree).
6. Results & Analysis
The experimental results systematically evaluate the impact of different retargeting methods on RL policy performance and perceptual faithfulness. The analysis focuses on success rates, tracking errors, and user study feedback.
6.1. Core Results Analysis
6.1.1. Policy Success Rates
The following are the results from [Table I] of the original paper (success rates in %; PM = ProtoMotions, U = Unitree):

| Motion | Length (s) | sim: PHC | sim: GMR | sim: PM | sim: U | sim-dr: PHC | sim-dr: GMR | sim-dr: PM | sim-dr: U | sim2sim: PHC | sim2sim: GMR | sim2sim: PM | sim2sim: U |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Walk 1 | 33 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Walk 2 | 5.5 | 23 | 100 | 100 | 100 | 53.54 | 100 | 99.98 | 100 | 100* | 100* | 100* | 100* |
| Turn 1 | 12.3 | 93 | 100 | 100 | 100 | 87.18 | 99.98 | 99.95 | 100 | 100* | 100* | 99* | 100* |
| Turn 2 | 12.3 | 100 | 100 | 100 | 100 | 99.95 | 99.98 | 100 | 99.98 | 99 | 100 | 100 | 99 |
| Walk (old) | 33 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Walk (army) | 13 | 100 | 100 | 100 | 100 | 99.85 | 98.63 | 99.95 | 99.95 | 100 | 100 | 99 | 100 |
| Hop | 13 | 95 | 100 | 100 | 100 | 92.97 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Walk (knees) | 19.58 | 100 | 100 | 100 | 100 | 99.98 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Dance 1 | 118 | 0 | 100 | 100 | 99 | 0 | 99.46 | 99.24 | 99.95 | 0 | 100 | 100 | 100 |
| Dance 2 | 130.5 | 0 | 100 | 100 | 100 | 0.02 | 99.9 | 99.88 | 99.98 | 0 | 100 | 100 | 100 |
| Dance 3 | 120 | 100 | 100 | 100 | 100 | 100 | 100 | 99.95 | 100 | 99 | 100 | 100 | 100 |
| Dance 4 | 20 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99 | 100 | 100 | 100 |
| Dance 5 | 68.4 | 100 | 96 | 100 | 100 | 100 | 92.75 | 99.98 | 100 | 100 | 51 | 100 | 100 |
| Run (slow) | 50 | 100 | 100 | 100 | 100 | 99.19 | 99.88 | 99.95 | 99.98 | 100 | 100 | 100 | 100 |
| Run | 11 | 100 | 100 | 100 | 100 | 99.98 | 100 | 99.95 | 100 | 100 | 100 | 100 | 100 |
| Run (stop & go) | 37 | 17 | 98 | 20 | 100 | 20.46 | 91.24 | 40.26 | 99.83 | 74 | 100 | 26 | 100 |
| Hop around | 18 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Hopscotch | 10 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 99.98 | 100 | 100 | 100 | 100 |
| Jump and rotate | 21 | 100 | 100 | 100 | 100 | 99.98 | 100 | 99.9 | 100 | 99 | 100 | 100 | 99 |
| Kung fu | 8.6 | 100 | 100 | 100 | 100 | 100 | 99.95 | 100 | 100 | 100 | 100 | 100 | 100 |
| Various sports | 42.58 | 100 | 100 | 100 | 100 | 99.98 | 99.98 | 99.95 | 100 | 100 | 100 | 100 | 99 |
Analysis:
- Overall Performance: Out of 21 motions, 11 achieved success rates above 98% across all retargeting methods, and 3 achieved perfect 100% success (Walk 1, Walk (old), Hop around). This indicates that for many motions, all methods can generate trackable references.
- Impact of Retargeting Quality: For the remaining 7 motions, there is significant variation in performance.
  - Unitree: Consistently achieves near-perfect performance across all motions and evaluation settings (sim, sim-dr, sim2sim), validating its quality as a baseline.
  - GMR: Shows strong performance, closely following Unitree. It achieves 100% success for Dance 1 and Dance 2, where PHC completely fails. Its performance dips for Dance 5 (51% in sim2sim) and Run (stop & go) (91.24% in sim-dr, 98% in sim).
  - ProtoMotions (PM): Generally performs well, comparable to GMR in many cases, but shows a significant drop for Run (stop & go) (20% in sim, 40.26% in sim-dr, 26% in sim2sim).
  - PHC: Exhibits the lowest performance, with catastrophic failures (0% success) for Dance 1 and Dance 2 across all settings. It also struggles with Walk 2 (23% in sim), Turn 1 (93% in sim), and Run (stop & go) (17% in sim).
- Long-Horizon Motions: The paper notes that long motions are not inherently challenging, as PHC still achieves 100% success for Dance 3 (two minutes long). Failures are attributed to retargeting artifacts.
- Robustness Settings: Performance typically degrades from sim to sim-dr to sim2sim as more randomization and real-world factors (like ROS timing and state estimation noise) are introduced. Methods producing higher quality references (Unitree, GMR) maintain high success rates even in the more challenging sim-dr and sim2sim settings.

This confirms the answer to Q1: the choice of retargeting method critically impacts policy performance, especially for dynamic or challenging sequences, when extensive reward engineering is unavailable.
6.1.2. Tracking Error Statistics
The following are the results from [Table II] of the original paper:
| Statistics | E_g-mpbpe (mm): PHC | E_g-mpbpe (mm): GMR | E_g-mpbpe (mm): PM | E_g-mpbpe (mm): U | E_mpbpe (mm): PHC | E_mpbpe (mm): GMR | E_mpbpe (mm): PM | E_mpbpe (mm): U | E_mpjpe (10^-3 rad): PHC | E_mpjpe (10^-3 rad): GMR | E_mpjpe (10^-3 rad): PM | E_mpjpe (10^-3 rad): U |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Min | 71.8 | 59.9 | 66.0 | 51.1 | 20.9 | 18.1 | 24.1 | 18.2 | 569.5 | 362.0 | 499.0 | 355.5 |
| Median | 111.9 | 91.2 | 101.9 | 73.4 | 29.9 | 27.6 | 30.4 | 23.1 | 739.8 | 546.0 | 599.7 | 467.2 |
| Mean | 247.8 | 104.1 | 139.7 | 77.2 | 40.2 | 28.1 | 33.2 | 23.2 | 778.5 | 561.7 | 641.8 | 483.0 |
| Max | 1062.3 | 200.0 | 915.6 | 131.4 | 134.4 | 48.0 | 107.9 | 28.9 | 1336.1 | 1044.8 | 1397.9 | 678.5 |
Analysis:
- Lower Tracking Errors for GMR and Unitree: GMR and Unitree consistently demonstrate significantly lower mean and median tracking errors across all three metrics ($E_{\mathrm{g\text{-}mpbpe}}$, $E_{\mathrm{mpbpe}}$, $E_{\mathrm{mpjpe}}$).
  - For the global position error ($E_{\mathrm{g\text{-}mpbpe}}$), GMR's mean error is 104.1 mm, much lower than PHC (247.8 mm) and ProtoMotions (139.7 mm), and close to Unitree (77.2 mm).
  - Similar trends are observed for the relative position error ($E_{\mathrm{mpbpe}}$) and the angular joint error ($E_{\mathrm{mpjpe}}$).
- Consistency: The lower maximum errors for GMR (e.g., 200.0 mm for $E_{\mathrm{g\text{-}mpbpe}}$, compared to PHC's 1062.3 mm and ProtoMotions' 915.6 mm) indicate that GMR produces consistently high-quality retargeted motions, avoiding the extreme deviations that plague PHC and ProtoMotions.
- Correlation with Success Rate: The high tracking errors of PHC and ProtoMotions correlate with their lower success rates for certain motions. This suggests that policies trained on these less accurate references struggle to maintain stability, even when they complete the task.

The tracking error statistics provide a more detailed picture: while policies may sometimes succeed with lower quality retargets, they do so with considerable tracking errors, indicating difficulty in accurately imitating the reference.
6.1.3. Impact of Retargeting Artifacts
To answer Q2, the paper directly links low sim2sim success rates to specific artifacts in the retargeted motions:
- Ground Penetration (PHC): The PHC retargets of the Dance 1 and Dance 2 motions exhibited severe ground penetration (up to 60 cm). This physically impossible state makes it impossible for an RL policy to track the motion correctly, resulting in 0% success rates. Fig. 3(a) of the original paper illustrates this: the robot model is shown in a kneeling pose sunk below the ground plane, making physical simulation of the reference impossible.
Self-Intersection (ProtoMotions): The
ProtoMotions retargetof theRun (stop & go)motion showedrobot legs intersecting with each other. Thisself-intersectionis a physically infeasible state, leading to instability and low success rates (20-40% for ProtoMotions). As illustrated in Fig. 3(b) of the original paper:
该图像是一个机器人动作示意图,展示了Humanoid机器人在动态动作中的运动姿态,可能用于说明论文中关于动作重定向与人形机器人运动跟踪的研究内容。The above figure shows an example of
self-intersectioninProtoMotionsretargeted motion, where robot limbs pass through each other. -
Sudden Jumps in Joint Values (GMR): The
GMR retargetof theDance 5motion displayedmany sudden jumpsin thewaist rollvalue. AlthoughGMRgenerally performs very well, this rareoptimization artifactcan still occur and drastically reducepolicy robustness(GMR'ssim2simsuccess forDance 5was 51%). Theseabrupt velocity spikesmake the motion difficult to execute smoothly. As illustrated in Fig. 3(c) of the original paper:
该图像是一个折线图,展示了腰部滚转角和俯仰角随帧数变化的关节角度曲线,以及关节角度限制(红色虚线)。图中反映了动作轨迹中关节角度接近或超出限制的情况。The above figure shows
sudden jumps in waist roll and pitch valuesin aGMRretargeted motion, indicating abrupt changes in joint angles.
These observations highlight that physically inconsistent height, self-intersections, and sudden jumps in joint values are critical artifacts that must be avoided in retargeting to ensure RL policy success and robustness.
6.1.4. User Study Results
To answer Q3, a user study (N=20 participants) was conducted to assess the perceptual faithfulness of retargeted motions to the source human motion.
The following are the results from [Fig. 4] of the original paper, which charts the preferences of the 20 participants when comparing GMR against each of the other three retargeting methods (Unitree, PHC, ProtoMotions) on faithfulness to the source motion. Bars represent the percentage of responses: blue indicates a preference for GMR, green no preference, and orange a preference for the competing method.
Analysis:
- GMR vs. PHC/ProtoMotions: Users consistently found GMR to be more faithful to the reference motion than PHC and ProtoMotions. This suggests that GMR preserves the look and naturalness of the human motion better than the other open-source methods.
- GMR vs. Unitree: The comparison between GMR and Unitree is closer. While Unitree is still perceived as more faithful by users, the difference is less pronounced, and a substantial percentage of users reported no difference. This indicates that GMR achieves a perceptual fidelity very close to the high-quality, closed-source baseline.

This user study confirms that GMR produces retargeted motions that are not only more physically feasible but also perceptually superior, addressing concerns about whether policies are learning the intended motion.
6.1.5. Impact of the First Reference Frame
The paper reiterates a finding from prior work [7] that the starting frame of the reference motion can significantly impact policy performance.
The following are the results from [Table III] of the original paper:
| Motion | Start frame | PHC | GMR | PM | U |
|---|---|---|---|---|---|
| Walk 2 | 0 | 100 | 64 | 100 | 100 |
| Walk 2 | 7 | 100 | 100 | 100 | 100 |
| Turn 1 | 0 | 14 | 100 | 86 | 47 |
| Turn 1 | 49 | 100 | 100 | 99 | 100 |
Analysis:
- For Walk 2, GMR has a lower success rate (64%) when starting from frame 0, compared to 100% when starting from frame 7.
- For Turn 1, PHC and Unitree show drastically reduced success rates (14% and 47%, respectively) when starting from frame 0, but achieve 100% when starting from frame 49. Even ProtoMotions sees an improvement from 86% to 99%.

This emphasizes the practical importance of carefully selecting an initial pose that the robot can safely and robustly reach before policy inference begins, and similarly, an end pose for safe deactivation.
6.2. Ablation Studies / Parameter Analysis
The paper does not present explicit ablation studies in the traditional sense (e.g., removing components of GMR to measure their individual contribution). However, it implicitly touches on the parameter analysis and tuning aspect of GMR. The authors note that the sudden jumps in waist roll for GMR's Dance 5 motion are a rare occurrence introduced during the optimization phase. They suggest that "some motions might require further weight tuning to achieve optimal results" for the optimization weights used in Steps 4 and 5 of GMR. This implies that while the default parameters work well broadly, specific challenging motions might benefit from customized tuning, similar to reward engineering in RL, though applied to the retargeting process itself.
7. Conclusion & Reflections
7.1. Conclusion Summary
This paper rigorously demonstrates the critical impact of motion retargeting quality on the performance and robustness of humanoid motion tracking policies. By suppressing reward tuning and domain randomization in the RL training process (BeyondMimic), the authors successfully isolated the effects of retargeting artifacts. The proposed General Motion Retargeting (GMR) method, with its non-uniform local scaling and two-stage optimization, effectively addresses common pitfalls of existing open-source retargeters (PHC, ProtoMotions). GMR consistently yields higher policy success rates and lower tracking errors, approaching the performance of a high-quality closed-source baseline (Unitree). Crucially, the research identifies specific artifacts (ground penetrations, self-intersections, sudden joint value jumps) as detrimental to policy learning and robustness. A user study further validates GMR's perceptual faithfulness to source motions, ensuring that RL policies are trained on visually accurate references.
7.2. Limitations & Future Work
The authors acknowledge several limitations and propose directions for future research:
- Data Source Diversity: The study exclusively used the LAFAN1 dataset. Future work should explore more diverse sources, such as the AMASS dataset or human motion reconstructed from monocular video, to validate GMR's generalizability.
- Robot Embodiment Diversity: The experiments were confined to the Unitree G1 robot. Although BeyondMimic and the retargeting algorithms are general, extending the analysis to other humanoid robots (e.g., the Unitree H1) would be valuable.
- Interactive Motions: The current work primarily focused on non-interactive motions. Future research should investigate the impact of retargeting on motion sequences involving interactions with the environment, objects, or other robots, which introduce additional complexities like contact forces and object manipulation.
- Optimization Tuning: While GMR's default parameters work well, some complex motions might still benefit from further weight tuning in its optimization stages to eliminate occasional artifacts, suggesting an area for potential automation or more adaptive parameter selection.
7.3. Personal Insights & Critique
This paper provides a crucial perspective by emphasizing the upstream data quality in RL-based robotics. Too often, RL is expected to be a panacea, capable of learning from noisy or imperfect data by brute force of reward engineering and domain randomization. This work clearly illustrates that garbage in, garbage out still holds true; a high-quality reference motion significantly eases the RL learning burden and leads to more robust and successful policies.
The identification of specific artifacts like ground penetration and self-intersection as critical failure modes is a valuable takeaway. These are not merely cosmetic issues but fundamentally break the physical realism of the reference, making it impossible for a physics-based RL agent to imitate. GMR's approach to non-uniform local scaling is particularly insightful for addressing the embodiment gap, as it acknowledges that human and robot proportions don't scale uniformly across all body parts.
A potential area for improvement or future exploration could be to integrate contact-aware optimization directly into GMR's pipeline. While GMR implicitly addresses foot contact through uniform root scaling and post-processing, explicit contact constraints (e.g., ensuring feet remain on the ground and hands are placed correctly for complex motions) could further enhance realism and reduce artifacts, especially for interactive tasks. Additionally, exploring learning-based approaches to automatically determine optimal scaling factors and optimization weights for GMR could reduce the need for manual tuning and increase its applicability across an even wider range of robots and motions. The user study is a strong component, grounding technical metrics in human perception, which is often the ultimate judge of naturalness in humanoid motion.