REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints
TL;DR Summary
REArtGS reconstructs and dynamically generates articulated objects from multi-view RGB images of two arbitrary states by integrating unbiased SDF-guided geometric constraints and kinematic motion constraints into 3D Gaussian Splatting, achieving high-fidelity textured surface reconstruction and novel state generation on both synthetic and real-world datasets.
Abstract
Articulated objects, as prevalent entities in human life, their 3D representations play crucial roles across various applications. However, achieving both high-fidelity textured surface reconstruction and dynamic generation for articulated objects remains challenging for existing methods. In this paper, we present REArtGS, a novel framework that introduces additional geometric and motion constraints to 3D Gaussian primitives, enabling realistic surface reconstruction and generation for articulated objects. Specifically, given multi-view RGB images of arbitrary two states of articulated objects, we first introduce an unbiased Signed Distance Field (SDF) guidance to regularize Gaussian opacity fields, enhancing geometry constraints and improving surface reconstruction quality. Then we establish deformable fields for 3D Gaussians constrained by the kinematic structures of articulated objects, achieving unsupervised generation of surface meshes in unseen states. Extensive experiments on both synthetic and real datasets demonstrate our approach achieves high-quality textured surface reconstruction for given states, and enables high-fidelity surface generation for unseen states. Project site: https://sites.google.com/view/reartgs/home.
In-depth Reading
English Analysis
1. Bibliographic Information
- Title: REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints
- Authors: Di Wu, Liu Liu, Zhou Linli, Anran Huang, Liangtu Song, Qiaojun Yu, Qi Wu, and Cewu Lu.
- Affiliations: The authors are affiliated with several prominent research institutions and companies, including Hefei Institutes of Physical Science Chinese Academy of Sciences, University of Science and Technology of China, Hefei University of Technology, Shanghai Jiao Tong University, and ByteDance. This collaboration brings together expertise from both academia and industry.
- Journal/Conference: This paper is an arXiv preprint. Preprints are research articles shared publicly before or during the formal peer-review process. The specific version referenced is 2503.06677v4. The quality of the work suggests it is intended for a top-tier computer vision or graphics conference such as CVPR, ICCV, or SIGGRAPH.
- Publication Year: The preprint was submitted in March 2025 (according to its arXiv identifier).
- Abstract: The paper introduces REArtGS, a framework for creating high-quality 3D models of articulated objects (objects with moving parts, like scissors or laptops). The method takes multi-view images of an object in just two different poses and can both reconstruct its textured 3D surface and generate the surface in any unseen pose between the two. To achieve this, REArtGS enhances 3D Gaussian Splatting with two key innovations: 1) an unbiased Signed Distance Field (SDF) guidance to impose strong geometric constraints and improve reconstruction accuracy, and 2) kinematic-constrained deformable fields to model the object's motion, allowing for the generation of unseen states without direct supervision. Experiments on synthetic and real-world data show the method's superior performance.
- Original Source Link:
- arXiv Page: https://arxiv.org/abs/2503.06677
- PDF Link: http://arxiv.org/pdf/2503.06677v4
2. Executive Summary
- Background & Motivation (Why):
- Core Problem: Creating accurate, textured, and animatable 3D models of everyday articulated objects is a fundamental challenge in computer vision and robotics. Such models are essential for applications like virtual reality (VR), robotic manipulation, and human-object interaction simulation.
- Existing Gaps: Previous methods often faced a trade-off. Some produced geometrically inaccurate models due to the "shape-radiance ambiguity" (where different combinations of shape and color can produce the same image). Others required extensive data, such as a full video of the object's motion, to model its dynamics. While the new 3D Gaussian Splatting (3DGS) technique excels at photorealistic rendering, its standard form lacks the strong geometric structure needed for high-quality surface reconstruction and cannot inherently model the underlying motion of articulated objects from sparse data.
- Innovation: REArtGS bridges these gaps by integrating physical and geometric priors directly into the 3DGS framework. It shows that guiding the 3D Gaussians with a Signed Distance Field (SDF), a classic geometric representation, yields a much more accurate reconstructed surface. Furthermore, by modeling the object's motion based on its underlying kinematic structure (i.e., how its joints move), it can generate any intermediate pose from just two snapshots, a task that was previously very difficult.
- Main Contributions / Findings (What):
- A Novel Framework (REArtGS): The paper presents the first comprehensive framework to leverage 3DGS for both high-fidelity surface reconstruction and continuous generation of articulated objects, using only multi-view RGB images from two arbitrary states.
- Unbiased SDF Guidance for Geometric Accuracy: A key technical contribution is a method to regularize the 3D Gaussian opacity fields using an SDF. This explicitly links the Gaussians to the object's surface, reducing artifacts and significantly improving the quality of the reconstructed mesh. The "unbiased" nature of this guidance ensures the geometric constraint is applied precisely where it is most effective.
- Kinematic-Constrained Deformable Fields for Generation: The framework introduces a way to model the movement of the 3D Gaussians over time. Instead of learning a generic deformation, it constrains the motion to follow the rules of physical joints (rotation or translation). This allows the model to generate realistic, unseen object poses in an unsupervised manner.
- State-of-the-Art Performance: Through extensive experiments on both synthetic (PartNet-Mobility) and real-world (AKB-48) datasets, REArtGS is shown to significantly outperform previous state-of-the-art methods in both reconstruction and generation tasks.
3. Prerequisite Knowledge & Related Work
- Foundational Concepts:
- Articulated Objects: These are objects composed of multiple rigid parts connected by joints that allow relative motion. Examples include laptops (hinge joint), drawers (sliding/prismatic joint), and scissors (revolute joint).
- 3D Gaussian Splatting (3DGS): A modern technique for representing a 3D scene. Instead of a continuous field (like NeRF) or a fixed mesh, it uses a large collection of 3D "Gaussians" (ellipsoids with color and transparency). To render an image, these Gaussians are "splatted" (projected) onto the 2D image plane, creating highly realistic views at real-time speeds. It is an explicit representation, meaning the geometric elements (the Gaussians) exist directly in 3D space.
- Signed Distance Field (SDF): A classic way to represent a 3D shape. It's a function that, for any point in 3D space, returns the shortest distance to the shape's surface. The distance is negative if the point is inside the shape, positive if outside, and exactly zero on the surface. This provides a powerful and smooth representation of geometry (a tiny numeric example of this sign convention follows this list).
- Neural Radiance Fields (NeRF): An earlier popular technique that uses a neural network to represent a 3D scene as a continuous field of color and density. It is an implicit representation, as the geometry is not stored directly but is encoded in the network's weights. NeRFs produce stunning results but are often slow to train and render.
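As a quick illustration of the SDF sign convention (an illustrative example, not taken from the paper), here is the signed distance function of a sphere evaluated at an inside, on-surface, and outside point:

```python
import numpy as np

def sphere_sdf(points, center=np.zeros(3), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    return np.linalg.norm(points - center, axis=-1) - radius

pts = np.array([[0.0, 0.0, 0.0],   # center of the sphere  -> inside
                [1.0, 0.0, 0.0],   # on the surface
                [2.0, 0.0, 0.0]])  # outside
print(sphere_sdf(pts))  # [-1.  0.  1.]
```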
- Previous Works & Differentiation:
- Implicit Methods (e.g., PARIS): Methods like PARIS use NeRF-based representations to model articulated objects. However, they often suffer from noisy geometry because NeRFs are optimized for view synthesis, not geometric accuracy. REArtGS uses the explicit 3DGS representation and enforces geometric correctness with its SDF guidance.
- 3DGS for Static Reconstruction (e.g., GOF): These methods adapt 3DGS to extract surfaces from static scenes. However, they either have weak geometric constraints, leading to noisy meshes, or restrict the shape of the Gaussians, which can hurt rendering quality. REArtGS introduces a more principled geometric constraint via the SDF without overly restricting the Gaussians.
- 3DGS for Dynamic Scenes (e.g., 4DGS, Deformable 3DGS): These methods animate scenes using 3DGS but typically require a continuous video sequence as input to learn the deformation of every Gaussian over time. REArtGS is far more data-efficient, needing only two states, because it doesn't learn a generic deformation but rather a structured, kinematically plausible motion.
- 3DGS for Articulated Objects (e.g., ArtGS): These are the most direct competitors. The paper argues that even these specialized methods lack sufficient geometric constraints, leading to imperfect reconstructions. REArtGS's primary differentiator is the unbiased SDF guidance, which directly addresses this limitation to produce cleaner surfaces.
4. Methodology (Core Technology & Implementation)
The REArtGS framework operates in two main stages: Reconstruction and Generation, as illustrated in the pipeline diagram.

Image 1 shows this pipeline. The Reconstruction stage (left) takes multi-view images of the start state, uses 3D Gaussians and an MLP-based SDF to learn the geometry, and applies the unbiased regularization to produce a high-quality reconstruction. The Generation stage (right) takes this reconstruction, identifies the moving part, learns its motion from the end state, and can then generate any intermediate state.
4.1 Reconstruction with Unbiased SDF Guidance
The core idea is to make the 3D Gaussians "aware" of the object's true surface.
- SDF Representation: An MLP is trained to approximate the SDF of the scene, where the surface is the zero-level set, i.e., the set of points x with f(x) = 0. Here, f(x) is the SDF value at point x predicted by the MLP.
- Linking SDF to Gaussian Opacity: The opacity of a Gaussian should be highest when it is located on the surface. The paper first defines a base opacity for a Gaussian centered at μ using its SDF value f(μ). To make the opacity peak at the surface (where f(μ) = 0), a bell-shaped function of f(μ) is used: it outputs a high value when f(μ) is close to 0 and drops off otherwise.
- The "Bias" Problem: In 3DGS rendering, the contribution of a Gaussian along a viewing ray peaks at a certain depth, denoted t* here. Ideally, this point of peak contribution should lie exactly on the object's surface. However, there is no mechanism in standard 3DGS to enforce this: the SDF is evaluated at the Gaussian's center, while the peak rendering contribution occurs at a different point along the ray. This misalignment is the "bias".
- Unbiased SDF Regularization: To fix this, REArtGS introduces a novel loss term that forces the SDF value at the point of peak contribution to be zero, i.e., it penalizes the absolute SDF value |f(o + t*·d)| along each ray (a short code sketch of this idea follows this list).
- o and d are the camera origin and ray direction.
- t* is the depth along the ray where a Gaussian has its maximum influence.
- f(o + t*·d) is the SDF value at that specific point.
- By minimizing this loss, the model learns to place Gaussians such that their peak rendering contribution aligns perfectly with the zero-level set of the SDF, i.e., the true surface.
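To make the two constraints above concrete, here is a minimal PyTorch-style sketch. The Gaussian-shaped bell function, the scale s, and the names sdf_net, centers, ray_o, ray_d, and t_star are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def sdf_guided_opacity(sdf_net, centers, s=0.05):
    """Bell-shaped mapping: opacity peaks where the SDF at a Gaussian center is ~0.
    A Gaussian-shaped bell is used here as a stand-in for the paper's function."""
    f = sdf_net(centers).squeeze(-1)             # SDF value at each Gaussian center, shape (N,)
    return torch.exp(-(f ** 2) / (2 * s ** 2))   # ~1 on the surface, -> 0 away from it

def unbiased_sdf_loss(sdf_net, ray_o, ray_d, t_star):
    """Penalize |f(o + t* d)|: the SDF at each ray's point of peak Gaussian
    contribution should be zero, i.e., that point should lie on the surface."""
    x = ray_o + t_star.unsqueeze(-1) * ray_d     # (R, 3) points of peak contribution
    return sdf_net(x).abs().mean()
```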

Image 4 visually demonstrates this. Without the regularization, the points of peak contribution (where absolute SDF should be zero) are scattered. With the regularization, they converge neatly to the zero-level, indicating a much stronger geometric alignment.
4.2 Mesh Generation with Motion Constraints
After reconstructing a high-quality model, the next step is to animate it.
- Modeling Motion: The framework assumes articulated motion consists of simple transformations: rotation around an axis or translation along a direction.
- For rotation, it learns a pivot point p, a rotation axis a, and a total rotation angle θ.
- For translation, it learns a translation direction u and a distance l.
- Deformation Fields: Using these learned parameters, it defines a continuous deformation field that can move any point x to its new position at any intermediate state t ∈ [0, 1] (see the code sketch after this list).
- For rotation, it uses Rodrigues' rotation formula to compute the new position x' = p + R(t·θ)(x − p), where t·θ is the interpolated angle for state t and R(t·θ) = I + sin(t·θ)K + (1 − cos(t·θ))K², with K the skew-symmetric matrix of the rotation axis a.
- For translation, it uses simple linear interpolation: x' = x + t·l·u.
- Unsupervised Part Segmentation: The model needs to figure out which Gaussians belong to the moving part and which are static. It does this heuristically:
- It first trains a model only on the end state to see which Gaussians have moved from their initial positions.
- It then refines this segmentation by checking which Gaussians' movements are consistent with the learned global motion parameters (rotation or translation). Gaussians that don't fit the model are considered static or outliers.
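Below is a minimal NumPy sketch of these deformation fields and of the displacement-based segmentation heuristic, assuming the motion parameters (pivot p, axis a, angle theta, direction u, distance l) have already been recovered. The function names and the threshold eps are illustrative, and the paper's consistency-based refinement step is omitted.

```python
import numpy as np

def skew(a):
    """Cross-product (skew-symmetric) matrix K of an axis vector a."""
    return np.array([[0, -a[2], a[1]],
                     [a[2], 0, -a[0]],
                     [-a[1], a[0], 0]])

def deform_revolute(x, p, a, theta, t):
    """Rotate points x (N,3) about axis a through pivot p by the interpolated angle t*theta,
    using Rodrigues' formula R = I + sin(th) K + (1 - cos(th)) K^2."""
    th = t * theta
    K = skew(a / np.linalg.norm(a))
    R = np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * (K @ K)
    return (x - p) @ R.T + p

def deform_prismatic(x, u, l, t):
    """Translate points x (N,3) along unit direction u by the interpolated distance t*l."""
    return x + t * l * (u / np.linalg.norm(u))

def moving_mask(x_start, x_end, eps=1e-3):
    """Displacement heuristic: Gaussian centers that moved between the two observed
    states (beyond a small threshold eps) are treated as the moving part."""
    return np.linalg.norm(x_end - x_start, axis=-1) > eps
```

For instance, evaluating deform_revolute with t = 0.5 places a laptop lid halfway between the two observed states.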
4.3 Optimization and Textured Mesh Extraction
The entire model is trained end-to-end by minimizing a combined loss function, a weighted sum of the following terms (a minimal sketch of one of the SDF regularizers follows this list):
- Photometric rendering loss: an L1 plus D-SSIM term that ensures the rendered images match the input images.
- Unbiased SDF regularization loss: the key geometric loss described above.
- Normal-consistency and Eikonal-style SDF regularizers: additional terms that keep the learned surface normals consistent and make the SDF behave like a proper distance field.
- Depth distortion loss: a term that further regularizes the geometry.
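As a reference point for the "behaves like a proper distance field" requirement, here is the standard Eikonal regularizer in a minimal PyTorch sketch. This is the generic form used across SDF-based methods; the paper's exact regularizers and loss weights are not reproduced here.

```python
import torch

def eikonal_loss(sdf_net, points):
    """Encourage the SDF gradient to have unit norm, ||grad f(x)|| = 1, which is what
    distinguishes a true distance field from an arbitrary scalar field."""
    points = points.clone().requires_grad_(True)
    f = sdf_net(points)
    grad = torch.autograd.grad(f, points,
                               grad_outputs=torch.ones_like(f),
                               create_graph=True)[0]
    return ((grad.norm(dim=-1) - 1.0) ** 2).mean()
```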
After optimization, a textured mesh is extracted using the Truncated Signed Distance Function (TSDF) fusion algorithm, which integrates rendered depth and color maps from multiple views into a consistent 3D model.
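For reference, the sketch below shows one way such a TSDF-fusion extraction can be done with Open3D, assuming rendered color/depth maps and known camera intrinsics/extrinsics are available. The voxel size, truncation distance, and variable names are illustrative and not the paper's settings.

```python
import open3d as o3d

def fuse_tsdf(colors, depths, intrinsic, extrinsics, voxel=0.005, trunc=0.02):
    """Integrate rendered RGB-D views into a TSDF volume and extract a textured mesh.
    colors: list of (H,W,3) uint8 arrays; depths: list of (H,W) float32 arrays in meters;
    intrinsic: o3d.camera.PinholeCameraIntrinsic; extrinsics: list of 4x4 world-to-camera matrices."""
    volume = o3d.pipelines.integration.ScalableTSDFVolume(
        voxel_length=voxel, sdf_trunc=trunc,
        color_type=o3d.pipelines.integration.TSDFVolumeColorType.RGB8)
    for color, depth, extr in zip(colors, depths, extrinsics):
        rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
            o3d.geometry.Image(color), o3d.geometry.Image(depth),
            depth_scale=1.0, depth_trunc=5.0, convert_rgb_to_intensity=False)
        volume.integrate(rgbd, intrinsic, extr)
    mesh = volume.extract_triangle_mesh()
    mesh.compute_vertex_normals()
    return mesh
```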
5. Experimental Setup
- Datasets:
- PartNet-Mobility: A large-scale synthetic dataset containing thousands of articulated objects across various categories. It provides ground truth geometry, making it ideal for quantitative evaluation.
- AKB-48: A real-world dataset of 48 common articulated objects. This dataset is more challenging due to real-world factors like complex lighting and textures, and is used to test the method's generalization ability.
- Evaluation Metrics:
- Chamfer Distance (CD): Measures the average distance between the points of two surfaces. Lower is better. The paper reports CD(ws) (whole surface) and CD(rs) (ray-sampled surface, focusing on visible parts). A small code sketch of CD and F1 follows this list.
- F1-score: A metric that balances precision and recall for surface reconstruction, measuring the percentage of the surface that is correctly reconstructed within a certain tolerance. Higher is better.
- Earth Mover's Distance (EMD): Measures the cost to transform one point cloud into another, sensitive to the overall distribution of points. Lower is better.
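To make the reconstruction metrics concrete, here is a small NumPy sketch of one common convention for Chamfer distance and F1 on sampled point clouds; the paper's exact sampling density, scaling, and threshold may differ.

```python
import numpy as np

def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between point sets pred (N,3) and gt (M,3)."""
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)  # (N, M) pairwise distances
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def f1_score(pred, gt, tau=0.01):
    """F1 at threshold tau (same units as the coordinates): harmonic mean of
    precision (pred points near gt) and recall (gt points near pred)."""
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    precision = (d.min(axis=1) < tau).mean()
    recall = (d.min(axis=0) < tau).mean()
    return 2 * precision * recall / (precision + recall + 1e-8)
```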
- Baselines: The method is compared against a strong set of baselines, including:
- Implicit representation methods (A-SDF, Ditto, PARIS).
- 3DGS-based static reconstruction methods (GOF).
- 3DGS-based dynamic/articulated methods (ArtGS, D-3DGS). For fairness, competing methods requiring depth information (like ArtGS) were re-implemented to use only RGB images.
6. Results & Analysis
REArtGS demonstrates superior performance across the board.
- Core Results:
- Reconstruction (Table 1 & Figure 4): In the reconstruction task on PartNet-Mobility, REArtGS achieves the best mean scores on all four metrics (CD(ws), CD(rs), F1, EMD). Image 5 shows qualitative comparisons: the meshes from GOF, PARIS, and ArtGS are often noisy and contain floaters or holes. In contrast, REArtGS produces significantly smoother, more complete, and geometrically accurate surfaces, which is a direct result of the unbiased SDF guidance.
- Generation (Table 2 & Figure 5): In the generation task, REArtGS again leads in mean performance. Image 6 shows that REArtGS generates coherent and realistic motions. PARIS can produce distorted shapes, while D-3DGS (which learns a generic deformation) results in very noisy and broken meshes. The success of REArtGS is attributed to its high-quality initial reconstruction and the strong prior imposed by the kinematic motion constraints.
- Ablation Studies:
- Effect of SDF Guidance (Table 3): The ablation study confirms the importance of the proposed geometric constraints. The baseline without any SDF performs worst. Adding SDF guidance helps, but adding the unbiased regularization provides the largest performance boost, validating it as the key innovation for reconstruction quality.
- Effect of Motion Constraints (Table 4): This study compares the kinematic-constrained model to a baseline that uses a generic MLP to learn deformations (similar to D-3DGS). The kinematic model performs drastically better, showing that for sparse data (only two states), imposing a strong motion prior is essential for successful generation.
- Generalization to the Real World (Table 5 & Figure 6): On the real-world AKB-48 dataset, REArtGS continues to outperform PARIS and ArtGS by a significant margin across most categories. Image 2 shows qualitative results on real objects like scissors and eyeglasses, demonstrating the method's ability to handle real-world complexity and produce high-quality, animatable digital twins.
7. Conclusion & Reflections
- Conclusion Summary: REArtGS presents a powerful and effective framework for modeling articulated objects. By uniquely combining the rendering efficiency of 3D Gaussian Splatting with strong geometric priors from an unbiased SDF and kinematic motion constraints, it achieves state-of-the-art results in both surface reconstruction and dynamic generation from sparse multi-view inputs. The work successfully addresses key limitations of prior methods in both implicit and explicit 3D representations.
- Limitations & Future Work: The authors acknowledge two primary limitations:
- Camera Pose Prior: The method requires pre-calibrated camera poses for the input images. Future work could integrate SLAM (Simultaneous Localization and Mapping) techniques to estimate these poses automatically.
- Transparent Materials: Like most 3D reconstruction methods, REArtGS struggles with transparent or highly reflective objects. This could be addressed by incorporating more sophisticated, physically-based rendering models.
- Personal Insights & Critique:
- Significance: This paper is a significant contribution to the field of 3D computer vision. The core idea of injecting classical geometric representations (like SDFs) and physical priors (kinematics) into modern, data-driven representations like 3DGS is extremely powerful. It shows a path forward for building models that are not only photorealistic but also geometrically sound and physically plausible.
- Assumptions and Scope: The method's current reliance on simple, single-axis joints (revolute and prismatic) is a practical limitation. It would be interesting to see it extended to more complex mechanisms like ball-and-socket joints, screw joints, or objects with multiple interconnected moving parts. The heuristic part-segmentation, while effective in experiments, might be a point of failure for objects with very closely packed or intricately shaped parts.
- Future Impact: REArtGS and similar approaches could dramatically lower the barrier to creating high-quality, interactive 3D assets. This has massive implications for robotics (enabling robots to better understand and manipulate everyday objects), augmented/virtual reality (populating virtual worlds with realistic, interactive items), and digital content creation. The work solidifies the trend of hybrid models that combine the strengths of explicit representations for efficiency and implicit fields for geometric smoothness and regularization.