Paper status: completed

REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints

Published: 03/10/2025
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

REArtGS reconstructs and dynamically generates articulated objects from multi-view RGB images of two arbitrary states by integrating unbiased SDF-guided geometric constraints and kinematic motion constraints into 3D Gaussian Splatting, achieving high-fidelity textured surface reconstruction and novel-state generation on both synthetic and real-world datasets.

Abstract

Articulated objects, as prevalent entities in human life, their 3D representations play crucial roles across various applications. However, achieving both high-fidelity textured surface reconstruction and dynamic generation for articulated objects remains challenging for existing methods. In this paper, we present REArtGS, a novel framework that introduces additional geometric and motion constraints to 3D Gaussian primitives, enabling realistic surface reconstruction and generation for articulated objects. Specifically, given multi-view RGB images of arbitrary two states of articulated objects, we first introduce an unbiased Signed Distance Field (SDF) guidance to regularize Gaussian opacity fields, enhancing geometry constraints and improving surface reconstruction quality. Then we establish deformable fields for 3D Gaussians constrained by the kinematic structures of articulated objects, achieving unsupervised generation of surface meshes in unseen states. Extensive experiments on both synthetic and real datasets demonstrate our approach achieves high-quality textured surface reconstruction for given states, and enables high-fidelity surface generation for unseen states. Project site: https://sites.google.com/view/reartgs/home.

In-depth Reading

1. Bibliographic Information

  • Title: REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints
  • Authors: Di Wu, Liu Liu, Zhou Linli, Anran Huang, Liangtu Song, Qiaojun Yu, Qi Wu, and Cewu Lu.
  • Affiliations: The authors are affiliated with several prominent research institutions and companies, including Hefei Institutes of Physical Science Chinese Academy of Sciences, University of Science and Technology of China, Hefei University of Technology, Shanghai Jiao Tong University, and ByteDance. This collaboration brings together expertise from both academia and industry.
  • Journal/Conference: This paper is an arXiv preprint. Preprints are research articles shared publicly before or during the formal peer-review process. The specific version referenced is 2503.06677v4. The high quality of the work suggests it is intended for a top-tier computer vision or graphics conference like CVPR, ICCV, or SIGGRAPH.
  • Publication Year: The preprint was submitted in March 2025 (according to its arXiv identifier).
  • Abstract: The paper introduces REArtGS, a framework for creating high-quality 3D models of articulated objects (objects with moving parts, like scissors or laptops). The method takes multi-view images of an object in just two different poses and can both reconstruct its textured 3D surface and generate the surface in any unseen pose between the two. To achieve this, REArtGS enhances 3D Gaussian Splatting with two key innovations: 1) an unbiased Signed Distance Field (SDF) guidance to impose strong geometric constraints and improve reconstruction accuracy, and 2) kinematic-constrained deformable fields to model the object's motion, allowing for the generation of unseen states without direct supervision. Experiments on synthetic and real-world data show the method's superior performance.
  • Original Source Link:

2. Executive Summary

  • Background & Motivation (Why):

    • Core Problem: Creating accurate, textured, and animatable 3D models of everyday articulated objects is a fundamental challenge in computer vision and robotics. Such models are essential for applications like virtual reality (VR), robotic manipulation, and human-object interaction simulation.
    • Existing Gaps: Previous methods often faced a trade-off. Some produced geometrically inaccurate models due to the "shape-radiance ambiguity" (where different combinations of shape and color can produce the same image). Others required extensive data, such as a full video of the object's motion, to model its dynamics. While the new 3D Gaussian Splatting (3DGS) technique excels at photorealistic rendering, its standard form lacks the strong geometric structure needed for high-quality surface reconstruction and cannot inherently model the underlying motion of articulated objects from sparse data.
    • Innovation: REArtGS bridges these gaps by integrating physical and geometric priors directly into the 3DGS framework. It proposes that by using a Signed Distance Field (SDF), a classic geometric representation, to guide the 3D Gaussians, a much more accurate surface can be reconstructed. Furthermore, by modeling the object's motion based on its underlying kinematic structure (i.e., how its joints move), it can generate any intermediate pose from just two snapshots, a task that was previously very difficult.
  • Main Contributions / Findings (What):

    1. A Novel Framework (REArtGS): The paper presents the first comprehensive framework to leverage 3DGS for both high-fidelity surface reconstruction and continuous generation of articulated objects, using only multi-view RGB images from two arbitrary states.
    2. Unbiased SDF Guidance for Geometric Accuracy: A key technical contribution is a method to regularize the 3D Gaussian opacity fields using an SDF. This explicitly links the Gaussians to the object's surface, reducing artifacts and significantly improving the quality of the reconstructed mesh. The "unbiased" nature of this guidance ensures the geometric constraint is applied precisely where it is most effective.
    3. Kinematic-Constrained Deformable Fields for Generation: The framework introduces a way to model the movement of the 3D Gaussians over time. Instead of learning a generic deformation, it constrains the motion to follow the rules of physical joints (rotation or translation). This allows the model to generate realistic, unseen object poses in an unsupervised manner.
    4. State-of-the-Art Performance: Through extensive experiments on both synthetic (PartNet-Mobility) and real-world (AKB-48) datasets, REArtGS is shown to significantly outperform previous state-of-the-art methods in both reconstruction and generation tasks.

3. Prerequisite Knowledge & Related Work

  • Foundational Concepts:

    • Articulated Objects: These are objects composed of multiple rigid parts connected by joints that allow relative motion. Examples include laptops (hinge joint), drawers (sliding/prismatic joint), and scissors (revolute joint).
    • 3D Gaussian Splatting (3DGS): A modern technique for representing a 3D scene. Instead of a continuous field (like NeRF) or a fixed mesh, it uses a large collection of 3D "Gaussians" (ellipsoids with color and transparency). To render an image, these Gaussians are "splatted" (projected) onto the 2D image plane, creating highly realistic views at real-time speeds. It is an explicit representation, meaning the geometric elements (the Gaussians) exist directly in 3D space.
    • Signed Distance Field (SDF): A classic way to represent a 3D shape. It's a function that, for any point in 3D space, returns the shortest distance to the shape's surface. The distance is negative if the point is inside the shape, positive if outside, and exactly zero on the surface. This provides a powerful and smooth representation of geometry.
    • Neural Radiance Fields (NeRF): An earlier popular technique that uses a neural network to represent a 3D scene as a continuous field of color and density. It is an implicit representation, as the geometry is not stored directly but is encoded in the network's weights. NeRFs produce stunning results but are often slow to train and render.
  • Previous Works & Differentiation:

    • Implicit Methods (e.g., PARIS): Methods like PARIS use NeRF-based representations to model articulated objects. However, they often suffer from noisy geometry because NeRFs are optimized for view synthesis, not geometric accuracy. REArtGS uses the explicit 3DGS representation and enforces geometric correctness with its SDF guidance.
    • 3DGS for Static Reconstruction (e.g., GOF): These methods adapt 3DGS to extract surfaces from static scenes. However, they either have weak geometric constraints, leading to noisy meshes, or restrict the shape of Gaussians, which can hurt rendering quality. REArtGS introduces a more principled geometric constraint via the SDF without overly restricting the Gaussians.
    • 3DGS for Dynamic Scenes (e.g., 4DGS, Deformable 3DGS): These methods animate scenes using 3DGS but typically require a continuous video sequence as input to learn the deformation of every Gaussian over time. REArtGS is far more data-efficient, needing only two states, because it doesn't learn a generic deformation but rather a structured, kinematically plausible motion.
    • 3DGS for Articulated Objects (e.g., ArtGS): These are the most direct competitors. The paper argues that even these specialized methods lack sufficient geometric constraints, leading to imperfect reconstructions. REArtGS's primary differentiator is the unbiased SDF guidance, which directly addresses this limitation to produce cleaner surfaces.
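The SDF concept introduced above can be made concrete with a toy example. The following is a minimal NumPy sketch, not code from the paper; `sphere_sdf` is a hypothetical helper using the analytic SDF of a sphere:

```python
import numpy as np

def sphere_sdf(points, center, radius):
    """Analytic SDF of a sphere: negative inside, zero on the surface, positive outside."""
    return np.linalg.norm(points - center, axis=-1) - radius

pts = np.array([[0.0, 0.0, 0.0],   # at the center -> inside
                [1.0, 0.0, 0.0],   # on the unit sphere -> on the surface
                [2.0, 0.0, 0.0]])  # one unit outside
vals = sphere_sdf(pts, center=np.zeros(3), radius=1.0)  # [-1.0, 0.0, 1.0]
```

In REArtGS the SDF is not analytic but predicted by an MLP; the sign convention (negative inside, zero on the surface) is the same.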

4. Methodology (Core Technology & Implementation)

The REArtGS framework operates in two main stages: Reconstruction and Generation, as illustrated in the pipeline diagram.

Figure 2: The overall pipeline of REArtGS.

Image 1 shows this pipeline. The Reconstruction stage (left) takes multi-view images of the start state, uses 3D Gaussians and an MLP-based SDF to learn the geometry, and applies the unbiased regularization to produce a high-quality reconstruction. The Generation stage (right) takes this reconstruction, identifies the moving part, learns its motion from the end state, and can then generate any intermediate state.

4.1 Reconstruction with Unbiased SDF Guidance

The core idea is to make the 3D Gaussians "aware" of the object's true surface.

  1. SDF Representation: An MLP is trained to approximate the SDF of the scene, where the surface $\mathcal{S}$ is the set of points $\mathbf{x}$ at which the SDF value is zero: $$\mathcal{S} = \left\{ \mathbf{x} \in \mathbb{R}^3 \;\middle|\; f(\mathbf{x}) = 0 \right\}$$ Here, $f(\mathbf{x})$ is the SDF value at point $\mathbf{x}$ predicted by the MLP.

  2. Linking SDF to Gaussian Opacity: The opacity of a Gaussian should be highest when it lies on the surface. The paper first defines a base opacity $\hat{\sigma}_i$ for a Gaussian centered at $\mathbf{x}_i$ using its SDF value. To make the opacity peak at the surface (where $f(\mathbf{x}) = 0$), they use a bell-shaped function $\Phi_k$: $$\Phi_k(f(\mathbf{x})) = \frac{e^{k \cdot f(\mathbf{x})}}{\left(1 + e^{k \cdot f(\mathbf{x})}\right)^2}$$ This function attains its maximum when $f(\mathbf{x})$ is close to $0$ and decays as the point moves away from the surface.

  3. The "Bias" Problem: In 3DGS rendering, the contribution of a Gaussian along a viewing ray peaks at a certain depth $t^*$. Ideally, this point of peak contribution should lie exactly on the object's surface, but standard 3DGS has no mechanism to enforce this. The SDF constraint is applied at the Gaussian's center, while the peak rendering contribution lies at a different point along the ray. This misalignment is the "bias".

  4. Unbiased SDF Regularization: To fix this, REArtGS introduces a novel loss term that forces the SDF value at the point of peak contribution to be zero: $$\mathcal{L}_{\mathrm{unbias}} = \left\| f(\mathbf{o} + t^* \mathbf{r}) \right\|_2^2$$

    • $\mathbf{o}$ and $\mathbf{r}$ are the camera origin and ray direction.

    • $t^*$ is the depth along the ray where a Gaussian has its maximum influence.

    • $f(\mathbf{o} + t^* \mathbf{r})$ is the SDF value at that specific point.

    • By minimizing this loss, the model learns to place Gaussians such that their peak rendering contribution aligns with the zero-level set of the SDF, i.e., the true surface.

      The illustration of unbiased SDF regularization.

      Image 4 visually demonstrates this. Without the regularization, the points of peak contribution (where the absolute SDF value $|S|$ should be zero) are scattered. With the regularization, they converge neatly to the zero-level set, indicating a much stronger geometric alignment.
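The bell-shaped weighting and the unbiased loss described above can be sketched in NumPy. This is a hedged illustration of the formulas only, not the authors' implementation; `phi_k` and `unbiased_loss` are hypothetical helper names:

```python
import numpy as np

def phi_k(f, k=10.0):
    """Bell-shaped weighting Phi_k(f) = e^{k f} / (1 + e^{k f})^2.
    Peaks at f = 0 (the SDF zero-level set) and decays away from the surface."""
    e = np.exp(k * f)
    return e / (1.0 + e) ** 2

def unbiased_loss(sdf_fn, origins, dirs, t_star):
    """L_unbias: mean squared SDF value at each ray's point of peak Gaussian contribution."""
    pts = origins + t_star[:, None] * dirs  # o + t* r for a batch of rays
    return np.mean(sdf_fn(pts) ** 2)
```

Note that $\Phi_k$ is the derivative of the sigmoid, so it is symmetric about $f=0$ with maximum value $1/4$; minimizing `unbiased_loss` pulls the peak-contribution points onto the SDF zero-level set.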

4.2 Mesh Generation with Motion Constraints

After reconstructing a high-quality model, the next step is to animate it.

  1. Modeling Motion: The framework assumes articulated motion consists of simple transformations: rotation around an axis or translation along a direction.

    • For rotation, it learns a pivot point $\mathbf{o}_r$, a rotation axis $\mathbf{a}$, and a total rotation angle $\theta$.
    • For translation, it learns a translation direction $\mathbf{d}$ and a distance $m$.
  2. Deformation Fields: Using these learned parameters, it defines a continuous deformation field that can move any point $\mathbf{x}$ to its new position $\mathbf{x}_s$ at any intermediate state $s \in [0, 1]$.

    • For rotation, it uses Rodrigues' rotation formula to calculate the new position: $$\mathbf{x}_s = \left( \mathbf{I} + \sin(\theta_s)\,\mathbf{K} + (1 - \cos(\theta_s))\,\mathbf{K}^2 \right)(\mathbf{x} - \mathbf{o}_r) + \mathbf{o}_r$$ where $\theta_s$ is the interpolated angle for state $s$ and $\mathbf{K}$ is the skew-symmetric matrix of the rotation axis.
    • For translation, it uses simple linear interpolation: $$\mathbf{x}_s = \mathbf{x} + s \cdot m \cdot \mathbf{d}$$
  3. Unsupervised Part Segmentation: The model needs to figure out which Gaussians belong to the moving part and which are static. It does this heuristically:

    • It first trains a model only on the end state to see which Gaussians have moved from their initial positions.
    • It then refines this segmentation by checking which Gaussians' movements are consistent with the learned global motion parameters (rotation or translation). Gaussians that don't fit the model are considered static or outliers.
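The two deformation fields above can be sketched in NumPy. This is an illustrative reading of the formulas under the stated joint model; `rotate_about_axis` and `translate` are hypothetical helper names, not the authors' code:

```python
import numpy as np

def rotate_about_axis(x, pivot, axis, theta_s):
    """Rodrigues' formula: rotate point x by angle theta_s about a unit axis through pivot.
    x_s = (I + sin(theta_s) K + (1 - cos(theta_s)) K^2)(x - pivot) + pivot"""
    a = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -a[2], a[1]],      # skew-symmetric cross-product matrix of the axis
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    R = np.eye(3) + np.sin(theta_s) * K + (1.0 - np.cos(theta_s)) * (K @ K)
    return R @ (x - pivot) + pivot

def translate(x, s, m, d):
    """Prismatic joint: move x by fraction s of total distance m along unit direction d."""
    return x + s * m * d
```

Interpolating $s$ from 0 to 1 (with $\theta_s = s\,\theta$ for the revolute case) sweeps every Gaussian center of the moving part through the intermediate states.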

4.3 Optimization and Textured Mesh Extraction

The entire model is trained end-to-end by minimizing a combined loss function: $$\mathcal{L} = \mathcal{L}_c + \lambda_1 \mathcal{L}_{\mathrm{unbias}} + \lambda_2 \mathcal{L}_{\mathrm{normal}} + \lambda_3 \mathcal{L}_{\mathrm{eik}} + \lambda_3 \mathcal{L}_d$$

  • $\mathcal{L}_c$: The photometric rendering loss (L1 and D-SSIM) that ensures the rendered images match the input images.

  • $\mathcal{L}_{\mathrm{unbias}}$: The key geometric loss described above.

  • $\mathcal{L}_{\mathrm{normal}}$ and $\mathcal{L}_{\mathrm{eik}}$: Additional SDF regularizers that keep the learned surface normals consistent and ensure the SDF behaves like a proper distance field.

  • $\mathcal{L}_d$: A depth distortion loss to regularize the geometry.

    After optimization, a textured mesh is extracted using the Truncated Signed Distance Function (TSDF) fusion algorithm, which integrates rendered depth and color maps from multiple views into a consistent 3D model.
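The TSDF-fusion idea can be illustrated with a minimal 1-D sketch: each voxel along a camera ray accumulates a weighted running average of truncated signed distances from successive depth observations. This is a hypothetical toy, not the fusion code used in the paper (which integrates full multi-view depth and color maps):

```python
import numpy as np

def tsdf_update(tsdf, weights, voxel_depths, observed_depth, trunc=0.05):
    """Fuse one depth observation into a 1-D voxel column along a camera ray.
    Positive TSDF = in front of the observed surface, negative = behind (within trunc)."""
    d = np.clip(observed_depth - voxel_depths, -trunc, trunc)  # truncated signed distance
    mask = observed_depth - voxel_depths > -trunc              # skip voxels far behind the surface
    new_w = weights + mask
    tsdf = np.where(mask, (tsdf * weights + d) / np.maximum(new_w, 1), tsdf)
    return tsdf, new_w
```

After fusing all views, the surface is extracted where the accumulated TSDF crosses zero (e.g., with marching cubes), and the averaged colors texture the resulting mesh.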

5. Experimental Setup

  • Datasets:

    • PartNet-Mobility: A large-scale synthetic dataset containing thousands of articulated objects across various categories. It provides ground truth geometry, making it ideal for quantitative evaluation.
    • AKB-48: A real-world dataset of 48 common articulated objects. This dataset is more challenging due to real-world factors like complex lighting and textures, and is used to test the method's generalization ability.
  • Evaluation Metrics:

    • Chamfer Distance (CD): Measures the average distance between the points of two surfaces. Lower is better. The paper reports CD(ws) (whole surface) and CD(rs) (ray-sampled surface, focusing on visible parts).
    • F1-score: A metric that balances precision and recall for surface reconstruction, measuring the percentage of the surface that is correctly reconstructed within a certain tolerance. Higher is better.
    • Earth Mover's Distance (EMD): Measures the cost to transform one point cloud into another, sensitive to the overall distribution of points. Lower is better.
  • Baselines: The method is compared against a strong set of baselines, including:

    • Implicit representation methods (A-SDF, Ditto, PARIS).
    • 3DGS-based static reconstruction methods (GOF).
    • 3DGS-based dynamic/articulated methods (ArtGS, D-3DGS). For fairness, competing methods requiring depth information (like ArtGS) were re-implemented to use only RGB images.
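The Chamfer Distance and F1-score described above can be sketched for small point clouds with brute-force NumPy. This is an illustrative definition, not the paper's evaluation code, and the tolerance `tau` is an assumed placeholder:

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer Distance between point clouds p and q (lower is better)."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # pairwise distance matrix
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def f1_score(p, q, tau=0.01):
    """F1 at tolerance tau: harmonic mean of precision (p covered by q) and recall (q covered by p)."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    precision = (d.min(axis=1) < tau).mean()
    recall = (d.min(axis=0) < tau).mean()
    return 2 * precision * recall / max(precision + recall, 1e-8)
```

For large point clouds a KD-tree nearest-neighbor query replaces the quadratic distance matrix, but the metric definitions are identical.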

6. Results & Analysis

REArtGS demonstrates superior performance across the board.

  • Core Results:

    • Reconstruction (Table 1 & Figure 4): In the reconstruction task on PartNet-Mobility, REArtGS achieves the best mean scores on all four metrics (CD(ws), CD(rs), F1, EMD).

      Qualitative reconstruction results on PartNet-Mobility.

      Image 5 shows qualitative comparisons. The meshes from GOF, PARIS, and ArtGS are often noisy and contain floaters or holes. In contrast, REArtGS produces significantly smoother, more complete, and geometrically accurate surfaces, which is a direct result of the unbiased SDF guidance.

    • Generation (Table 2 & Figure 5): In the generation task, REArtGS again leads in mean performance.

      Qualitative generation results.

      Image 6 shows that REArtGS generates coherent and realistic motions. PARIS can produce distorted shapes, while D-3DGS (which learns a generic deformation) results in very noisy and broken meshes. The success of REArtGS is attributed to its high-quality initial reconstruction and the strong prior imposed by the kinematic motion constraints.

  • Ablation Studies:

    • Effect of SDF Guidance (Table 3): The ablation study confirms the importance of the proposed geometric constraints. The baseline without any SDF performs worst. Adding SDF guidance helps, but adding the unbiased regularization provides the largest performance boost, validating it as the key innovation for reconstruction quality.
    • Effect of Motion Constraints (Table 4): This study compares the kinematic-constrained model to a baseline that uses a generic MLP to learn deformations (similar to D-3DGS). The kinematic model performs drastically better, showing that for sparse data (only two states), imposing a strong motion prior is essential for successful generation.
  • Generalization to the Real World (Table 5 & Figure 6):

    Qualitative results on the real-world AKB-48 dataset.

    On the real-world AKB-48 dataset, REArtGS continues to outperform PARIS and ArtGS by a significant margin across most categories. Image 2 shows qualitative results on real objects like scissors and eyeglasses, demonstrating the method's ability to handle real-world complexity and produce high-quality, animatable digital twins.

7. Conclusion & Reflections

  • Conclusion Summary: REArtGS presents a powerful and effective framework for modeling articulated objects. By uniquely combining the rendering efficiency of 3D Gaussian Splatting with strong geometric priors from an unbiased SDF and kinematic motion constraints, it achieves state-of-the-art results in both surface reconstruction and dynamic generation from sparse multi-view inputs. The work successfully addresses key limitations of prior methods in both implicit and explicit 3D representations.

  • Limitations & Future Work: The authors acknowledge two primary limitations:

    1. Camera Pose Prior: The method requires pre-calibrated camera poses for the input images. Future work could integrate SLAM (Simultaneous Localization and Mapping) techniques to estimate these poses automatically.
    2. Transparent Materials: Like most 3D reconstruction methods, REArtGS struggles with transparent or highly reflective objects. This could be addressed by incorporating more sophisticated, physically-based rendering models.
  • Personal Insights & Critique:

    • Significance: This paper is a significant contribution to the field of 3D computer vision. The core idea of injecting classical geometric representations (like SDFs) and physical priors (kinematics) into modern, data-driven representations like 3DGS is extremely powerful. It shows a path forward for building models that are not only photorealistic but also geometrically sound and physically plausible.
    • Assumptions and Scope: The method's current reliance on simple, single-axis joints (revolute and prismatic) is a practical limitation. It would be interesting to see it extended to more complex mechanisms like ball-and-socket joints, screw joints, or objects with multiple interconnected moving parts. The heuristic part-segmentation, while effective in experiments, might be a point of failure for objects with very closely packed or intricately shaped parts.
    • Future Impact: REArtGS and similar approaches could dramatically lower the barrier to creating high-quality, interactive 3D assets. This has massive implications for robotics (enabling robots to better understand and manipulate everyday objects), augmented/virtual reality (populating virtual worlds with realistic, interactive items), and digital content creation. The work solidifies the trend of hybrid models that combine the strengths of explicit representations for efficiency and implicit fields for geometric smoothness and regularization.
