HAC++: Towards 100X Compression of 3D Gaussian Splatting

Published: 01/22/2025

TL;DR Summary

HAC++ introduces a novel compression framework leveraging unorganized anchors and structured hash grids for spatial context utilization, achieving over 100x size reduction while enhancing fidelity through precise probability estimation and entropy coding.

Abstract

3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity. However, the substantial Gaussians and their associated attributes necessitate effective compression techniques. Nevertheless, the sparse and unorganized nature of the point cloud of Gaussians (or anchors in our paper) presents challenges for compression. To achieve a compact size, we propose HAC++, which leverages the relationships between unorganized anchors and a structured hash grid, utilizing their mutual information for context modeling. Additionally, HAC++ captures intra-anchor contextual relationships to further enhance compression performance. To facilitate entropy coding, we utilize Gaussian distributions to precisely estimate the probability of each quantized attribute, where an adaptive quantization module is proposed to enable high-precision quantization of these attributes for improved fidelity restoration. Moreover, we incorporate an adaptive masking strategy to eliminate invalid Gaussians and anchors. Overall, HAC++ achieves a remarkable size reduction of over 100X compared to vanilla 3DGS when averaged on all datasets, while simultaneously improving fidelity. It also delivers more than 20X size reduction compared to Scaffold-GS. Our code is available at https://github.com/YihangChen-ee/HAC-plus.

In-depth Reading

1. Bibliographic Information

1.1. Title

HAC++: Towards 100X Compression of 3D Gaussian Splatting

1.2. Authors

Yihang Chen (Shanghai Jiao Tong University & Monash University), Qianyi Wu (Monash University), Weiyao Lin (Shanghai Jiao Tong University), Mehrtash Harandi (Monash University), and Jianfei Cai (Monash University).

1.3. Journal/Conference

The original version, HAC, was accepted at ECCV 2024; this paper is an extended, journal-style version. ECCV (European Conference on Computer Vision) is one of the top three conferences in the field of computer vision.

1.4. Publication Year

The paper was published on arXiv on January 21, 2025 (v4).

1.5. Abstract

3D Gaussian Splatting (3DGS) has revolutionized novel view synthesis with high-speed rendering and fidelity. However, its massive memory footprint (often gigabytes per scene) makes it difficult to store and transmit. The authors propose HAC++, a compression framework that leverages the relationship between unorganized anchor points (from Scaffold-GS) and a structured hash grid. By utilizing mutual information for context modeling, intra-anchor relationship capturing, and an adaptive quantization module (AQM), HAC++ achieves over 100x compression compared to vanilla 3DGS and 20x compared to Scaffold-GS while maintaining or improving visual quality.

2. Executive Summary

2.1. Background & Motivation

The core problem is the storage bottleneck of 3D Gaussian Splatting. While 3DGS is faster and more realistic than Neural Radiance Fields (NeRF), it relies on millions of "Gaussians" (3D particles with color, size, and orientation attributes). Storing these millions of attributes for a single scene can take several Gigabytes.

Prior attempts at compression focused on value-based methods (pruning "unimportant" Gaussians or using codebooks to group similar values). However, these methods ignore the spatial redundancy—the fact that Gaussians near each other often look similar. The authors identify that while Gaussians are "unorganized" (scattered points), a structured hash grid can be used to provide context and predict their values, making them much easier to compress.

2.2. Main Contributions / Findings

  1. Hash-grid Assisted Context (HAC): A method to use a compact, binary hash grid as a reference to predict the statistical distribution of anchor attributes.

  2. Intra-Anchor Context: A mechanism that looks at the internal redundancies within a single anchor point to further sharpen probability estimates.

  3. Adaptive Quantization Module (AQM): A module that learns the best "step size" for rounding off numbers (quantization) to balance file size and image detail.

  4. Adaptive Offset Masking: A differentiable pruning strategy that automatically identifies and deletes useless Gaussians and anchors during training based on their "bit-cost."

  5. Performance: HAC++ achieves an average of 122.5x compression on the BungeeNeRF dataset compared to vanilla 3DGS and significant gains over the state-of-the-art.


3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

  • 3D Gaussian Splatting (3DGS): A technique where a 3D scene is represented by millions of 3D ellipsoids (Gaussians). Each Gaussian has a position, rotation, scale, opacity, and color (represented by Spherical Harmonics).
  • Scaffold-GS: An evolution of 3DGS that uses "anchors." Instead of storing every Gaussian, it stores sparse anchor points. Each anchor "spawns" multiple local Gaussians. This is inherently more compact than vanilla 3DGS.
  • Entropy Coding: A data compression method (like Arithmetic Coding) that uses fewer bits for values that appear frequently and more bits for values that appear rarely. To work well, you need an accurate "probability model."
  • Quantization: The process of mapping a large set of values to a smaller set (e.g., rounding 3.14159 to 3.1). This reduces data size but introduces "quantization error."

3.2. Previous Works

  • Instant-NGP: Introduced the multi-resolution hash encoding. It uses a grid to store features, which are then interpolated to represent 3D space.
  • Value-based Compression: Methods like LightGaussian or Compressed3D use pruning (deleting small Gaussians) and vector quantization (using a dictionary of common values).
  • Structural-relation-based: Methods that use the spatial arrangement of points to predict values. Scaffold-GS is the most relevant here, as it reduces storage by predicting Gaussian attributes from anchor features using a Small Multi-Layer Perceptron (MLP).

3.3. Differentiation Analysis

Unlike previous methods that either just prune values or just use codebooks, HAC++ asks: "Can we use a structured grid to help predict these unorganized points?" It bridges the gap between structured representations (grids) and unstructured ones (point clouds/Gaussians).


4. Methodology

4.1. Principles

The core idea is to transform the storage problem into a probability prediction problem. In information theory, if you can perfectly predict the value of a piece of data, it costs 0 bits to store. HAC++ uses a structured hash grid to provide "context" for anchors, allowing the system to guess their attribute values very accurately.
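
To make this intuition concrete, here is a tiny, self-contained illustration (ours, not from the paper's code): an arithmetic coder needs roughly $-\log_2 p$ bits for a symbol predicted with probability $p$, so the better the context model predicts an attribute, the fewer bits it costs to store.

```python
import math

# Bits an entropy coder needs for one symbol predicted with probability p
# approach the Shannon bound of -log2(p).
for p in (0.5, 0.9, 0.999):
    print(f"p = {p:>5}: {-math.log2(p):.4f} bits")
# p =   0.5: 1.0000 bits
# p =   0.9: 0.1520 bits
# p = 0.999: 0.0014 bits  -> a near-perfect prediction costs almost nothing
```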

4.2. Core Methodology In-depth (Layer by Layer)

4.2.1. 3D Gaussian Splatting Preliminaries

A 3D Gaussian is defined by its mean $\pmb{\mu} \in \mathbb{R}^3$ and a covariance matrix $\pmb{\Sigma}$. The formula for a Gaussian $G(\pmb{x})$ is: $ G(\pmb{x}) = \exp\left( - \frac{1}{2}(\pmb{x} - \pmb{\mu})^\top \pmb{\Sigma}^{-1} (\pmb{x} - \pmb{\mu}) \right) $ Where:

  • $\pmb{x}$ is a point in 3D space.

  • $\pmb{\mu}$ is the center (mean) of the Gaussian.

  • $\pmb{\Sigma}$ is the shape/orientation (covariance).

    To render an image, Gaussians are projected to 2D. The color $\pmb{C}$ of a pixel is calculated using $\alpha$-composited blending (a short sketch follows the symbol list): $ \pmb{C} = \sum_{i \in I} \pmb{c}_i \alpha_i \prod_{j = 1}^{i - 1} (1 - \alpha_j) $ Where:

  • $\pmb{c}_i$ is the color of the $i$-th Gaussian.

  • $\alpha_i$ is the opacity.

  • $I$ is the set of Gaussians sorted by depth.
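
The sketch below (illustrative NumPy, not the paper's CUDA rasterizer) implements this front-to-back blending for a single pixel, assuming the Gaussians have already been depth-sorted and their per-pixel colors and opacities computed.

```python
import numpy as np

def composite(colors, alphas):
    """Front-to-back alpha compositing of depth-sorted splats at one pixel.

    colors: (N, 3) per-Gaussian RGB contributions c_i
    alphas: (N,)   per-Gaussian opacities alpha_i
    """
    pixel = np.zeros(3)
    transmittance = 1.0  # running product of (1 - alpha_j) for j < i
    for c, a in zip(colors, alphas):
        pixel += c * a * transmittance
        transmittance *= 1.0 - a
    return pixel

# Example: a mostly opaque red splat in front of a green one.
print(composite(np.array([[1.0, 0, 0], [0, 1.0, 0]]), np.array([0.8, 0.5])))
# -> [0.8 0.1 0. ]  (the rear splat only contributes through the remaining transmittance)
```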

4.2.2. Bridging Anchors and Hash Grid

In Scaffold-GS, an anchor has attributes $\mathcal{A} = \{ \pmb{f}^a, l, \pmb{o} \}$, representing the feature, scaling, and offsets for the spawned Gaussians. HAC++ uses the anchor location $\pmb{x}^a$ to query a hash grid $\mathcal{H}$ to get an interpolated hash feature $\pmb{f}^h$: $ \pmb{f}^h := \mathrm{Interp}(\pmb{x}^a, \mathcal{H}) $ This $\pmb{f}^h$ acts as a "clue" for the anchor's attributes. The relationship is modeled as a conditional probability $p(\mathcal{A} \mid \pmb{f}^h)$.
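
As a rough illustration of the $\mathrm{Interp}(\pmb{x}^a, \mathcal{H})$ query, the sketch below uses a single-level dense feature grid with trilinear interpolation as a stand-in for the paper's multi-resolution binarized hash grid (all names and sizes here are illustrative assumptions):

```python
import torch

def interp_grid_feature(xa, grid):
    """Trilinearly interpolate per-anchor features from a dense feature grid.

    xa:   (N, 3) anchor locations, assumed normalized to [0, 1]^3
    grid: (R, R, R, C) learnable feature grid (stand-in for the hash grid H)
    """
    R = grid.shape[0]
    pos = xa.clamp(0, 1) * (R - 1)            # continuous grid coordinates
    lo = pos.floor().long().clamp(max=R - 2)  # lower corner of the enclosing cell
    w = pos - lo.float()                      # trilinear weights in [0, 1]

    f_h = 0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                corner = grid[lo[:, 0] + dx, lo[:, 1] + dy, lo[:, 2] + dz]  # (N, C)
                weight = ((w[:, 0] if dx else 1 - w[:, 0])
                          * (w[:, 1] if dy else 1 - w[:, 1])
                          * (w[:, 2] if dz else 1 - w[:, 2]))
                f_h = f_h + corner * weight.unsqueeze(-1)
    return f_h  # (N, C) interpolated features f^h, differentiable w.r.t. the grid

grid = torch.randn(32, 32, 32, 8, requires_grad=True)  # 32^3 grid of 8-dim features
f_h = interp_grid_feature(torch.rand(1000, 3), grid)
print(f_h.shape)  # torch.Size([1000, 8])
```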

The following figure (Figure 2 from the original paper) shows the system architecture:

Fig. 2 (from the original paper). Overview of HAC++. Left: the Scaffold-GS baseline; center: HAC++ anchors with adaptive offset masking; right: the Hash-grid Assisted Context and Gaussian mixture model, where anchor locations generate interpolated hash features $\pmb{f}^h$ that serve as context for predicting the value distributions of anchor attributes, and the mixture yields the bin probability $p(\hat{f}_i)$ used to compute the entropy loss across different rate points.

4.2.3. HAC: Hash-Grid Assisted Context

To compress anchor attributes $\mathcal{A}$, they must be quantized. The authors propose the Adaptive Quantization Module (AQM). Instead of a fixed rounding step, they predict a refinement $\pmb{r}_i$ from the hash feature: $ \pmb{q}_i = Q_0 \times (1 + \operatorname{Tanh}(\pmb{r}_i)), \text{ where } \pmb{r}_i = \mathrm{MLP}_q(\pmb{f}_i^h) $ Where:

  • $Q_0$ is a base quantization step.

  • $\pmb{q}_i$ is the final step size used for rounding.

    To estimate the probability of a quantized value $\hat{f}_i$ (needed for Arithmetic Coding), they model it as a Gaussian distribution (see the sketch after this list). The parameters $\pmb{\mu}_i^s$ and $\pmb{\sigma}_i^s$ are predicted from the hash feature: $ p(\hat{f}_i) = \Phi(\hat{f}_i + \frac{q_i}{2} \mid \mu_i^s, \pmb{\sigma}_i^s) - \Phi(\hat{f}_i - \frac{q_i}{2} \mid \mu_i^s, \pmb{\sigma}_i^s) $ Where:

  • $\Phi$ is the Cumulative Distribution Function (CDF) of a Gaussian.

  • $\mu_i^s, \pmb{\sigma}_i^s$ are predicted by $\mathrm{MLP}_c(\pmb{f}_i^h)$.
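
Under the formulas above, a hedged PyTorch sketch of the adaptive quantization step and the CDF-difference probability (and the implied bit cost) might look like the following. The noise/straight-through trick used to keep rounding differentiable during training is omitted for brevity, and all tensor shapes and names are illustrative.

```python
import torch
from torch.distributions import Normal

def quantize_and_estimate_bits(f, r, mu_s, sigma_s, Q0=1.0):
    """Sketch of AQM quantization plus Gaussian entropy estimation.

    f:             (N, D) anchor attributes to be quantized
    r:             (N, D) refinement predicted by MLP_q from hash features f^h
    mu_s, sigma_s: (N, D) Gaussian parameters predicted by MLP_c from f^h
    """
    q = Q0 * (1 + torch.tanh(r))            # adaptive step size q_i
    f_hat = torch.round(f / q) * q          # value quantized in steps of q
    dist = Normal(mu_s, sigma_s.clamp(min=1e-6))
    # probability mass of the quantization bin centred on f_hat
    p = dist.cdf(f_hat + q / 2) - dist.cdf(f_hat - q / 2)
    bits = -torch.log2(p.clamp(min=1e-9))   # estimated bits per element
    return f_hat, bits

f_hat, bits = quantize_and_estimate_bits(
    f=torch.randn(4, 50), r=torch.zeros(4, 50),
    mu_s=torch.zeros(4, 50), sigma_s=torch.ones(4, 50))
print(bits.mean().item())  # average predicted bit cost, fed into the entropy loss
```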

4.2.4. Intra-Anchor Context

To capture redundancies within a single anchor feature $\pmb{f}^a$, they split the feature into $N^c$ chunks. Each chunk is predicted based on the chunks before it (a causal process): $ \mu_{i, n^c}^c, \sigma_{i, n^c}^c, \pi_{i, n^c}^c = \mathrm{MLP}_a([\hat{f}_{i, [0,\, n^c c - c)}^a; \pmb{\mu}_i^s; \pmb{\sigma}_i^s; \pi_i^s]) $ This provides a second probability estimate, which is then fused with the HAC estimate using a Gaussian Mixture Model (GMM): $ p(\hat{f}_i^a) = \sum_{l \in \{s, c\}} \theta_i^l \left( \Phi(\hat{f}_i^a + \frac{q_i}{2} \mid \mu_i^l, \sigma_i^l) - \Phi(\hat{f}_i^a - \frac{q_i}{2} \mid \mu_i^l, \sigma_i^l) \right) $ Where $\theta_i^l$ is the mixing weight determined by a Softmax over $\pi$.
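
A hedged sketch of this two-component fusion (one component per context; names are illustrative) could be:

```python
import torch
from torch.distributions import Normal

def gmm_bin_probability(f_hat, q, mu_s, sigma_s, pi_s, mu_c, sigma_c, pi_c):
    """Fuse the inter-anchor (s) and intra-anchor (c) estimates into a
    two-component Gaussian mixture and return the quantization-bin probability."""
    theta = torch.softmax(torch.stack([pi_s, pi_c]), dim=0)  # mixing weights over {s, c}
    p = torch.zeros_like(f_hat)
    for w, mu, sigma in ((theta[0], mu_s, sigma_s), (theta[1], mu_c, sigma_c)):
        d = Normal(mu, sigma.clamp(min=1e-6))
        p = p + w * (d.cdf(f_hat + q / 2) - d.cdf(f_hat - q / 2))
    return p  # plugged into -log2(p) to estimate the bit cost of f_hat
```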

4.2.5. Adaptive Offset Masking

To prune useless Gaussians/anchors, the authors define a mask $m_i \in \{0, 1\}$. To make this differentiable, they use a "Straight-Through Estimator": $ m_i = \mathrm{sg}(\mathbb{1}[\mathrm{Sig}(f_i^m) > \epsilon_m] - \mathrm{Sig}(f_i^m)) + \mathrm{Sig}(f_i^m) $ Where $\mathrm{sg}$ is the stop-gradient operator. They incorporate this mask directly into the Rate-Distortion (RD) loss, so the model automatically learns to set $m_i = 0$ if the "bit cost" of keeping a Gaussian is higher than the quality gain it provides.
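
A minimal sketch of this straight-through mask (the threshold value here is an illustrative assumption):

```python
import torch

def binary_mask_ste(f_m, eps_m=0.01):
    """Hard 0/1 mask in the forward pass, sigmoid gradients in the backward pass."""
    soft = torch.sigmoid(f_m)        # Sig(f_i^m)
    hard = (soft > eps_m).float()    # 1[Sig(f_i^m) > eps_m]
    # .detach() plays the role of the stop-gradient operator sg(.)
    return (hard - soft).detach() + soft

m = binary_mask_ste(torch.randn(5, requires_grad=True))
print(m)            # entries are exactly 0 or 1
m.sum().backward()  # ...yet gradients still flow through the sigmoid
```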

4.2.6. Total Training Loss

The model is trained to minimize a combined loss: $ Loss = L_{Scaffold} + \lambda \frac{1}{N(D^a + 6 + 3K)} (L_{entropy} + L_{hash}) $ Where:

  • $L_{Scaffold}$ is the standard rendering quality loss (how good the image looks).

  • $L_{entropy}$ is the estimated bits needed for attributes.

  • $L_{hash}$ is the bits needed for the hash grid itself.

  • $\lambda$ is the "trade-off" parameter (higher $\lambda$ means smaller file size, lower $\lambda$ means better quality); a small sketch of this trade-off follows the list.
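
Put together, a minimal sketch of the rate-distortion objective (plain Python, with the normalization written out; parameter names are ours):

```python
def total_loss(L_scaffold, L_entropy, L_hash, lam, N, D_a, K):
    """Combine fidelity and rate terms as in the loss above.

    N: number of anchors, D_a: anchor feature dimension, K: offsets per anchor.
    Dividing by N * (D_a + 6 + 3K) normalizes the rate to "bits per attribute
    element", so a single lambda trades rendering quality against model size.
    """
    rate = (L_entropy + L_hash) / (N * (D_a + 6 + 3 * K))
    return L_scaffold + lam * rate
```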


5. Experimental Setup

5.1. Datasets

  • Synthetic-NeRF: 8 scenes of artificial objects (e.g., Lego, Chair).
  • Mip-NeRF360: 9 complex real-world scenes (indoors and outdoors).
  • Tanks & Temples: Large-scale outdoor scans.
  • DeepBlending & BungeeNeRF: Datasets featuring varying scales and viewpoints.

5.2. Evaluation Metrics

  1. PSNR (Peak Signal-to-Noise Ratio):
    • Concept: Measures the ratio between the maximum possible power of a signal and the power of corrupting noise. Higher is better.
    • Formula: $PSNR = 10 \cdot \log_{10}\left(\frac{MAX_I^2}{MSE}\right)$ (a minimal NumPy sketch follows this list).
    • Symbols: $MAX_I$ is the maximum pixel value (usually 255); MSE is the Mean Squared Error between images.
  2. SSIM (Structural Similarity Index):
    • Concept: Quantifies how much the "structure" of the image is preserved, matching human perception better than PSNR.
    • Formula: $SSIM(x,y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$
    • Symbols: $\mu$ is the mean; $\sigma^2$ is the variance; $\sigma_{xy}$ is the covariance; the $c$ terms are constants to avoid division by zero.
  3. LPIPS (Learned Perceptual Image Patch Similarity):
    • Concept: Uses a deep neural network to measure how "different" two images look to a human. Lower is better.
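
For reference, a minimal NumPy sketch of the PSNR computation (SSIM and LPIPS are typically taken from existing libraries):

```python
import numpy as np

def psnr(img_a, img_b, max_val=255.0):
    """PSNR between two images of the same shape; higher is better."""
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_val ** 2 / mse)

a = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
b = np.clip(a + np.random.normal(0, 5, a.shape), 0, 255).astype(np.uint8)
print(f"{psnr(a, b):.2f} dB")  # roughly 34 dB for sigma = 5 Gaussian noise
```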

5.3. Baselines

The authors compare against:

  • Vanilla 3DGS: The uncompressed original.

  • Scaffold-GS: The base anchor-based model.

  • LightGaussian / Compressed3D: Recent state-of-the-art compression methods for 3DGS.


6. Results & Analysis

6.1. Core Results Analysis

HAC++ consistently outperforms all baselines in the rate-distortion trade-off. It achieves similar or higher PSNR than Scaffold-GS while using significantly less space.

The following are the results from Table II of the original paper:

| Method | Synthetic-NeRF PSNR↑ | SSIM↑ | LPIPS↓ | Size (MB)↓ | Mip-NeRF360 PSNR↑ | SSIM↑ | LPIPS↓ | Size (MB)↓ |
|---|---|---|---|---|---|---|---|---|
| 3DGS | 33.80 | 0.970 | 0.031 | 68.46 | 27.46 | 0.812 | 0.222 | 750.9 |
| Scaffold-GS | 33.41 | 0.966 | 0.035 | 19.36 | 27.50 | 0.806 | 0.252 | 253.9 |
| HAC (previous) | 33.71 | 0.968 | 0.034 | 1.86 | 27.77 | 0.811 | 0.230 | 21.87 |
| HAC++ (Ours) | 33.76 | 0.969 | 0.033 | 1.84 | 27.82 | 0.811 | 0.231 | 18.48 |

Analysis: On the Mip-NeRF360 dataset, HAC++ reduces the model size from 750.9 MB to 18.48 MB (a roughly 40x reduction over vanilla 3DGS) while actually increasing the PSNR from 27.46 to 27.82.

6.2. Ablation Studies

The authors proved that every part of their model works:

  • Without AQM: PSNR drops significantly because the rounding is too aggressive.

  • Without HAC (Hash context): The file size increases by 63.3% because the probability prediction is less accurate.

  • Without Masking: The model size increases by 31.4% because it keeps too many useless Gaussians.


7. Conclusion & Reflections

7.1. Conclusion Summary

HAC++ represents a significant milestone in 3D scene compression. By successfully bridging structured hash grids with unstructured Gaussian anchors, the authors demonstrate that spatial context is the key to high-ratio compression. Achieving 100x compression on average makes high-quality 3D scenes viable for mobile devices and web streaming.

7.2. Limitations & Future Work

  • Training Time: HAC++ takes about 80% longer to train than Scaffold-GS due to the complex probability modeling and loss functions.
  • Complexity: The multi-stage training (initial fitting, noise adaptation, context training) is somewhat complex to implement.
  • Future Direction: The authors suggest looking into direct anchor-to-anchor relationships without the intermediate hash grid to further reduce complexity.

7.3. Personal Insights & Critique

The most brilliant aspect of this paper is the move from "storing values" to "predicting distributions." By using the GMM to fuse inter-anchor (spatial) and intra-anchor (internal) clues, the authors have created a very robust probability engine.

However, one might critique the reliance on Scaffold-GS as the base. While Scaffold-GS is efficient, HAC++ inherits its specific artifacts. Furthermore, while the rendering FPS is high, the coding time (compressing/decompressing) is around 10-30 seconds, which might be a barrier for real-time interactive applications where scenes need to be updated on the fly. Overall, HAC++ sets a high bar for future research in 3D representation compression.
