HAC++: Towards 100X Compression of 3D Gaussian Splatting
TL;DR Summary
HAC++ introduces a compression framework that models the mutual information between unorganized anchors and a structured hash grid for spatial context, achieving over 100x size reduction while improving fidelity through precise probability estimation and entropy coding.
Abstract
3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity. However, the substantial Gaussians and their associated attributes necessitate effective compression techniques. Nevertheless, the sparse and unorganized nature of the point cloud of Gaussians (or anchors in our paper) presents challenges for compression. To achieve a compact size, we propose HAC++, which leverages the relationships between unorganized anchors and a structured hash grid, utilizing their mutual information for context modeling. Additionally, HAC++ captures intra-anchor contextual relationships to further enhance compression performance. To facilitate entropy coding, we utilize Gaussian distributions to precisely estimate the probability of each quantized attribute, where an adaptive quantization module is proposed to enable high-precision quantization of these attributes for improved fidelity restoration. Moreover, we incorporate an adaptive masking strategy to eliminate invalid Gaussians and anchors. Overall, HAC++ achieves a remarkable size reduction of over 100X compared to vanilla 3DGS when averaged on all datasets, while simultaneously improving fidelity. It also delivers more than 20X size reduction compared to Scaffold-GS. Our code is available at https://github.com/YihangChen-ee/HAC-plus.
In-depth Reading
1. Bibliographic Information
1.1. Title
HAC++: Towards 100X Compression of 3D Gaussian Splatting
1.2. Authors
Yihang Chen (Shanghai Jiao Tong University & Monash University), Qianyi Wu (Monash University), Weiyao Lin (Shanghai Jiao Tong University), Mehrtash Harandi (Monash University), and Jianfei Cai (Monash University).
1.3. Journal/Conference
This paper was accepted at ECCV 2024 (the original version, HAC) and extended for this journal-style publication. ECCV (European Conference on Computer Vision) is one of the top three premier conferences in the field of computer vision.
1.4. Publication Year
The paper was published on arXiv on January 21, 2025 (v4).
1.5. Abstract
3D Gaussian Splatting (3DGS) has revolutionized novel view synthesis with high-speed rendering and fidelity. However, its massive memory footprint (often Gigabytes per scene) makes it difficult to store and transmit. The authors propose HAC++, a compression framework that leverages the relationship between unorganized anchor points (from Scaffold-GS) and a structured hash grid. By utilizing mutual information for context modeling, intra-anchor relationship capturing, and an adaptive quantization module (AQM), HAC++ achieves over 100x compression compared to vanilla 3DGS and 20x compared to Scaffold-GS while maintaining or improving visual quality.
1.6. Original Source Link
- ArXiv Link: https://arxiv.org/abs/2501.12255
- PDF Link: https://arxiv.org/pdf/2501.12255v4.pdf
- Code: https://github.com/YihangChen-ee/HAC-plus
2. Executive Summary
2.1. Background & Motivation
The core problem is the storage bottleneck of 3D Gaussian Splatting. While 3DGS is faster and more realistic than Neural Radiance Fields (NeRF), it relies on millions of "Gaussians" (3D particles with color, size, and orientation attributes). Storing these millions of attributes for a single scene can take several Gigabytes.
Prior attempts at compression focused on value-based methods (pruning "unimportant" Gaussians or using codebooks to group similar values). However, these methods ignore the spatial redundancy—the fact that Gaussians near each other often look similar. The authors identify that while Gaussians are "unorganized" (scattered points), a structured hash grid can be used to provide context and predict their values, making them much easier to compress.
2.2. Main Contributions / Findings
- Hash-grid Assisted Context (HAC): A method that uses a compact, binary hash grid as a reference to predict the statistical distribution of anchor attributes.
- Intra-Anchor Context: A mechanism that exploits the internal redundancies within a single anchor to further sharpen probability estimates.
- Adaptive Quantization Module (AQM): A module that learns the best "step size" for rounding off values (quantization) to balance file size and image detail.
- Adaptive Offset Masking: A differentiable pruning strategy that automatically identifies and deletes useless Gaussians and anchors during training based on their "bit cost."
- Performance: HAC++ achieves an average of 122.5x compression on the BungeeNeRF dataset compared to vanilla 3DGS, along with significant gains over the state of the art.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
- 3D Gaussian Splatting (3DGS): A technique where a 3D scene is represented by millions of 3D ellipsoids (Gaussians). Each Gaussian has a position, rotation, scale, opacity, and color (represented by Spherical Harmonics).
- Scaffold-GS: An evolution of 3DGS that uses "anchors." Instead of storing every Gaussian, it stores sparse anchor points, each of which "spawns" multiple local Gaussians. This is inherently more compact than vanilla 3DGS.
- Entropy Coding: A data compression method (like Arithmetic Coding) that uses fewer bits for values that appear frequently and more bits for values that appear rarely. To work well, it needs an accurate "probability model."
- Quantization: The process of mapping a large set of values to a smaller set (e.g., rounding 3.14159 to 3.1). This reduces data size but introduces "quantization error."
3.2. Previous Works
- Instant-NGP: Introduced multi-resolution hash encoding. It uses a grid to store features, which are then interpolated to represent 3D space.
- Value-based Compression: Methods like LightGaussian or Compressed3D use pruning (deleting small Gaussians) and vector quantization (using a dictionary of common values).
- Structural-relation-based Compression: Methods that use the spatial arrangement of points to predict values. Scaffold-GS is the most relevant here, as it reduces storage by predicting Gaussian attributes from anchor features using a small Multi-Layer Perceptron (MLP).
3.3. Differentiation Analysis
Unlike previous methods that either just prune values or just use codebooks, HAC++ asks: "Can we use a structured grid to help predict these unorganized points?" It bridges the gap between structured representations (grids) and unstructured ones (point clouds/Gaussians).
4. Methodology
4.1. Principles
The core idea is to transform the storage problem into a probability prediction problem. In information theory, if you can perfectly predict the value of a piece of data, it costs 0 bits to store. HAC++ uses a structured hash grid to provide "context" for anchors, allowing the system to predict their attribute values very accurately.
4.2. Core Methodology In-depth (Layer by Layer)
4.2.1. 3D Gaussian Splatting Preliminaries
A 3D Gaussian is defined by its mean $\pmb{\mu}$ and a covariance matrix $\pmb{\Sigma}$. The formula for a Gaussian is: $ G(\pmb{x}) = \exp\left( - \frac{1}{2}(\pmb{x} - \pmb{\mu})^\top \pmb{\Sigma}^{-1} (\pmb{x} - \pmb{\mu}) \right) $ Where:
- $\pmb{x}$ is a point in 3D space.
- $\pmb{\mu}$ is the center (mean) of the Gaussian.
- $\pmb{\Sigma}$ is the shape/orientation (covariance).
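To make this concrete, here is a minimal NumPy sketch (our illustration, not the paper's code) that evaluates $G(\pmb{x})$ for an anisotropic Gaussian:

```python
import numpy as np

def eval_gaussian_3d(x, mu, cov):
    """Evaluate the (unnormalized) 3D Gaussian G(x) defined above."""
    d = x - mu  # displacement from the center
    return np.exp(-0.5 * d @ np.linalg.inv(cov) @ d)

# An ellipsoid stretched along the x-axis, centered at the origin.
mu = np.zeros(3)
cov = np.diag([4.0, 1.0, 1.0])
print(eval_gaussian_3d(np.array([1.0, 0.0, 0.0]), mu, cov))  # ~0.88
```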
To render an image, Gaussians are projected to 2D. The color of a pixel is calculated using $\alpha$-composited blending: $ \pmb{C} = \sum_{i \in I} \pmb{c}_i \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j) $ Where:
- $\pmb{c}_i$ is the color of the $i$-th Gaussian.
- $\alpha_i$ is the opacity.
- $I$ is the set of Gaussians sorted by depth.
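A minimal sketch of this front-to-back compositing rule, assuming the Gaussians' per-pixel colors and opacities are already sorted near-to-far:

```python
import numpy as np

def composite_pixel(colors, alphas):
    """Front-to-back compositing: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j).
    `colors` is (N, 3), `alphas` is (N,), both sorted near-to-far."""
    pixel = np.zeros(3)
    transmittance = 1.0  # prod_{j<i} (1 - alpha_j), starts at 1
    for c, a in zip(colors, alphas):
        pixel += c * a * transmittance
        transmittance *= 1.0 - a
    return pixel

colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # red in front of green
print(composite_pixel(colors, np.array([0.6, 0.8])))   # [0.6, 0.32, 0.0]
```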
4.2.2. Bridging Anchors and Hash Grid
In Scaffold-GS, an anchor has attributes $\mathcal{A} = \{\pmb{f}^a, \pmb{l}, \pmb{o}\}$, representing the feature, scaling, and offsets for the spawned Gaussians. HAC++ uses the anchor location $\pmb{x}^a$ to query a hash grid $\mathcal{H}$ and obtain an interpolated hash feature $\pmb{f}^h$:
$
\pmb{f}^h := \mathrm{Interp}(\pmb{x}^a, \mathcal{H})
$
This acts as a "clue" for the anchor's attributes. The relationship is modeled as a conditional probability $p(\mathcal{A} \mid \pmb{f}^h)$.
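The paper builds on Instant-NGP-style multi-resolution hash grids; the following simplified single-level sketch (our illustration; the hash function and table size are assumptions) shows how an anchor position can be trilinearly interpolated from a hash table:

```python
import numpy as np

def hash_coords(ijk, table_size):
    """Instant-NGP style spatial hash of integer voxel coordinates."""
    primes = (1, 2654435761, 805459861)
    h = 0
    for c, p in zip(ijk, primes):
        h ^= int(c) * p
    return (h & 0xFFFFFFFFFFFFFFFF) % table_size

def query_hash_grid(x_anchor, table, resolution):
    """Trilinearly interpolate a feature for an anchor position in [0, 1]^3.
    `table` is a (T, F) array standing in for the learnable hash grid H."""
    pos = x_anchor * (resolution - 1)
    base = np.floor(pos).astype(int)
    frac = pos - base
    feat = np.zeros(table.shape[1])
    for corner in range(8):  # the 8 corners of the enclosing voxel
        offset = np.array([(corner >> d) & 1 for d in range(3)])
        weight = np.prod(np.where(offset == 1, frac, 1.0 - frac))
        feat += weight * table[hash_coords(base + offset, len(table))]
    return feat

table = np.random.randn(2**14, 4)  # 16K entries, 4-dim features
print(query_hash_grid(np.array([0.3, 0.7, 0.5]), table, resolution=64))
```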
The following figure (Figure 2 from the original paper) shows the system architecture:
This schematic shows the structure and function of HAC++. The left side depicts the Scaffold-GS backbone, the center shows HAC's anchors with adaptive offset masking, and the right side illustrates the Gaussian Mixture Model used for feature prediction, where $p(\hat{f}_i) = \Phi_{GMM}(\hat{f}_i + \frac{1}{2}q_i) - \Phi_{GMM}(\hat{f}_i - \frac{1}{2}q_i)$ is used to compute the entropy loss $L_{entropy}$.
4.2.3. HAC: Hash-Grid Assisted Context
To compress anchor attributes $\mathcal{A}$, they must be quantized. The authors propose the Adaptive Quantization Module (AQM). Instead of a fixed rounding step, they predict a refinement from the hash feature: $ \pmb{q}_i = Q_0 \times (1 + \operatorname{Tanh}(\pmb{r}_i)), \text{ where } \pmb{r}_i = \mathrm{MLP}_q(\pmb{f}_i^h) $ Where:
- $Q_0$ is a base quantization step.
- $\pmb{q}_i$ is the final step size used for rounding.
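A hedged PyTorch sketch of the AQM idea: during training, rounding is simulated with uniform noise (a standard trick in learned compression) so gradients can flow; at coding time, values are actually rounded to multiples of the learned step:

```python
import torch

def adaptive_quantize(f_a, r, Q0=1.0, training=True):
    """AQM sketch: the step size q is refined per anchor from the hash feature.
    `r` stands in for MLP_q(f^h); Q0 is the base quantization step."""
    q = Q0 * (1.0 + torch.tanh(r))
    if training:
        # Simulate rounding with uniform noise in [-q/2, q/2] so the
        # operation stays differentiable.
        noise = (torch.rand_like(f_a) - 0.5) * q
        return f_a + noise, q
    # At coding time, round to integer multiples of q.
    return torch.round(f_a / q) * q, q
```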
To estimate the probability of a quantized value (needed for Arithmetic Coding), they model it as a Gaussian distribution whose parameters $\mu_i^s$ and $\pmb{\sigma}_i^s$ are predicted from the hash feature: $ p(\hat{f}_i) = \Phi(\hat{f}_i + \frac{q_i}{2} \mid \mu_i^s, \pmb{\sigma}_i^s) - \Phi(\hat{f}_i - \frac{q_i}{2} \mid \mu_i^s, \pmb{\sigma}_i^s) $ Where:
- $\Phi$ is the Cumulative Distribution Function (CDF) of a Gaussian.
- $\mu_i^s, \pmb{\sigma}_i^s$ are predicted by an MLP taking $\pmb{f}_i^h$ as input.
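The per-value bit cost follows directly from this probability as $-\log_2 p(\hat{f}_i)$. A small PyTorch sketch (our illustration):

```python
import torch

def bits_for_attribute(f_hat, mu, sigma, q):
    """Bit cost of a quantized value under the Gaussian model:
    p = Phi(f + q/2) - Phi(f - q/2), bits = -log2(p)."""
    dist = torch.distributions.Normal(mu, sigma.clamp_min(1e-6))
    p = dist.cdf(f_hat + q / 2) - dist.cdf(f_hat - q / 2)
    return -torch.log2(p.clamp_min(1e-9))

# A value close to the predicted mean is cheap to encode:
x = torch.tensor(0.0)
print(bits_for_attribute(x, mu=torch.tensor(0.0),
                         sigma=torch.tensor(1.0), q=torch.tensor(0.1)))  # ~4.65 bits
```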
4.2.4. Intra-Anchor Context
To capture redundancies within a single anchor feature $\pmb{f}_i^a$, they split the feature into chunks of length $c$. Each chunk is predicted based on the chunks before it (a causal process):
$
\mu_{i, n^c}^c, \sigma_{i, n^c}^c, \pi_{i, n^c}^c = \mathrm{MLP}_a([\hat{f}_{i, [0,\, n^c c - c)}^a; \pmb{\mu}_i^s; \pmb{\sigma}_i^s; \pi_i^s])
$
This provides a second probability estimate, which is then fused with the HAC estimate using a Gaussian Mixture Model (GMM):
$
p(\hat{f}_i^a) = \sum_{l \in \{s, c\}} \theta_i^l \left( \Phi(\hat{f}_i^a + \frac{q_i}{2} \mid \mu_i^l, \sigma_i^l) - \Phi(\hat{f}_i^a - \frac{q_i}{2} \mid \mu_i^l, \sigma_i^l) \right)
$
Where $\theta_i^l$ is the mixing weight determined by a Softmax over the predicted logits $\pi_i^l$.
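A sketch of the two-component mixture, assuming the per-context means, scales, and $\pi$ logits are stacked along the first dimension:

```python
import torch

def gmm_probability(f_hat, q, mus, sigmas, pis):
    """Fuse the inter-anchor (s) and intra-anchor (c) estimates as a
    2-component Gaussian mixture; `pis` are the raw logits."""
    theta = torch.softmax(pis, dim=0)  # mixing weights theta^s, theta^c
    p = torch.zeros_like(f_hat)
    for mu, sigma, w in zip(mus, sigmas, theta):
        dist = torch.distributions.Normal(mu, sigma.clamp_min(1e-6))
        p += w * (dist.cdf(f_hat + q / 2) - dist.cdf(f_hat - q / 2))
    return p
```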
4.2.5. Adaptive Offset Masking
To prune useless Gaussians/anchors, the authors define a binary mask $m_i$. To make this differentiable, they use a "Straight-Through Estimator": $ m_i = \mathrm{sg}(\mathbb{1}[\mathrm{Sig}(f_i^m) > \epsilon_m] - \mathrm{Sig}(f_i^m)) + \mathrm{Sig}(f_i^m) $ Where $\mathrm{sg}(\cdot)$ is the stop-gradient operator. They incorporate this mask directly into the Rate-Distortion (RD) loss, so the model automatically learns to set $m_i = 0$ if the "bit cost" of keeping a Gaussian is higher than the quality gain it provides.
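A PyTorch sketch of this straight-through mask, where `detach()` plays the role of the stop-gradient $\mathrm{sg}(\cdot)$ (the threshold value `eps_m` here is a hypothetical choice):

```python
import torch

def ste_mask(f_m, eps_m=0.01):
    """Straight-through binary mask: hard threshold in the forward pass,
    sigmoid gradients in the backward pass (detach() acts as sg)."""
    soft = torch.sigmoid(f_m)
    hard = (soft > eps_m).float()
    return (hard - soft).detach() + soft
```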
4.2.6. Total Training Loss
The model is trained to minimize a combined loss: $ Loss = L_{Scaffold} + \lambda \frac{1}{N(D^a + 6 + 3K)} (L_{entropy} + L_{hash}) $ Where:
- $L_{Scaffold}$ is the standard rendering quality loss (how good the image looks).
- $L_{entropy}$ is the estimated number of bits needed for the anchor attributes.
- $L_{hash}$ is the number of bits needed for the hash grid itself.
- $\lambda$ is the "trade-off" parameter (higher means a smaller file; lower means better quality).
5. Experimental Setup
5.1. Datasets
- Synthetic-NeRF: 8 scenes of artificial objects (e.g., Lego, Chair).
- Mip-NeRF360: 9 complex real-world scenes (indoors and outdoors).
- Tanks & Temples: Large-scale outdoor scans.
- DeepBlending & BungeeNeRF: Datasets featuring varying scales and viewpoints.
5.2. Evaluation Metrics
- PSNR (Peak Signal-to-Noise Ratio):
- Concept: Measures the ratio between the maximum possible power of a signal and the power of corrupting noise. Higher is better.
- Formula: $PSNR = 10 \cdot \log_{10}\left(\frac{MAX_I^2}{MSE}\right)$ (a minimal implementation is sketched after this list)
- Symbols: $MAX_I$ is the maximum pixel value (usually 255); $MSE$ is the Mean Squared Error between the images.
- SSIM (Structural Similarity Index):
- Concept: Quantifies how much the "structure" of the image is preserved, matching human perception better than PSNR.
- Formula: $SSIM(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$
- Symbols: $\mu$ is the mean; $\sigma^2$ is the variance; $\sigma_{xy}$ is the covariance; $C_1, C_2$ are constants to avoid division by zero.
- LPIPS (Learned Perceptual Image Patch Similarity):
- Concept: Uses a deep neural network to measure how "different" two images look to a human. Lower is better.
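As a concrete illustration of the PSNR metric above, a minimal NumPy sketch:

```python
import numpy as np

def psnr(img_a, img_b, max_val=255.0):
    """PSNR = 10 * log10(MAX_I^2 / MSE) between two images."""
    diff = img_a.astype(np.float64) - img_b.astype(np.float64)
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.full((64, 64, 3), 128, dtype=np.uint8)
b = a.copy()
b[0, 0, 0] += 5  # a tiny perturbation
print(psnr(a, b))  # ~75 dB: the images are nearly identical
```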
5.3. Baselines
The authors compare against:
- Vanilla 3DGS: The uncompressed original.
- Scaffold-GS: The base anchor-based model.
- LightGaussian / Compressed3D: Recent state-of-the-art compression methods for 3DGS.
6. Results & Analysis
6.1. Core Results Analysis
HAC++ consistently outperforms all baselines in the "Rate-Distortion" trade-off. It achieves similar or higher PSNR than Scaffold-GS while using significantly less space.
The following are the results from Table II of the original paper:
| Methods | Synthetic-NeRF PSNR↑ | SSIM↑ | LPIPS↓ | Size (MB)↓ | Mip-NeRF360 PSNR↑ | SSIM↑ | LPIPS↓ | Size (MB)↓ |
|---|---|---|---|---|---|---|---|---|
| 3DGS | 33.80 | 0.970 | 0.031 | 68.46 | 27.46 | 0.812 | 0.222 | 750.9 |
| Scaffold-GS | 33.41 | 0.966 | 0.035 | 19.36 | 27.50 | 0.806 | 0.252 | 253.9 |
| HAC (Previous) | 33.71 | 0.968 | 0.034 | 1.86 | 27.77 | 0.811 | 0.230 | 21.87 |
| HAC++ (Ours) | 33.76 | 0.969 | 0.033 | 1.84 | 27.82 | 0.811 | 0.231 | 18.48 |
Analysis: On the Mip-NeRF360 dataset, HAC++ reduces the model size from 750.9 MB to 18.48 MB, a roughly 40x reduction over vanilla 3DGS, while actually increasing the PSNR from 27.46 to 27.82.
6.2. Ablation Studies
The ablation studies confirm that each component contributes:
- Without AQM: PSNR drops significantly because the rounding is too aggressive.
- Without HAC (hash context): The file size increases by 63.3% because the probability prediction is less accurate.
- Without Masking: The model size increases by 31.4% because too many useless Gaussians are kept.
7. Conclusion & Reflections
7.1. Conclusion Summary
HAC++ represents a significant milestone in 3D scene compression. By successfully bridging structured hash grids with unstructured Gaussian anchors, the authors demonstrate that spatial context is the key to high-ratio compression. Achieving roughly 100x compression on average makes high-quality 3D scenes viable for mobile devices and web streaming.
7.2. Limitations & Future Work
- Training Time: HAC++ takes about 80% longer to train than Scaffold-GS due to the complex probability modeling and loss functions.
- Complexity: The multi-stage training (initial fitting, noise adaptation, context training) is somewhat complex to implement.
- Future Direction: The authors suggest looking into direct anchor-to-anchor relationships without the intermediate hash grid to further reduce complexity.
7.3. Personal Insights & Critique
The most brilliant aspect of this paper is the move from "storing values" to "predicting distributions." By using the GMM to fuse inter-anchor (spatial) and intra-anchor (internal) clues, the authors have created a very robust probability engine.
However, one might critique the reliance on Scaffold-GS as the base. While Scaffold-GS is efficient, HAC++ inherits its specific artifacts. Furthermore, while the rendering FPS is high, the coding time (compressing/decompressing) is around 10-30 seconds, which might be a barrier for real-time interactive applications where scenes need to be updated on the fly. Overall, HAC++ sets a high bar for future research in 3D representation compression.