Paper status: completed

Alternating optimization for bundle adjustment with closed form solutions

Published:05/15/2025

Alternating Optimization for Bundle Adjustment (1)Inverse Depth Representation for 3D Points (1)Re-weighted Re-projection Error (1)Closed-Form Camera Pose Estimation (1)Linearized 3D Structure Estimation (1)

Original Link

Price: 0.100000

2 readers

This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

This paper introduces a novel alternating optimization algorithm for Bundle Adjustment, augmenting 3D points with inverse depth to enable closed-form solutions for decoupled camera pose and 3D structure subproblems. This method proves efficient, reliable, and accurate, outperform

Abstract

SCIENCE CHINA Information Sciences June 2025, Vol. 68, Iss. 6, 162103:1–162103: 15 https://doi.org/10.1007/s11432-023-4281-3 c © Science China Press 2025 info.scichina.com link.springer.com . RESEARCH PAPER . Alternating optimization for bundle adjustment with closed form solutions Chengzhe MENG, Yiwen JIANG & Weiwei XU * State Key Lab of Computer-Aided Design & Computer Graphics, Zhejiang University, Hangzhou 310058, China Received 11 December 2023/Revised 18 March 2024/Accepted 20 May 2024/Published online 15 May 2025 Abstract This study proposes a novel alternating optimization algorithm for bundle adjustment, a critical process in structure from motion methods. We introduce the inverse depth of each three-dimensional (3D) point as an augmented independent variable and develop a low-order polynomial error metric. Theoretically, the error can be adjusted to align closely with the re-projection error since it essentially acts as a re-weighted re-projection error. We decouple the bundle adjustment problem by breaking it down into three separate tasks: estimating the camera’s pose, determining the 3D structure, and optionally estimating the camera’s

Mind Map

In-depth Reading

English Analysis~15 min read · 15,899 chars

1. Bibliographic Information

Title: Alternating optimization for bundle adjustment with closed form solutions
Authors: Chengzhe MENG, Yiwen JIANG & Weiwei XU
Affiliations: State Key Lab of Computer-Aided Design & Computer Graphics, Zhejiang University, Hangzhou, China.
Journal/Conference: Science China Information Sciences. This is a reputable journal covering a broad range of topics in information sciences and technology.
Publication Year: Published online May 15, 2024, for the 2025 volume.
Abstract: The paper introduces a new alternating optimization (AO) algorithm for bundle adjustment (BA). The key idea is to introduce the inverse depth of 3D points as an additional variable, which simplifies the error metric into a low-order polynomial. This new error metric is shown to be a re-weighted version of the standard re-projection error. The algorithm decouples the BA problem into three subproblems: camera pose estimation, 3D structure refinement, and optional camera intrinsic parameter estimation. Each subproblem can be solved independently and has a closed-form solution. Camera pose is solved as an absolute orientation problem. 3D points are refined using a linearization scheme that also yields a closed-form update. The authors claim their algorithm is efficient, reliable, and accurate, outperforming recent alternatives in experiments.
Original Source Link: /files/papers/68e36fca00f1b238a31a19e6/paper.pdf (Published paper).

2. Executive Summary

Background & Motivation (Why):
- Core Problem: Bundle Adjustment (BA) is a fundamental optimization problem in 3D computer vision, used in applications like Structure from Motion (SfM) and Simultaneous Localization and Mapping (SLAM). It aims to simultaneously refine 3D point locations and camera poses to minimize the difference between observed image features and their projections from the 3D model (the re-projection error).
- Existing Challenges: Traditional BA methods, like Gauss-Newton, are computationally expensive, especially for large-scale problems (e.g., city-scale reconstruction), because they require solving a massive system of linear equations in each iteration. While techniques like the Schur complement or Preconditioned Conjugate Gradients (PCG) improve efficiency, they can still be slow or compromise accuracy. Distributed methods like ADMM are designed for large-scale problems but often suffer from slow convergence rates.
- Paper's Innovation: This paper proposes a novel approach that avoids solving a large, complex system of equations altogether. By introducing an augmented variable (inverse depth) and reformulating the error metric, the authors break the BA problem down into smaller, independent subproblems. Crucially, each of these subproblems can be solved optimally and directly using a closed-form solution, which is computationally very fast and stable.
Main Contributions / Findings (What):
1. A Novel Error Metric: The paper introduces a modified re-projection error by using the inverse depth of each 3D point as an augmented variable. This transforms the complex fractional error function into a simple low-order polynomial, making optimization significantly easier.
2. Decoupled Optimization with Closed-Form Solutions: The BA problem is split into three alternating tasks: refining camera poses, refining 3D points, and refining camera intrinsics. Each task has a closed-form solution, eliminating the need for iterative numerical solvers for the subproblems.
3. An Efficient 3D Point Refinement Scheme: A linearization technique is proposed to find the update direction for 3D points and their inverse depths. This involves solving only a tiny 3x3 linear system, followed by a closed-form line search to find the optimal step size.
4. High Performance and Scalability: The proposed algorithm, named Alternating BA (ABA), is shown to be highly efficient in terms of both computation time and memory usage. It is also inherently parallelizable and suitable for distributed computation, making it effective for very large-scale BA problems.

Foundational Concepts:
- Bundle Adjustment (BA): Imagine taking many photos of a building from different angles. BA is the process of fine-tuning both the 3D positions of key points on the building (like corners) and the exact position and orientation (pose) of the camera for each photo. The goal is to make the 3D model perfectly align with all the photos, minimizing the re-projection error.
- Re-projection Error: This is the distance, measured in pixels on an image, between where a 2D feature was actually observed and where the corresponding 3D point from the model projects to. Minimizing this error is the primary goal of BA.
- Structure from Motion (SfM): A pipeline that automatically creates a 3D model from a collection of 2D images. BA is the final and most critical step of SfM.
- Simultaneous Localization and Mapping (SLAM): A technique used by robots and autonomous vehicles to build a map of an unknown environment while simultaneously tracking their own position within that map. BA is used in the backend to refine the map and trajectory.
- Alternating Optimization (AO): An optimization strategy for problems with multiple groups of variables. Instead of optimizing everything at once, it iteratively fixes one group of variables and optimizes the others, then swaps roles. The paper applies this to camera poses and 3D points.
- Closed-Form Solution: A mathematical solution that can be expressed in a finite number of standard operations. It's a direct formula, not an iterative approximation. Finding closed-form solutions is highly desirable as they are fast and exact.
Previous Works & Differentiation:
- Traditional (Gauss-Newton) Methods: These are the standard for BA. They linearize the problem and solve a large normal equation system. Techniques like the Schur complement exploit the problem's structure to reduce the system size, but it can still be a bottleneck. The proposed method avoids forming and solving this large system.
- Inexact Methods (PCG): These methods, like Multicore (MBA), approximate the solution to the normal equation iteratively. They are fast but may not achieve the same level of accuracy as exact methods.
- ADMM-based Methods: These are designed for distributed BA. They split the problem among multiple machines, but require extra steps to ensure all machines agree on shared variables (consensus), which can slow down convergence. The proposed method's distributed approach is more direct, simply aggregating intermediate results (A_p and y_p) without a complex consensus mechanism.
- Error Measurement: Previous work has explored simplifying the error metric. The "space error" moves the depth term out of the denominator but introduces a bias by weighting points based on their depth. This paper's use of inverse depth as an independent variable is novel in this context. It creates a polynomial error function that is closely related to the original re-projection error via a simple cosine factor, preserving accuracy while simplifying optimization.

4. Methodology (Core Technology & Implementation)

The core of the paper is a reformulation of the BA objective function that permits decomposition into subproblems with closed-form solutions.

Principles: The Modified Re-projection Error The standard re-projection of a 3D point $\pmb{p}_i$ onto the $j$ -th image plane is complex due to the division by depth. The traditional re-projection error is: $e_{i,j}^r = \left\| \pmb{v}_{i,j} - (\pmb{R}_j \pmb{p}_i + \pmb{t}_j) \frac{f_k}{\pmb{p}_i^\mathrm{T} \pmb{r}_j^3 + \pmb{t}_j^3} \right\|_2$ where $\pmb{v}_{i,j}$ is the 2D observation, $(\pmb{R}_j, \pmb{t}_j)$ is the camera pose, $f_k$ is the focal length, and the denominator is the point's depth.

The key innovation is to treat the scaling factor as a new, independent variable $s_{i,j}$ : $s_{i,j} \approx \frac{f_k}{\text{depth}}$ This leads to the modified re-projection error: $e_{i,j} = \| \pmb{v}_{i,j} - ( R_j \pmb{p}_i + \pmb{t}_j ) s_{i,j} \|_2$ This is a simple quadratic function of its variables, with no division.
Geometric Interpretation and Accuracy As shown in Figure 1, this new error $e_{i,j}$ is geometrically related to the original re-projection error $e_{i,j}^r$ .

该图像为示意图，展示了相机光轴、图像平面与点在三维空间中的投影关系。图中标示了实际投影点 $\mathbf{q}_{i,j}$ 、估计投影点 $\hat{\mathbf{q}}_{i,j}$ 、误差向量 $\mathbf{e}_{i,j}$ 、及其在图像平面上的分量 $\mathbf{e}^r_{i,j}$ ，还包含角度 $\alpha$ 和相机内参焦距 $f_k$ 等关键几何参数，表达了重投影误差及其分解方式。

The term $(\pmb{R}_j \pmb{p}_i + \pmb{t}_j)s_{i,j}$ finds a point $\hat{\pmb{q}}_{i,j}$ on the ray from the camera center $O_j$ to the 3D point $\pmb{q}_{i,j}$ that is closest to the observation $\pmb{v}_{i,j}$ . The geometry reveals that the modified error is a scaled version of the original error: $e_{i,j} = e_{i,j}^r \cos(\alpha_{i,j})$ where $\alpha_{i,j}$ is the angle between the camera's optical axis and the observation ray. To compensate for this scaling, the authors introduce a re-weighting factor $\phi_{i,j} \approx 1/\cos(\alpha_{i,j})$ , leading to two variants:
- V-metric: $\phi_{i,j} = 1$ (unweighted).
- Z-metric: $\phi_{i,j} = \frac{\|\pmb{v}_{i,j}\|}{f_k}$ (approximates the cosine correction).
  
  The final objective function to minimize is: $J = \sum_{i,j \in \Omega} \mu_{i,j} (\phi_{i,j} e_{i,j})^2$ where $\mu_{i,j}$ is a weight for robustness (e.g., to handle outliers).
Steps & Procedures: Alternating Optimization The algorithm alternates between three main steps until convergence.

1. Camera Pose Refinement:
- Goal: For each camera $j$ , find the optimal rotation $\pmb{R}_j$ and translation $\pmb{t}_j$ , assuming all 3D points $\pmb{p}_i$ and inverse depths $s_{i,j}$ are fixed.
- Procedure:
  1. The objective function for camera $j$ is $J_j(\pmb{R}, \pmb{t}) = \sum_i \mu_i \| \pmb{v}_i - (\pmb{R}\pmb{p}_i + \pmb{t})s_i \|_2^2$ .
  2. First, solve for the optimal translation $\pmb{t}^*$ by setting the derivative $\frac{\partial J_j}{\partial \pmb{t}}$ to zero. This yields a closed-form solution for $\pmb{t}$ that depends on $\pmb{R}$ .
  3. Substitute $\pmb{t}^*$ back into $J_j$ . The problem simplifies to minimizing $\sum_i \mu_i \| \pmb{x}_i - \pmb{R}\pmb{y}_i \|_2^2$ , where $\pmb{x}_i$ and $\pmb{y}_i$ are known vectors derived from the fixed variables.
  4. This is a classic absolute orientation problem, which can be solved globally and in closed form for the optimal rotation $\pmb{R}$ using Singular Value Decomposition (SVD).
2. 3D Point Refinement:
- Goal: For each 3D point $i$ , find the optimal position $\pmb{p}_i$ and its corresponding inverse depths $s_{i,j}$ for all cameras $j$ that see it, assuming all camera poses are fixed.
- Procedure:
  1. The objective function $J_i(\pmb{p}, s_j)$ contains products of variables ( $\pmb{p}$ and $s_j$ ), making it non-linear.
  2. Linearization: The authors linearize the problem by considering small updates $\delta\pmb{p}$ and $\delta s_j$ . They replace $\pmb{p}$ with $\pmb{p} + \delta\pmb{p}$ and $s_j$ with $s_j + \delta s_j$ and drop the second-order term $\delta\pmb{p}\delta s_j$ . This results in a linear least-squares problem: $\bar{J}_i(\delta\pmb{p}, \delta s_j) = \sum_{j \in \Omega_i} \mu_j \| \hat{\pmb{e}}_j - \delta\pmb{p}s_j - \hat{\pmb{q}}_j \delta s_j \|_2^2$
  3. This linearized problem also has a closed-form solution. First, solve for $\delta s_j$ in terms of $\delta\pmb{p}$ .
  4. Substitute this back to get an objective purely in terms of $\delta\pmb{p}$ . The optimal update direction $\delta\pmb{p}^*$ is found by solving a small 3x3 linear system: $\delta\pmb{p}^* = \pmb{A}_p^{-1} \pmb{y}_p$ where $\pmb{A}_p = \sum_j \pmb{A}_j s_j^2 \mu_j$ and $\pmb{y}_p = \sum_j \pmb{A}_j \hat{\pmb{e}}_j \mu_j$ . Note that $\pmb{A}_p$ and $\pmb{y}_p$ are sums, making them easy to compute in a distributed manner.
  5. Line Search: The direction $(\delta\pmb{p}^*, \delta s_j^*)$ is a guaranteed descent direction for the original non-linear objective $J_i$ . The authors perform a line search to find the optimal step size $h^*$ along this direction. Since $J_i(h)$ is a 4th-order polynomial in $h$ , its minimum can be found in closed form by solving a cubic equation.
  6. Finally, update the variables: $\pmb{p} \leftarrow \pmb{p} + h^*\delta\pmb{p}^*$ and $s_j \leftarrow s_j + h^*\delta s_j^*$ .
3. Camera Intrinsic Parameter Update (Optional):
- Goal: Refine camera intrinsics like focal length $f$ and radial distortion coefficients, assuming camera poses and 3D points are fixed.
- Procedure: The objective functions for focal length and distortion coefficients are also quadratic and can be minimized with closed-form solutions by solving simple linear systems.

5. Experimental Setup

Datasets:
- ETHBA: A dataset created from the ETH3D SLAM benchmark, providing ground truth for accurate evaluation of small-to-medium scale problems.
- ASBA: A large-scale synthetic dataset generated using the AirSim simulator, featuring a city environment. It also has ground truth and is used for evaluating accuracy, robustness, and distributed performance on large problems.
- BAL: A standard, large-scale real-world BA benchmark dataset with significant outliers but no ground truth. Used to evaluate efficiency and robustness.
- EuRoC: A popular dataset for visual-inertial SLAM, used to test the algorithm's applicability in a real-time SLAM system.
Evaluation Metrics:
- Translation Error: Euclidean distance between the estimated and ground-truth camera positions.
- Rotation Error: The angle of the relative rotation between the estimated and ground-truth camera orientations.
Baselines:
- CeresBA: A widely-used, high-quality BA solver from Google, based on the Levenberg-Marquardt algorithm.
- Multicore (MBA): An efficient PCG-based BA solver.
- RootBA: A memory-efficient BA solver that uses a nullspace projection method.
- ORB-SLAM3: A state-of-the-art SLAM system; its BA module is used as a baseline for the SLAM experiment.

6. Results & Analysis

The paper's proposed method is referred to as ABA (ABAv for V-metric, ABAz for Z-metric).

Core Results:
- Accuracy: On datasets with ground truth (ETHBA and ASBA), ABA consistently achieves the highest accuracy, outperforming CeresBA and MBA (Table 2). This validates that the modified error metric does not sacrifice accuracy. Image 4 shows the convergence curves, where ABA reaches a lower error faster than competitors.
  
  该图像为多组折线图，展示了不同算法（ABAv、ABAz、CeresBA、MBA）在三个数据集（cables1、table7、repetitive）上的平移误差和旋转误差随时间变化的趋势。图中横轴为时间（毫秒），纵轴分别为平移误差（毫米）和旋转误差，显示不同方法的收敛速度和精度差异。总体来看，ABAv和ABAz算法在误差降低速度和误差值上表现更优。
- Efficiency: The convergence plots in Image 4 and Image 6 show that ABA is significantly faster than CeresBA and competitive with, or faster than, the highly optimized MBA and RootBA. For the very large "Final" problem from the BAL dataset, only ABA could run on the test machine due to its superior memory efficiency (using ~4 GB on an 8 GB machine).
  
  该图像为多子图折线图，展示了不同算法（ABA、RootBA、CeresBA、MBA）在五个数据集（Trafalgar、Dubrovnil、Ladybug、Venice、Final）上误差对时间的变化关系。纵轴为误差的对数log(e)，横轴为时间（秒）。图中可见ABA和MBA算法普遍收敛速度较快且误差较低，尤其是ABA表现稳定且效率高。CeresBA在部分数据集起初误差较大，收敛慢。Final图仅展示ABA算法，误差随时间持续减小。整体体现了论文中新算法ABA效率和准确性的优势。
- Robustness: On the BAL dataset, which contains outliers, ABA demonstrates stable convergence (see Image 6). Unlike methods like CeresBA, its objective function value strictly decreases in every iteration, thanks to the guaranteed descent direction and exact line search. The successful reconstruction of the ASBA dataset from a poor initialization further demonstrates its reliability (Image 7).
  
  该图像为三维点云重建的示意图，展示了通过bundle adjustment算法获得的三维建筑物模型及其点云分布。图中黑色点表示三维重建点，红色线条可能表示相机轨迹或视角边界，体现了算法在结构光束法中对建筑环境的高精度重建效果。
- SLAM Application: When integrated into ORB-SLAM3, the ABA solver successfully completed the SLAM tasks with comparable accuracy to the original highly-tuned solver (Table 3), proving its suitability for real-time applications.
Distributed BA: The distributed experiment on the ASBA dataset was a success.
- Workflow: Image 1 shows the workflow. Each machine updates its local camera poses. For shared 3D points, each machine computes its local contribution to $\pmb{A}_p$ and $\pmb{y}_p$ and sends them to a master machine. The master aggregates them, solves the 3x3 system for the point update, and broadcasts the result back.
  
  该图像为流程图，展示了主机（Master Machine）与辅助机（Machine B）间交替进行姿态更新、特征点采集与更新的过程。两台机器通过变量Ap, Yp和P交换信息，实现分布式的姿态与三维点估计更新。
- Performance: The convergence curve for the distributed version (Image 8) is smooth and resembles the non-distributed case, confirming the method's effectiveness. The communication overhead was minimal (19.6 MB for updates vs. >1 GB for the data itself), highlighting the efficiency of the distributed scheme.
  
  该图像为双子图表，左图显示随时间推移的平移误差变化曲线，误差迅速下降后趋于稳定，右图显示旋转误差随时间变化趋势，误差同样初期下降明显，随后达到相对稳定状态，反映了算法在优化过程中误差逐渐减小的效果。

7. Conclusion & Reflections

Conclusion Summary: The paper presents a novel and powerful alternating optimization framework for bundle adjustment. By introducing inverse depth as an augmented variable, it simplifies the objective function into a polynomial, enabling the decomposition of the problem into subproblems with fast, stable, and exact closed-form solutions. The experimental results robustly demonstrate that this approach is not only faster and more memory-efficient than state-of-the-art methods but also achieves superior accuracy. Its natural suitability for distributed computing makes it a compelling solution for large-scale 3D reconstruction.
Limitations & Future Work: The authors acknowledge that their current method is tailored for the fundamental BA problem involving point features. It is not directly applicable to "generalized" BA problems that include other types of geometric primitives, such as lines or circles. Their planned future work is to extend the framework to handle these more diverse feature types.
Personal Insights & Critique:
- The central idea of modifying the objective function to unlock closed-form solutions is elegant and highly effective. It sidesteps the main computational bottleneck of traditional BA (solving a large linear system) in a principled way.
- The guarantee of a descent direction for the 3D point update is a strong theoretical advantage that translates into the observed stable convergence, a property not always guaranteed by standard Newton-based methods without a costly line search.
- The method's low memory footprint is a significant practical advantage, making large-scale BA accessible on standard hardware.
- The distributed implementation is particularly impressive. Unlike ADMM, which often struggles with tuning penalty parameters and slow convergence to consensus, this approach is direct and requires minimal communication, making it highly practical.
- A potential question is how sensitive the method is to the initial values of the augmented variables $s_{i,j}$ . The paper suggests they are optimized along with other variables, but their initialization might influence the convergence path in the early stages. However, the strong experimental results suggest this is not a major issue in practice.

Similar papers

Recommended via semantic vector search.

No similar papers found yet.