DaCapo: Automatic Bootstrapping Management for Efficient Fully Homomorphic Encryption
TL;DR Summary
DaCapo automates bootstrapping in fully homomorphic encryption by analyzing ciphertext lifetimes and scale, selecting optimal insertion points to minimize latency. It reduces manual effort and improves performance, showing 1.21× speedup over hand-optimized implementations.
Abstract
By supporting computation on encrypted data, fully homomorphic encryption (FHE) offers the potential for privacy-preserving computation offloading. However, its applicability is constrained to small programs because each FHE multiplication increases the scale of a ciphertext with a limited scale capacity. By resetting the accumulated scale, bootstrapping enables a longer FHE multiplication chain. Nonetheless, manual bootstrapping placement poses a significant programming burden to avoid scale overflow from insufficient bootstrapping or the substantial computational overhead of unnecessary bootstrapping. Additionally, the bootstrapping placement affects costs of FHE operations due to changes in scale management, further complicating the overall management process. This work proposes DaCapo, the first automatic bootstrapping management compiler. Aiming to reduce bootstrapping counts, DaCapo analyzes live-out ciphertexts at each program point and identifies candidate points for inserting bootstrapping operations. DaCapo estimates the FHE operation latencies under different scale management scenarios for each bootstrapping placement plan at each candidate point, and decides the bootstrapping placement plan with minimal latency. This work evaluates DaCapo with deep learning models that existing FHE compilers cannot compile due to a lack of bootstrapping support. The evaluation achieves 1.21× speedup on average compared to manually implemented FHE programs.
English Analysis
1. Bibliographic Information
- Title: DaCapo: Automatic Bootstrapping Management for Efficient Fully Homomorphic Encryption
- Authors: Seonyoung Cheon, Yongwoo Lee, Dongkwan Kim, Ju Min Lee (Yonsei University); Sunchul Jung, Taekyung Kim (CryptoLab. Inc); Dongyoon Lee (Stony Brook University); Hanjun Kim (Yonsei University).
- Journal/Conference: Published at the USENIX Security Symposium 2024. USENIX Security is a top-tier, highly competitive, and prestigious conference in the field of computer security and systems, indicating the work's high quality and impact.
- Publication Year: 2024
- Abstract: The paper addresses a major bottleneck in Fully Homomorphic Encryption (FHE): the limited computational depth caused by ciphertext "scale" growth during multiplications. While "bootstrapping" can reset this scale to enable deeper computations, manually placing these operations is difficult, error-prone, and often inefficient. The authors propose
DaCapo
, the first compiler to automate bootstrapping management.DaCapo
uses a novel analysis to identify optimal points for inserting bootstraps by analyzing live ciphertexts and estimating the performance impact of different placement strategies. When evaluated on complex deep learning models—which existing FHE compilers cannot handle—DaCapo
demonstrates an average speedup of 1.21× compared to manually optimized FHE programs. - Original Source Link: https://www.usenix.org/system/files/sec24summer-prepub-336-cheon.pdf (The link points to a pre-publication version of the formally published paper).
2. Executive Summary
- Background & Motivation (Why):
- Core Problem: Fully Homomorphic Encryption (FHE) allows performing computations directly on encrypted data, a transformative capability for privacy. However, a popular FHE scheme for machine learning, RNS-CKKS, suffers from a practical limitation. Each multiplication increases a value called the "accumulated scale" within a ciphertext. This scale has a fixed capacity (the
coefficient modulus
). If the scale exceeds this capacity, an overflow occurs, and the encrypted data becomes unrecoverable. - Existing Solution & Its Flaws: An operation called bootstrapping can "refresh" a ciphertext, resetting its scale and allowing for more multiplications. The problem is that bootstrapping is extremely slow—orders of magnitude slower than other operations. Programmers must manually insert bootstraps, facing a difficult trade-off: use too few, and the program fails due to scale overflow; use too many, and the program becomes unacceptably slow. This manual process is a major barrier to developing large-scale FHE applications.
- Gap in Prior Work: Existing FHE compilers can automate other parts of the process but lack support for automatic bootstrapping. This has restricted their use to small programs with shallow multiplicative depth, like simple neural networks.
- Core Problem: Fully Homomorphic Encryption (FHE) allows performing computations directly on encrypted data, a transformative capability for privacy. However, a popular FHE scheme for machine learning, RNS-CKKS, suffers from a practical limitation. Each multiplication increases a value called the "accumulated scale" within a ciphertext. This scale has a fixed capacity (the
- Main Contributions / Findings (What):
The paper introduces
DaCapo
, the first compiler that automatically and efficiently manages bootstrapping placement. Its key contributions are:- First Automatic Bootstrapping Compiler:
DaCapo
is the first end-to-end tool that takes a high-level program and automatically inserts bootstrapping operations, making FHE accessible for complex applications like deep neural networks. - Liveness-Aware Candidate Selection: To minimize the number of slow bootstrapping operations,
DaCapo
analyzes the program's data flow. It identifies candidate insertion points where the number of "live-out" ciphertexts (those still needed for future computations) is minimal. It further refines this with a bypass edge analysis to ignore long-lived ciphertexts that do not need immediate refreshing. - Cost-Aware Placement Strategy:
DaCapo
recognizes that minimizing the number of bootstraps is not always the fastest solution. A bootstrap operation resets a ciphertext's "level," which can make subsequent arithmetic operations faster.DaCapo
models this trade-off by estimating the total program latency for various placement plans and selecting the one with the lowest overall cost. - Demonstrated Practicality: The authors show that
DaCapo
can successfully compile and run large deep learning models (ResNet, VGG16, AlexNet, etc.) that were previously out of reach for automated FHE compilation. The resulting programs are on average 1.21× faster than expert-tuned, manually implemented versions.
- First Automatic Bootstrapping Compiler:
3. Prerequisite Knowledge & Related Work
To understand this paper, one must be familiar with several concepts from cryptography and compilers.
-
Foundational Concepts:
- Fully Homomorphic Encryption (FHE): A revolutionary form of encryption that allows a third party (e.g., a cloud server) to perform arbitrary computations on encrypted data without ever decrypting it. The result of the computation, when decrypted by the data owner, is identical to what it would have been if the computation were performed on the original, unencrypted data.
- RNS-CKKS: The Cheon-Kim-Kim-Song (CKKS) scheme is a specific type of FHE optimized for approximate arithmetic on real numbers. The Residue Number System (RNS) variant, RNS-CKKS, is particularly efficient. Its support for fixed-point numbers and SIMD (Single Instruction, Multiple Data) operations makes it a go-to choice for privacy-preserving machine learning.
- Key Ciphertext Properties in RNS-CKKS:
- Coefficient Modulus (): This is the maximum size of coefficients in the polynomials that represent encrypted data. It defines the "scale capacity" of a ciphertext. If the internal scale grows beyond this, the data is corrupted. In the paper's evaluation, .
- Scale (): To encrypt a real number (e.g., 3.14), it is first scaled into a large integer (e.g., ). This scaling factor is the
scale
. When two encrypted numbers are multiplied, their scales are also multiplied, causing theaccumulated scale
to grow rapidly. - Level (): The
coefficient modulus
is composed of a chain of smaller prime numbers. Thelevel
of a ciphertext represents how many of these primes are still available for computation. An operation calledrescale
is used after multiplication to reduce the scale, but it consumes onelevel
. When a ciphertext runs out of levels, no more multiplications can be performed on it.
- Bootstrapping: This is a "re-encryption" procedure that takes a noisy, low-level ciphertext and produces a fresh, high-level one. It effectively resets the
accumulated scale
and restores thelevel
to its initial maximum, enabling virtually unlimited computational depth. However, it is the most computationally expensive operation in FHE. - Compiler Liveness Analysis: A standard compiler optimization technique. A variable is "live" at a specific point in a program if its current value might be used in the future.
DaCapo
uses this to count how many ciphertexts need to be carried forward at each program point.
-
Previous Works & Differentiation:
- FHE Libraries (
HEaaN
,SEAL
,PALISADE
): These are low-level libraries that provide the cryptographic primitives (e.g., encryption, addition, multiplication, bootstrapping). Programmers use these libraries to manually build FHE applications.DaCapo
automates this manual process. - FHE Compilers (
EVA
,Hecate
): These compilers were a significant step forward, automatingscale management
(insertingrescale
andmodswitch
operations). However, they do not support bootstrapping. This fundamental limitation means they can only compile programs with a small, fixed number of sequential multiplications. - Manual Bootstrapping: The state-of-the-art for complex applications was manual placement, as seen in works implementing ResNet on FHE. This is extremely challenging, time-consuming, and often leads to suboptimal performance.
DaCapo
's Innovation:DaCapo
's novelty lies in being the first compiler to automate bootstrapping. It bridges the gap left by previous compilers, enabling the practical application of FHE to complex, real-world programs that were previously intractable.
- FHE Libraries (
4. Methodology (Core Technology & Implementation)
DaCapo
works in a two-stage pipeline to find the optimal bootstrapping plan.
该图像是一张示意图,展示了DaCapo自动靴磨机制的流程和不同靴磨插入点的代价估算。图中包含了输入程序、候选点选择、代价估算及最终靴磨插入的步骤,用箭头和数值标识了计算成本和流程。该图为理解自动靴磨管理策略提供直观支持。
As shown in the diagram above (corresponding to Figure 3 in the paper), the process involves:
- Candidate Selector: Identifies a set of promising program points where bootstrapping could be inserted.
- Bootstrapping Planner: Uses dynamic programming to select the best combination of points from this set to minimize total execution time.
6.1 Bootstrapping Candidate Selector
The goal here is to find locations that are efficient for bootstrapping. The intuition is to bootstrap at "bottlenecks" in the program's dataflow graph where the number of active ciphertexts is low. This is achieved through a three-step analysis detailed in Algorithm 1.
-
Liveness Analysis:
DaCapo
first performs a standard liveness analysis to determine, at every program point, which ciphertexts (variables) are "live-out"—meaning their values are needed for subsequent operations. Bootstrapping at a point with few live-outs is cheaper because fewer ciphertexts need to undergo the expensive bootstrapping operation. -
Bypass Edge Analysis: This is a key refinement. A ciphertext might be "live" but not actually used for a long sequence of computations. For example, an output from an early layer of a neural network might be passed directly to a final addition layer, "bypassing" many intermediate layers. Such a "bypass edge" does not accumulate scale during the bypassed computations and likely does not need bootstrapping. The algorithm identifies these edges by checking if a live variable remains unused while other computations accumulate scale beyond a certain threshold (). By excluding these bypass edges from the live-out count,
DaCapo
gets a more accurate measure of which ciphertexts genuinely require bootstrapping at a given point. -
Candidate Filtering: The algorithm groups candidate points by their corrected live-out count (1, 2, 3, etc.). It starts by considering only the most efficient candidates (those with a live-out count of 1). It checks if a valid program (i.e., one without scale overflows) can be created using only these points. If not, it expands the candidate set by including points with a live-out count of 2, and so on, until a valid placement is possible. This ensures correctness while prioritizing the most efficient insertion points.
6.2 Cost-aware Bootstrapping Planner
Once a set of valid candidate points is selected, the planner's job is to choose the final placement plan. A naive approach might be to simply insert bootstraps as late as possible to minimize their count. However, DaCapo
employs a more sophisticated, cost-aware strategy based on two key observations:
-
Bootstrapping resets a ciphertext's level, which can make subsequent operations significantly faster (as shown in Table 2, lower-level operations are slower).
-
Therefore, inserting more bootstraps can sometimes decrease total latency if the speedup in arithmetic operations outweighs the cost of the extra bootstraps.
The planner uses a dynamic programming approach, described in Algorithm 2, to solve this optimization problem.
-
The Algorithm: The planner iterates through the candidate points (
to
) in program order. For eachto
, it calculates the minimum possible execution cost from the start of the program up to that point. To do this, it considers every possible previous bootstrapping point (from
). It computes the cost of the program segment betweenfrom
andto
(assuming a bootstrap atfrom
and another atto
) and adds it to the already-computed minimum cost to reachfrom
. It stores the plan (from
->to
) that yields the lowest total cost for reachingto
. -
Cost Estimation: The cost of each segment is estimated by summing the profiled latencies of all FHE operations within it. The planner runs a simulated scale management pass for each potential segment to determine the
level
of each operation, using a pre-profiled look-up table (like Table 2) to get its latency.By repeating this process until the end of the program, the planner finds the sequence of bootstrapping points that results in the minimum overall estimated latency.
5. Experimental Setup
- Datasets: The evaluation uses the CIFAR-10 image classification dataset.
- Evaluation Metrics:
- Latency / Speedup: The primary performance metric is the end-to-end inference time for a single image. Speedup is calculated as the ratio of the baseline's latency to the proposed method's latency.
- Classification Accuracy: Used to verify that the FHE computations are correct and do not degrade the model's predictive power. The formula is:
- Baselines:
Manual
: A strong baseline representing the state-of-the-art, where an expert manually places bootstraps according to best practices from prior research [36].Liveness
: An ablation ofDaCapo
that uses only liveness analysis, without bypass edges or cost-awareness.Bypass
: An ablation that uses liveness and bypass edge analysis but greedily inserts bootstraps just before overflow, without cost-awareness.DaCapo
: The full proposed system with all optimizations.
- Models and Implementation:
- The experiments use a suite of deep learning models: ResNet-20/44, AlexNet, VGG16, SqueezeNet, and MobileNet.
- Two variants of each model are tested: one with
ReLU
activation and max pooling (which requires deeper multiplicative circuits) and another withSiLU
activation and average pooling (shallower). - The models are implemented using the GPU-accelerated
HEaaN
library on an NVIDIA GeForce RTX 3090 GPU.
6. Results & Analysis
Core Results
-
Correctness: The evaluation first confirms that
DaCapo
-compiled models maintain accuracy. The following table, transcribed from Table 3 of the paper, shows a negligible drop in accuracy compared to the unencrypted PyTorch versions.(Manual transcription of Table 3)
Model ResNet-20 ResNet-44 AlexNet VGG16 SqueezeNet MobileNet ReLU SiLU ReLU SiLU ReLU SiLU ReLU SiLU ReLU SiLU ReLU SiLU Pytorch 90.6% 92.6% 91.2% 92.7% 89.0% 86.6% 93.8% 93.0% 89.4% 88.0% 91.4% 91.4% DACAPO 90.7% 92.6% 91.3% 92.7% 89.0% 86.6% 93.8% 92.9% 89.2% 87.9% 91.2% 91.4% -
Performance Speedup:
DaCapo
significantly outperforms theManual
baseline.该图像是一个条形图,展示了不同FHE编译器和优化策略在多个深度学习模型上的加速比对比。图中包括手动方式、Liveness、Bypass和DaCapo四种方案,DaCapo在大多数模型中实现了最高加速比,平均加速比达1.21倍。
As shown in the chart above (Figure 4),
DaCapo
achieves an average speedup of 1.21×. The intermediate versions,Liveness
andBypass
, show incremental improvements, validating the contribution of each component of theDaCapo
framework. -
Cost-Awareness vs. Bootstrap Count: The results demonstrate that minimizing the number of bootstraps is not always optimal.
(Manual transcription of Table 4)
Model ResNet-20 ResNet-44 AlexNet VGG16 SqueezeNet MobileNet ReLU SiLU ReLU SiLU ReLU SiLU ReLU SiLU ReLU SiLU ReLU SiLU Manual 37 19 85 43 66 12 54 20 58 18 53 27 DACAPO 37 19 85 43 67 12 53 20 57 19 61 27 The table above shows the number of bootstraps used. For
MobileNet(R)
,DaCapo
uses more bootstraps (61) than theManual
baseline (53). However, as seen in the latency breakdown chart below (Figure 5),DaCapo
is still faster overall.该图像是一个柱状图,展示了不同深度学习模型在多种FHE编译策略下的延迟时间对比,涵盖Bootstrap time、Manual、Liveness、Bypass和DaCapo五种方法,其结果表明DaCapo在大多数模型中实现了延迟下降。
This is because the extra bootstraps allowed the other arithmetic operations to run at lower, faster levels, and this performance gain more than compensated for the overhead of the additional bootstraps. This provides strong evidence for the effectiveness of
DaCapo
's cost-aware planning. -
Accuracy of Performance Estimation: The cost model used by
DaCapo
's planner is shown to be highly accurate.该图像是多子图的折线图,展示了不同FHE编译器策略(Manual, Liveness, Bypass, DaCapo)在不同神经网络模型和激活函数(ReLU和SiLU)下,对延迟(秒)随Waterline变化的影响,直观比较了DaCapo方法的性能优势。
Figure 6 in the paper (shown above) plots the estimated latency against the actual measured latency. The strong linear correlation () confirms that the cost model is a reliable foundation for the optimization decisions made by the planner.
(Note: The provided text for the paper was incomplete, ending during Section 8.3. Therefore, analysis of compilation time (Section 8.4) and sensitivity to the waterline
parameter (Section 8.6, Figure 7 in the paper) cannot be fully detailed, though the paper's introduction to the evaluation mentions these analyses were performed.)
7. Conclusion & Reflections
-
Conclusion Summary:
DaCapo
represents a landmark achievement in making Fully Homomorphic Encryption practical for complex applications. By creating the first automatic bootstrapping compiler, the authors have removed a significant and error-prone burden from FHE programmers. The novel combination of liveness-aware candidate selection and cost-aware latency planning allowsDaCapo
to generate FHE programs that are not only correct but also more performant than those tuned by human experts. This work significantly lowers the barrier to entry for developing large-scale, privacy-preserving services. -
Limitations & Future Work: Based on the presented methodology, potential limitations and future directions could include:
- Hardware Dependency: The cost model relies on latencies profiled on specific hardware (NVIDIA RTX 3090). Porting to different GPUs or CPUs would require re-profiling. A more abstract, portable cost model could be a direction for future work.
- Heuristic Threshold: The
S_th
threshold used in bypass edge analysis is set heuristically. An adaptive or learned approach to setting this value might yield further performance gains. - Synchronous Bootstrapping:
DaCapo
bootstraps all non-bypass live-out ciphertexts at a given point simultaneously. While this simplifies level management, exploring asynchronous or partial bootstrapping could offer more fine-grained optimization opportunities in certain scenarios.
-
Personal Insights & Critique: This paper is a superb example of systems-level research that tackles a critical, practical problem in an emerging technology.
- Impact: The primary impact of
DaCapo
is one of enablement. By automating the most difficult aspect of FHE programming, it opens the door for developers and researchers in other fields (like machine learning) to experiment with and deploy privacy-preserving solutions without needing to become cryptography experts. - Novelty: The true novelty lies in the holistic approach. While liveness analysis and dynamic programming are standard techniques, their application to the unique constraints of FHE (scale, level, and the extreme cost of bootstrapping) is highly innovative. The insight that more bootstraps can lead to better performance is counter-intuitive and a cornerstone of the work's success.
- Overall:
DaCapo
is a significant step toward bridging the gap between the theoretical promise of FHE and its real-world application. It is a well-engineered and rigorously evaluated system that solves a clear and important problem.
- Impact: The primary impact of
Similar papers
Recommended via semantic vector search.
Discussion
Leave a comment
No comments yet. Start the discussion!