AiPaper
Status: completed

DaCapo: Automatic Bootstrapping Management for Efficient Fully Homomorphic Encryption

Automatic Bootstrapping Management for FHEFHE Performance OptimizationCompiler for Encrypted ComputationAutomatic Bootstrapping InsertionScale Management Based Latency Estimation
Original Link
Price: 0.10
3 readers
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

DaCapo automates bootstrapping in fully homomorphic encryption by analyzing ciphertext lifetimes and scale, selecting optimal insertion points to minimize latency. It reduces manual effort and improves performance, showing 1.21× speedup over hand-optimized implementations.

Abstract

By supporting computation on encrypted data, fully homomorphic encryption (FHE) offers the potential for privacy-preserving computation offloading. However, its applicability is constrained to small programs because each FHE multiplication increases the scale of a ciphertext with a limited scale capacity. By resetting the accumulated scale, bootstrapping enables a longer FHE multiplication chain. Nonetheless, manual bootstrapping placement poses a significant programming burden to avoid scale overflow from insufficient bootstrapping or the substantial computational overhead of unnecessary bootstrapping. Additionally, the bootstrapping placement affects costs of FHE operations due to changes in scale management, further complicating the overall management process. This work proposes DaCapo, the first automatic bootstrapping management compiler. Aiming to reduce bootstrapping counts, DaCapo analyzes live-out ciphertexts at each program point and identifies candidate points for inserting bootstrapping operations. DaCapo estimates the FHE operation latencies under different scale management scenarios for each bootstrapping placement plan at each candidate point, and decides the bootstrapping placement plan with minimal latency. This work evaluates DaCapo with deep learning models that existing FHE compilers cannot compile due to a lack of bootstrapping support. The evaluation achieves 1.21× speedup on average compared to manually implemented FHE programs.

English Analysis

1. Bibliographic Information

  • Title: DaCapo: Automatic Bootstrapping Management for Efficient Fully Homomorphic Encryption
  • Authors: Seonyoung Cheon, Yongwoo Lee, Dongkwan Kim, Ju Min Lee (Yonsei University); Sunchul Jung, Taekyung Kim (CryptoLab. Inc); Dongyoon Lee (Stony Brook University); Hanjun Kim (Yonsei University).
  • Journal/Conference: Published at the USENIX Security Symposium 2024. USENIX Security is a top-tier, highly competitive, and prestigious conference in the field of computer security and systems, indicating the work's high quality and impact.
  • Publication Year: 2024
  • Abstract: The paper addresses a major bottleneck in Fully Homomorphic Encryption (FHE): the limited computational depth caused by ciphertext "scale" growth during multiplications. While "bootstrapping" can reset this scale to enable deeper computations, manually placing these operations is difficult, error-prone, and often inefficient. The authors propose DaCapo, the first compiler to automate bootstrapping management. DaCapo uses a novel analysis to identify optimal points for inserting bootstraps by analyzing live ciphertexts and estimating the performance impact of different placement strategies. When evaluated on complex deep learning models—which existing FHE compilers cannot handle—DaCapo demonstrates an average speedup of 1.21× compared to manually optimized FHE programs.
  • Original Source Link: https://www.usenix.org/system/files/sec24summer-prepub-336-cheon.pdf (The link points to a pre-publication version of the formally published paper).

2. Executive Summary

  • Background & Motivation (Why):
    • Core Problem: Fully Homomorphic Encryption (FHE) allows performing computations directly on encrypted data, a transformative capability for privacy. However, a popular FHE scheme for machine learning, RNS-CKKS, suffers from a practical limitation. Each multiplication increases a value called the "accumulated scale" within a ciphertext. This scale has a fixed capacity (the coefficient modulus). If the scale exceeds this capacity, an overflow occurs, and the encrypted data becomes unrecoverable.
    • Existing Solution & Its Flaws: An operation called bootstrapping can "refresh" a ciphertext, resetting its scale and allowing for more multiplications. The problem is that bootstrapping is extremely slow—orders of magnitude slower than other operations. Programmers must manually insert bootstraps, facing a difficult trade-off: use too few, and the program fails due to scale overflow; use too many, and the program becomes unacceptably slow. This manual process is a major barrier to developing large-scale FHE applications.
    • Gap in Prior Work: Existing FHE compilers can automate other parts of the process but lack support for automatic bootstrapping. This has restricted their use to small programs with shallow multiplicative depth, like simple neural networks.
  • Main Contributions / Findings (What): The paper introduces DaCapo, the first compiler that automatically and efficiently manages bootstrapping placement. Its key contributions are:
    1. First Automatic Bootstrapping Compiler: DaCapo is the first end-to-end tool that takes a high-level program and automatically inserts bootstrapping operations, making FHE accessible for complex applications like deep neural networks.
    2. Liveness-Aware Candidate Selection: To minimize the number of slow bootstrapping operations, DaCapo analyzes the program's data flow. It identifies candidate insertion points where the number of "live-out" ciphertexts (those still needed for future computations) is minimal. It further refines this with a bypass edge analysis to ignore long-lived ciphertexts that do not need immediate refreshing.
    3. Cost-Aware Placement Strategy: DaCapo recognizes that minimizing the number of bootstraps is not always the fastest solution. A bootstrap operation resets a ciphertext's "level," which can make subsequent arithmetic operations faster. DaCapo models this trade-off by estimating the total program latency for various placement plans and selecting the one with the lowest overall cost.
    4. Demonstrated Practicality: The authors show that DaCapo can successfully compile and run large deep learning models (ResNet, VGG16, AlexNet, etc.) that were previously out of reach for automated FHE compilation. The resulting programs are on average 1.21× faster than expert-tuned, manually implemented versions.

3. Prerequisite Knowledge & Related Work

To understand this paper, one must be familiar with several concepts from cryptography and compilers.

  • Foundational Concepts:

    • Fully Homomorphic Encryption (FHE): A revolutionary form of encryption that allows a third party (e.g., a cloud server) to perform arbitrary computations on encrypted data without ever decrypting it. The result of the computation, when decrypted by the data owner, is identical to what it would have been if the computation were performed on the original, unencrypted data.
    • RNS-CKKS: The Cheon-Kim-Kim-Song (CKKS) scheme is a specific type of FHE optimized for approximate arithmetic on real numbers. The Residue Number System (RNS) variant, RNS-CKKS, is particularly efficient. Its support for fixed-point numbers and SIMD (Single Instruction, Multiple Data) operations makes it a go-to choice for privacy-preserving machine learning.
    • Key Ciphertext Properties in RNS-CKKS:
      • Coefficient Modulus (QQ): This is the maximum size of coefficients in the polynomials that represent encrypted data. It defines the "scale capacity" of a ciphertext. If the internal scale grows beyond this, the data is corrupted. In the paper's evaluation, Q21479Q \approx 2^{1479}.
      • Scale (mm): To encrypt a real number (e.g., 3.14), it is first scaled into a large integer (e.g., 3.14×2403.14 \times 2^{40}). This scaling factor is the scale. When two encrypted numbers are multiplied, their scales are also multiplied, causing the accumulated scale to grow rapidly.
      • Level (ll): The coefficient modulus QQ is composed of a chain of smaller prime numbers. The level of a ciphertext represents how many of these primes are still available for computation. An operation called rescale is used after multiplication to reduce the scale, but it consumes one level. When a ciphertext runs out of levels, no more multiplications can be performed on it.
    • Bootstrapping: This is a "re-encryption" procedure that takes a noisy, low-level ciphertext and produces a fresh, high-level one. It effectively resets the accumulated scale and restores the level to its initial maximum, enabling virtually unlimited computational depth. However, it is the most computationally expensive operation in FHE.
    • Compiler Liveness Analysis: A standard compiler optimization technique. A variable is "live" at a specific point in a program if its current value might be used in the future. DaCapo uses this to count how many ciphertexts need to be carried forward at each program point.
  • Previous Works & Differentiation:

    • FHE Libraries (HEaaN, SEAL, PALISADE): These are low-level libraries that provide the cryptographic primitives (e.g., encryption, addition, multiplication, bootstrapping). Programmers use these libraries to manually build FHE applications. DaCapo automates this manual process.
    • FHE Compilers (EVA, Hecate): These compilers were a significant step forward, automating scale management (inserting rescale and modswitch operations). However, they do not support bootstrapping. This fundamental limitation means they can only compile programs with a small, fixed number of sequential multiplications.
    • Manual Bootstrapping: The state-of-the-art for complex applications was manual placement, as seen in works implementing ResNet on FHE. This is extremely challenging, time-consuming, and often leads to suboptimal performance.
    • DaCapo's Innovation: DaCapo's novelty lies in being the first compiler to automate bootstrapping. It bridges the gap left by previous compilers, enabling the practical application of FHE to complex, real-world programs that were previously intractable.

4. Methodology (Core Technology & Implementation)

DaCapo works in a two-stage pipeline to find the optimal bootstrapping plan.

该图像是一张示意图,展示了DaCapo自动靴磨机制的流程和不同靴磨插入点的代价估算。图中包含了输入程序、候选点选择、代价估算及最终靴磨插入的步骤,用箭头和数值标识了计算成本和流程。该图为理解自动靴磨管理策略提供直观支持。 该图像是一张示意图,展示了DaCapo自动靴磨机制的流程和不同靴磨插入点的代价估算。图中包含了输入程序、候选点选择、代价估算及最终靴磨插入的步骤,用箭头和数值标识了计算成本和流程。该图为理解自动靴磨管理策略提供直观支持。

As shown in the diagram above (corresponding to Figure 3 in the paper), the process involves:

  1. Candidate Selector: Identifies a set of promising program points where bootstrapping could be inserted.
  2. Bootstrapping Planner: Uses dynamic programming to select the best combination of points from this set to minimize total execution time.

6.1 Bootstrapping Candidate Selector

The goal here is to find locations that are efficient for bootstrapping. The intuition is to bootstrap at "bottlenecks" in the program's dataflow graph where the number of active ciphertexts is low. This is achieved through a three-step analysis detailed in Algorithm 1.

  • Liveness Analysis: DaCapo first performs a standard liveness analysis to determine, at every program point, which ciphertexts (variables) are "live-out"—meaning their values are needed for subsequent operations. Bootstrapping at a point with few live-outs is cheaper because fewer ciphertexts need to undergo the expensive bootstrapping operation.

  • Bypass Edge Analysis: This is a key refinement. A ciphertext might be "live" but not actually used for a long sequence of computations. For example, an output from an early layer of a neural network might be passed directly to a final addition layer, "bypassing" many intermediate layers. Such a "bypass edge" does not accumulate scale during the bypassed computations and likely does not need bootstrapping. The algorithm identifies these edges by checking if a live variable remains unused while other computations accumulate scale beyond a certain threshold (SthS_{th}). By excluding these bypass edges from the live-out count, DaCapo gets a more accurate measure of which ciphertexts genuinely require bootstrapping at a given point.

  • Candidate Filtering: The algorithm groups candidate points by their corrected live-out count (1, 2, 3, etc.). It starts by considering only the most efficient candidates (those with a live-out count of 1). It checks if a valid program (i.e., one without scale overflows) can be created using only these points. If not, it expands the candidate set by including points with a live-out count of 2, and so on, until a valid placement is possible. This ensures correctness while prioritizing the most efficient insertion points.

6.2 Cost-aware Bootstrapping Planner

Once a set of valid candidate points is selected, the planner's job is to choose the final placement plan. A naive approach might be to simply insert bootstraps as late as possible to minimize their count. However, DaCapo employs a more sophisticated, cost-aware strategy based on two key observations:

  1. Bootstrapping resets a ciphertext's level, which can make subsequent operations significantly faster (as shown in Table 2, lower-level operations are slower).

  2. Therefore, inserting more bootstraps can sometimes decrease total latency if the speedup in arithmetic operations outweighs the cost of the extra bootstraps.

    The planner uses a dynamic programming approach, described in Algorithm 2, to solve this optimization problem.

  • The Algorithm: The planner iterates through the candidate points (to) in program order. For each to, it calculates the minimum possible execution cost from the start of the program up to that point. To do this, it considers every possible previous bootstrapping point (from). It computes the cost of the program segment between from and to (assuming a bootstrap at from and another at to) and adds it to the already-computed minimum cost to reach from. It stores the plan (from -> to) that yields the lowest total cost for reaching to.

  • Cost Estimation: The cost of each segment is estimated by summing the profiled latencies of all FHE operations within it. The planner runs a simulated scale management pass for each potential segment to determine the level of each operation, using a pre-profiled look-up table (like Table 2) to get its latency.

    By repeating this process until the end of the program, the planner finds the sequence of bootstrapping points that results in the minimum overall estimated latency.

5. Experimental Setup

  • Datasets: The evaluation uses the CIFAR-10 image classification dataset.
  • Evaluation Metrics:
    • Latency / Speedup: The primary performance metric is the end-to-end inference time for a single image. Speedup is calculated as the ratio of the baseline's latency to the proposed method's latency.
    • Classification Accuracy: Used to verify that the FHE computations are correct and do not degrade the model's predictive power. The formula is: Accuracy=Number of Correctly Classified ImagesTotal Number of Images \text{Accuracy} = \frac{\text{Number of Correctly Classified Images}}{\text{Total Number of Images}}
  • Baselines:
    • Manual: A strong baseline representing the state-of-the-art, where an expert manually places bootstraps according to best practices from prior research [36].
    • Liveness: An ablation of DaCapo that uses only liveness analysis, without bypass edges or cost-awareness.
    • Bypass: An ablation that uses liveness and bypass edge analysis but greedily inserts bootstraps just before overflow, without cost-awareness.
    • DaCapo: The full proposed system with all optimizations.
  • Models and Implementation:
    • The experiments use a suite of deep learning models: ResNet-20/44, AlexNet, VGG16, SqueezeNet, and MobileNet.
    • Two variants of each model are tested: one with ReLU activation and max pooling (which requires deeper multiplicative circuits) and another with SiLU activation and average pooling (shallower).
    • The models are implemented using the GPU-accelerated HEaaN library on an NVIDIA GeForce RTX 3090 GPU.

6. Results & Analysis

Core Results

  • Correctness: The evaluation first confirms that DaCapo-compiled models maintain accuracy. The following table, transcribed from Table 3 of the paper, shows a negligible drop in accuracy compared to the unencrypted PyTorch versions.

    (Manual transcription of Table 3)

    Model ResNet-20 ResNet-44 AlexNet VGG16 SqueezeNet MobileNet
    ReLU SiLU ReLU SiLU ReLU SiLU ReLU SiLU ReLU SiLU ReLU SiLU
    Pytorch 90.6% 92.6% 91.2% 92.7% 89.0% 86.6% 93.8% 93.0% 89.4% 88.0% 91.4% 91.4%
    DACAPO 90.7% 92.6% 91.3% 92.7% 89.0% 86.6% 93.8% 92.9% 89.2% 87.9% 91.2% 91.4%
  • Performance Speedup: DaCapo significantly outperforms the Manual baseline.

    该图像是一个条形图,展示了不同FHE编译器和优化策略在多个深度学习模型上的加速比对比。图中包括手动方式、Liveness、Bypass和DaCapo四种方案,DaCapo在大多数模型中实现了最高加速比,平均加速比达1.21倍。 该图像是一个条形图,展示了不同FHE编译器和优化策略在多个深度学习模型上的加速比对比。图中包括手动方式、Liveness、Bypass和DaCapo四种方案,DaCapo在大多数模型中实现了最高加速比,平均加速比达1.21倍。

    As shown in the chart above (Figure 4), DaCapo achieves an average speedup of 1.21×. The intermediate versions, Liveness and Bypass, show incremental improvements, validating the contribution of each component of the DaCapo framework.

  • Cost-Awareness vs. Bootstrap Count: The results demonstrate that minimizing the number of bootstraps is not always optimal.

    (Manual transcription of Table 4)

    Model ResNet-20 ResNet-44 AlexNet VGG16 SqueezeNet MobileNet
    ReLU SiLU ReLU SiLU ReLU SiLU ReLU SiLU ReLU SiLU ReLU SiLU
    Manual 37 19 85 43 66 12 54 20 58 18 53 27
    DACAPO 37 19 85 43 67 12 53 20 57 19 61 27

    The table above shows the number of bootstraps used. For MobileNet(R), DaCapo uses more bootstraps (61) than the Manual baseline (53). However, as seen in the latency breakdown chart below (Figure 5), DaCapo is still faster overall.

    该图像是一个柱状图,展示了不同深度学习模型在多种FHE编译策略下的延迟时间对比,涵盖Bootstrap time、Manual、Liveness、Bypass和DaCapo五种方法,其结果表明DaCapo在大多数模型中实现了延迟下降。 该图像是一个柱状图,展示了不同深度学习模型在多种FHE编译策略下的延迟时间对比,涵盖Bootstrap time、Manual、Liveness、Bypass和DaCapo五种方法,其结果表明DaCapo在大多数模型中实现了延迟下降。

    This is because the extra bootstraps allowed the other arithmetic operations to run at lower, faster levels, and this performance gain more than compensated for the overhead of the additional bootstraps. This provides strong evidence for the effectiveness of DaCapo's cost-aware planning.

  • Accuracy of Performance Estimation: The cost model used by DaCapo's planner is shown to be highly accurate.

    该图像是多子图的折线图,展示了不同FHE编译器策略(Manual, Liveness, Bypass, DaCapo)在不同神经网络模型和激活函数(ReLU和SiLU)下,对延迟(秒)随Waterline变化的影响,直观比较了DaCapo方法的性能优势。 该图像是多子图的折线图,展示了不同FHE编译器策略(Manual, Liveness, Bypass, DaCapo)在不同神经网络模型和激活函数(ReLU和SiLU)下,对延迟(秒)随Waterline变化的影响,直观比较了DaCapo方法的性能优势。

    Figure 6 in the paper (shown above) plots the estimated latency against the actual measured latency. The strong linear correlation (R2=0.9986R^2 = 0.9986) confirms that the cost model is a reliable foundation for the optimization decisions made by the planner.

(Note: The provided text for the paper was incomplete, ending during Section 8.3. Therefore, analysis of compilation time (Section 8.4) and sensitivity to the waterline parameter (Section 8.6, Figure 7 in the paper) cannot be fully detailed, though the paper's introduction to the evaluation mentions these analyses were performed.)

7. Conclusion & Reflections

  • Conclusion Summary: DaCapo represents a landmark achievement in making Fully Homomorphic Encryption practical for complex applications. By creating the first automatic bootstrapping compiler, the authors have removed a significant and error-prone burden from FHE programmers. The novel combination of liveness-aware candidate selection and cost-aware latency planning allows DaCapo to generate FHE programs that are not only correct but also more performant than those tuned by human experts. This work significantly lowers the barrier to entry for developing large-scale, privacy-preserving services.

  • Limitations & Future Work: Based on the presented methodology, potential limitations and future directions could include:

    • Hardware Dependency: The cost model relies on latencies profiled on specific hardware (NVIDIA RTX 3090). Porting to different GPUs or CPUs would require re-profiling. A more abstract, portable cost model could be a direction for future work.
    • Heuristic Threshold: The S_th threshold used in bypass edge analysis is set heuristically. An adaptive or learned approach to setting this value might yield further performance gains.
    • Synchronous Bootstrapping: DaCapo bootstraps all non-bypass live-out ciphertexts at a given point simultaneously. While this simplifies level management, exploring asynchronous or partial bootstrapping could offer more fine-grained optimization opportunities in certain scenarios.
  • Personal Insights & Critique: This paper is a superb example of systems-level research that tackles a critical, practical problem in an emerging technology.

    • Impact: The primary impact of DaCapo is one of enablement. By automating the most difficult aspect of FHE programming, it opens the door for developers and researchers in other fields (like machine learning) to experiment with and deploy privacy-preserving solutions without needing to become cryptography experts.
    • Novelty: The true novelty lies in the holistic approach. While liveness analysis and dynamic programming are standard techniques, their application to the unique constraints of FHE (scale, level, and the extreme cost of bootstrapping) is highly innovative. The insight that more bootstraps can lead to better performance is counter-intuitive and a cornerstone of the work's success.
    • Overall: DaCapo is a significant step toward bridging the gap between the theoretical promise of FHE and its real-world application. It is a well-engineered and rigorously evaluated system that solves a clear and important problem.

Discussion

Leave a comment

Sign in to join the discussion.

No comments yet. Start the discussion!