
Homomorphic Multiple Precision Multiplication for CKKS and Reduced Modulus Consumption

Published: 11/15/2023
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

The paper introduces Mult², which decomposes CKKS ciphertexts for homomorphic weak Euclidean division, halving modulus consumption and enabling multiple precision multiplication, deeper circuit evaluation, and reduced bootstrapping without increasing parameters.

Abstract

Homomorphic Encryption (HE) schemes such as BGV, BFV, and CKKS consume some ciphertext modulus for each multiplication. Bootstrapping (BTS) restores the modulus and allows homomorphic computation to continue, but it is time-consuming and requires a significant amount of modulus. For these reasons, decreasing modulus consumption is a crucial topic for BGV, BFV, and CKKS, on which numerous studies have been conducted. We propose a novel method, called Mult², to perform ciphertext multiplication in the CKKS scheme with lower modulus consumption. Mult² relies on a new decomposition of a ciphertext into a pair of ciphertexts that homomorphically performs a weak form of Euclidean division. It multiplies two ciphertexts in decomposed formats with hom…


In-depth Reading

English Analysis

1. Bibliographic Information

1.1. Title

Homomorphic Multiple Precision Multiplication for CKKS and Reduced Modulus Consumption

1.2. Authors

  • Jung Hee Cheon (CryptoLab Inc., Seoul National University, Seoul, Republic of Korea)
  • Wonhee Cho (Seoul National University, Seoul, Republic of Korea)
  • Jaehyung Kim (CryptoLab Inc., Seoul, Republic of Korea)
  • Damien Stehlé (CryptoLab Inc., Lyon, France)

1.3. Journal/Conference

The paper is available on the IACR Cryptology ePrint Archive (eprint.iacr.org), a recognized preprint repository for cryptography research. Papers on ePrint are typically submitted to conferences or journals for peer review; the specific venue of official publication is not mentioned in the provided text.

1.4. Publication Year

2023

1.5. Abstract

This paper introduces a novel method called Mult² for ciphertext multiplication within the CKKS homomorphic encryption scheme. The core innovation involves decomposing a ciphertext into a pair of ciphertexts, which then homomorphically execute a weak form of Euclidean division. This decomposition facilitates homomorphic double precision multiplication, yielding results approximately equivalent to standard CKKS multiplication but consuming almost half the ciphertext modulus. The method is extended to Multᵗ for any $t \geq 2$, allowing decomposition into $t$ components for multiple precision arithmetic (referred to as double-CKKS and tuple-CKKS). The proposed algorithms enable the evaluation of deeper computational circuits without increasing ciphertext dimension or the frequency of costly bootstrapping operations, or, conversely, reduce the number of bootstrapping operations for a given circuit depth. They also allow for increased precision without requiring larger cryptographic parameters. For instance, Mult² can handle 8 sequential multiplications with a 100-bit scaling factor using only a 680-bit ciphertext modulus, a feat unachievable with conventional CKKS multiplication. Experimental results and theoretical analysis support the efficiency and improved modulus consumption of these schemes.

https://eprint.iacr.org/2023/1788.pdf (Status: Preprint)

2. Executive Summary

2.1. Background & Motivation

The CKKS (Cheon-Kim-Kim-Song) homomorphic encryption scheme is widely used for computations on real and complex numbers, crucial for applications like machine learning. A fundamental challenge in CKKS and other homomorphic encryption schemes (BGV, BFV) is modulus consumption. Each homomorphic multiplication operation reduces the available ciphertext modulus. When the modulus becomes too small, further multiplications are impossible, and bootstrapping is required to refresh the modulus. Bootstrapping, however, is computationally expensive and itself consumes significant modulus. This scarcity of modulus is a major bottleneck, often forcing practitioners to use larger ciphertext dimensions or perform more frequent, costly bootstrapping operations. The core problem the paper addresses is this high modulus consumption in CKKS multiplication. Solving this is critical for enabling deeper computational circuits and increasing the practical applicability of FHE. The paper's innovative idea is to reduce modulus consumption by designing a "multiple precision" arithmetic for ciphertexts, inspired by how multi-precision arithmetic works for integers.

2.2. Main Contributions / Findings

The paper's primary contributions are:

  • Novel Mult² Algorithm: Introduction of Mult², a new homomorphic multiplication method for CKKS that significantly reduces modulus consumption (by almost half for Mult²) compared to standard CKKS multiplication. This is achieved by homomorphically decomposing a ciphertext into a pair and performing "double precision" multiplication, which discards low-significance terms to save modulus.
  • Homomorphic Euclidean Division: The development of a "weak form" of homomorphic Euclidean division that enables the decomposition of a ciphertext into quotient and remainder ciphertexts. This is a foundational building block for the multiple-precision approach.
  • Extension to Multᵗ (Tuple-CKKS): Generalization of Mult² to Multᵗ for any $t \geq 2$, allowing a ciphertext to be decomposed into $t$ components, facilitating multiple precision arithmetic (referred to as tuple-CKKS).
  • Increased Homomorphic Capacity: The proposed schemes enable the evaluation of deeper circuits without increasing ciphertext dimension or the frequency of bootstrapping, or, equivalently, reduce the number of bootstrapping operations for the same circuit depth. For example, Double-CKKS (Mult²) increased multiplicative depth from 13 to 18 for a given parameter set.
  • Increased Precision with Smaller Parameters: The methods allow for higher precision in computations without requiring an increase in cryptographic parameters (e.g., ring dimension). Mult² enabled 8 sequential multiplications with a 100-bit scaling factor using a 680-bit ciphertext modulus and $N = 2^{15}$, whereas standard CKKS would require $N = 2^{16}$ and a 1000-bit modulus. This translates to substantial performance gains: 1.5x faster multiplication, 2.9x smaller ciphertexts, and 2.4x smaller switching keys.
  • Theoretical Analysis and Experimental Validation: The paper provides detailed theoretical analysis (e.g., error growth bounds, low part growth) and experimental results that support the efficiency and improved modulus consumption of the proposed schemes.

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To understand this paper, a foundational understanding of homomorphic encryption, particularly the CKKS scheme, and related cryptographic and mathematical concepts is essential.

  • Homomorphic Encryption (HE): HE is a form of encryption that allows computations to be performed on encrypted data without decrypting it first. The results of these computations are encrypted and, when decrypted, match the results of operations performed on the unencrypted data. This enables privacy-preserving data processing.
    • Somewhat Homomorphic Encryption (SHE): SHE schemes support a limited number of homomorphic operations (e.g., additions and a few multiplications) before noise accumulation makes further computation impossible.
    • Fully Homomorphic Encryption (FHE): FHE schemes are SHE schemes augmented with a bootstrapping mechanism. Bootstrapping is a technique that refreshes the ciphertext, reducing its noise level and allowing an unlimited number of operations, thus achieving "full" homomorphic capabilities.
  • CKKS Scheme: The Cheon-Kim-Kim-Song (CKKS) scheme [12] is a SHE scheme designed for approximate arithmetic on real or complex numbers. Unlike integer-based HE schemes (BGV, BFV), CKKS inherently introduces some noise, making it suitable for applications that can tolerate small errors, such as machine learning.
    • Modulus Consumption: In CKKS, the ciphertext is represented modulo a large integer $Q$. Each homomorphic multiplication reduces this effective modulus, meaning subsequent operations implicitly work modulo a smaller $Q$. This reduction is referred to as modulus consumption. Once the modulus becomes too small, the noise in the ciphertext can no longer be managed, and decryption becomes unreliable.
    • Scaling Factor ($\Delta$): CKKS represents real numbers as fixed-point integers. Messages (real numbers) are multiplied by a large scaling factor $\Delta$ and then rounded to integers before encryption. This scaling factor determines the precision of the fixed-point representation. During homomorphic multiplication, the scaling factor squares, so a rescaling operation is needed to bring it back to the original $\Delta$, maintaining manageable plaintext sizes and precision.
    • Bootstrapping: As explained for FHE, bootstrapping in CKKS is a procedure to refresh a noisy ciphertext and increase its modulus, allowing further homomorphic operations. It involves homomorphically evaluating a polynomial approximation of the modular reduction function. It is computationally intensive and consumes a significant amount of modulus itself.
  • Ring Learning With Errors (RLWE): CKKS is based on the Ring Learning With Errors (RLWE) problem, a hard mathematical problem believed to be resistant to quantum attacks. In RLWE-based schemes, computations happen over polynomial rings.
    • Ring ($R$): The underlying algebraic structure is typically a polynomial ring $R = \mathbb{Z}[X]/(X^N+1)$, where $N$ is a power of two. Elements of this ring are polynomials of degree less than $N$.
    • Canonical Embedding: The canonical embedding ($\mathsf{can}$) maps polynomials in $R$ to vectors of complex numbers. Specifically, for $m(X) \in R$, $\mathsf{can}$ maps $m(X)$ to the vector of its evaluations at specific roots of unity. This mapping is crucial for understanding norms and error growth in CKKS, as it relates polynomial coefficients to the actual complex numbers being approximated. The infinity norm of $\mathsf{can}(m)$ is often a tighter error measure than the infinity norm of $m$'s coefficients.
  • Residue Number System (RNS): RNS is a number representation system that allows large-integer arithmetic to be performed in parallel over smaller moduli. In CKKS, the large modulus $Q$ is typically a product of many smaller primes $q_i$ (i.e., $Q = \prod_i q_i$). Operations modulo $Q$ are then performed component-wise modulo each $q_i$, which is much more efficient.
  • CKKS Operations (Tensor, Relin, RS):
    • Tensor (Homomorphic Multiplication, before relinearization): When two ciphertexts $\mathsf{ct}_1$ and $\mathsf{ct}_2$ are multiplied, their components are multiplied pairwise, resulting in a ciphertext with more components (e.g., from 2 to 3 elements in $R_Q$). This increases the ciphertext dimension and prepares it for relinearization.
    • Relinearization (Relin): After Tensor, the ciphertext's dimension has increased. Relinearization is a key-switching procedure that reduces the dimension back to the original (e.g., from 3 elements to 2), making the ciphertext compatible with further operations. This process uses a relinearization key (rlk) and involves an auxiliary modulus $P$.
    • Rescaling (RS): After relinearization, the plaintext values within the ciphertext are effectively scaled up (e.g., by $\Delta^2$). Rescaling divides the ciphertext by a chosen prime $q_\ell$ (a factor of the current total modulus $Q_\ell$) to bring the plaintext back to a manageable scale ($\Delta$) and reduces the ciphertext's modulus. This step consumes one prime from the modulus chain. A minimal plaintext-side sketch of this encode, multiply, and rescale flow follows this list.
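The sketch below illustrates, with plain Python integers, the fixed-point intuition just described: messages are scaled by $\Delta$, a product carries scale $\Delta^2$, and a rescale division restores the original scale. No encryption, keys, or noise are involved, and the scaling factor value and all function names are illustrative assumptions rather than any library's API.

```python
DELTA = 1 << 40          # assumed 40-bit scaling factor, for illustration only

def encode(x: float) -> int:
    """Scale a real number by DELTA and round to an integer."""
    return round(x * DELTA)

def decode(m: int, scale: int = DELTA) -> float:
    return m / scale

def multiply_and_rescale(m1: int, m2: int) -> int:
    """The integer product has scale DELTA^2; rescaling divides by DELTA."""
    product = m1 * m2                        # scale: DELTA^2
    return (product + DELTA // 2) // DELTA   # rounded division, scale: DELTA

x, y = 1.25, -3.5
mx, my = encode(x), encode(y)
mz = multiply_and_rescale(mx, my)
print(decode(mz))   # approximately -4.375, up to rounding error
```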

3.2. Previous Works

The paper contextualizes its contribution by referencing several prior works focused on modulus consumption and CKKS improvements.

  • Gadget Decomposition [3, 5, 6, 13, 20, 21]: These techniques aim to optimize key switching operations (used in relinearization and bootstrapping) by decomposing secret keys or ciphertexts into smaller "gadget" components.
    • Bit decomposition [5, 6]: Decomposes values into individual bits.
    • Digit decomposition [13]: Decomposes values into larger base digits.
    • RNS-based decomposition [3, 20, 21]: Leverages RNS properties for decomposition.
    • Limitation: While effective for key switching, gadget decomposition typically slows down key switching by a factor equal to the gadget rank (dnum) and increases key size by the same factor. The paper highlights that libraries like SEAL use large dnum values, leading to very large switching-key sizes (e.g., 142.6 MB for $N = 2^{15}$). The current paper's Multᵗ approach instead targets the modulus consumption of the multiplication itself rather than just key switching.
  • Modulus Consumption in Specific Operations [2, 7, 8, 21, 24, 31-33]: Other works have focused on reducing modulus consumption in specific, computationally intensive operations like linear transformations and bootstrapping.
    • Limitation: These studies typically address large-scale computations like bootstrapping but do not directly improve the homomorphic multiplication operation itself, which is the focus of the current paper.
  • Reducing Ring Dimension (Module-LWE) [6, 39]: An alternative way to save resources is to reduce the ring dimension $N$. Module-LWE schemes, like those proposed in [6] and [39] for CKKS, use ciphertexts based on modules rather than rings, which can allow a smaller $N$ while maintaining security.
    • Comparison with Module-CKKS (e.g., Rank-2): The paper compares Mult² with rank-2 module-CKKS. Module-CKKS keeps the same modulus as CKKS but uses 3 ring elements per ciphertext (vs. 2 for standard CKKS). Mult² halves the maximum modulus bit-size and uses 4 ring elements (a pair of 2-element ciphertexts). The paper argues Mult² results in 33% smaller ciphertexts and lower computational cost (2 relinearizations vs. 3 for module-CKKS). This advantage grows further for tuple-CKKS (Multᵗ).

3.3. Technological Evolution

The field of Homomorphic Encryption has seen dramatic improvements since Gentry's foundational work in 2009 [17].

  • First Generation (Gentry's Scheme): Pioneering but impractical due to high computational costs.
  • Second Generation (BGV, BFV, CKKS): Significant performance improvements, making SHE practical for certain tasks.
    • BGV [6] and BFV [3] schemes emerged for integer arithmetic.
    • CKKS [12] specifically addressed approximate real/complex number arithmetic, opening doors for applications in machine learning and data science.
  • Efficiency Improvements: Subsequent research focused on optimizing various aspects:
    • RNS techniques [11, 18, 20, 21] for faster modular arithmetic.
    • Improved bootstrapping algorithms [8, 10, 21, 31-33] to extend SHE to FHE more efficiently.
    • Key switching optimizations (e.g., gadget decomposition [3, 5, 6, 20, 21]).
  • Current Research: The focus is on making FHE schemes more practical by reducing computational overhead, parameter sizes, and managing noise and modulus consumption more effectively. This paper fits into this lineage by directly tackling the modulus consumption problem during the core multiplication operation itself, an area where prior work mainly focused on key switching or bootstrapping.

3.4. Differentiation Analysis

The core difference and innovation of this paper's Multᵗ approach, especially Mult², compared to existing methods lie in its direct attack on modulus consumption during the homomorphic multiplication step itself.

  • Focus on Multiplication, not just Key Switching or Bootstrapping: Most prior works on modulus saving concentrated on optimizing key switching (e.g., gadget decomposition) or bootstrapping (e.g., improved EvalMod, sine-series approximation). While these are important, they do not fundamentally alter how modulus is consumed during the tensor product and rescaling phases of a multiplication. Multᵗ introduces a new paradigm for the multiplication operation itself.
  • Homomorphic Euclidean Division as a Core Primitive: The concept of homomorphically decomposing a ciphertext into quotient and remainder parts is novel and forms the basis of the multiple precision arithmetic. This allows treating parts of the plaintext (high-order and low-order bits) separately and discarding less significant terms.
  • Reduced Modulus Consumption per Multiplication: By selectively dropping the product of the low parts ($\check{\mathsf{ct}}_1 \otimes \check{\mathsf{ct}}_2$) during the Tensor step of Mult², the algorithm effectively consumes modulus at a rate equivalent to multiplying with smaller bit-size multiplicands. This is the key mechanism for "almost halving" the modulus consumption of Mult² (from $\Delta$ to roughly $\sqrt{\Delta}$ of effective scale consumption).
  • Double/Multiple Precision Arithmetic: The paper's approach directly enables double-CKKS and tuple-CKKS, which are analogous to double precision or multiple precision floating-point arithmetic in traditional computing. This allows managing precision and error growth more granularly, something not directly addressed by gadget decomposition or bootstrapping improvements.
  • Parameter Efficiency: The reduced modulus consumption directly translates into a smaller maximal modulus ($Q_L P$) and, consequently, a smaller ring dimension ($N$) needed to achieve a given security level and multiplicative depth. This leads to substantial gains in latency, ciphertext size, and switching-key size, as demonstrated in the experimental results. This is a more holistic efficiency gain compared to solutions that improve only one aspect (e.g., gadget decomposition, whose gains often come with a slowdown or larger keys).

4. Methodology

4.1. Principles

The core idea behind Multᵗ is to adapt the concept of multiple-precision arithmetic from classical integer or fixed-point computation to the homomorphic encryption setting. In traditional arithmetic, to multiply two large numbers, each can be decomposed into smaller bit-size pieces (e.g., high-order and low-order parts). The multiplication is then performed on these pieces, and the results are combined. For fixed-point arithmetic, less significant terms may be discarded if they fall below the desired precision threshold.

Specifically, two $2k$-bit integers $m_1$ and $m_2$ can each be decomposed into two $k$-bit pieces: $m_i = 2^k \cdot \hat{m}_i + \check{m}_i$. Their product is
$$m_1 m_2 = 2^{2k}\,\hat{m}_1 \hat{m}_2 + 2^k(\hat{m}_1 \check{m}_2 + \check{m}_1 \hat{m}_2) + \check{m}_1 \check{m}_2.$$
If only a relative precision of $\approx 2^{-2k}$ is needed, the last term $\check{m}_1 \check{m}_2$ can be dropped. The $2k$-bit multiplication is thus replaced by three $k$-bit multiplications, and the result again consists of two significant pieces, so the number of "pieces" stays constant after multiplication (a small integer sketch follows).
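Below is a small integer illustration of this decomposition, assuming $k = 16$ and arbitrarily chosen 32-bit operands. It only mirrors the plaintext arithmetic above and is not the homomorphic algorithm itself.

```python
k = 16
B = 1 << k                      # base 2^k

def split(m: int) -> tuple[int, int]:
    """Return (high, low) with m = high * 2^k + low."""
    return m >> k, m & (B - 1)

def approx_product(m1: int, m2: int) -> int:
    """Keep high*high and the two cross terms; drop low*low."""
    hi1, lo1 = split(m1)
    hi2, lo2 = split(m2)
    return (hi1 * hi2 << 2 * k) + ((hi1 * lo2 + lo1 * hi2) << k)

m1, m2 = 0x9A3F_17C2, 0x51B0_88ED        # two 32-bit operands (2k = 32)
exact = m1 * m2
approx = approx_product(m1, m2)
print(exact - approx)            # equals lo1 * lo2, which is < 2^(2k)
```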

The challenge in homomorphic encryption is twofold:

  1. Homomorphic Decomposition: How to homomorphically decompose a ciphertext (whose underlying plaintext is a large number) into smaller pieces (ciphertexts representing $\hat{m}$ and $\check{m}$). Directly encrypting pre-decomposed parts does not work because of noise accumulation.

  2. Managing Increasing Pieces: How to perform multiplication on these decomposed ciphertexts without the number of pieces continually growing, and while correctly managing noise and modulus consumption.

    The paper addresses these by introducing homomorphic Euclidean division to decompose ciphertexts and by carefully redesigning the Tensor, Relinearization, and Rescaling operations for these decomposed ciphertexts to control error growth and modulus consumption.

4.2. Core Methodology In-depth (Layer by Layer)

4.2.1. Homomorphic Euclidean Division

The concept of Euclidean division is extended from integers to polynomials (the form plaintexts take in CKKS). For a polynomial $m \in R$ and an integer $q$, Euclidean division gives a quotient and a remainder:
$$m = \frac{m - [m]_q}{q} \cdot q + [m]_q,$$
where $\frac{m - [m]_q}{q}$ is the quotient and $[m]_q$ is the remainder.

The paper defines Euclidean division directly on ciphertexts by applying it component-wise.

Definition 3.1 (Ciphertext Euclidean Division). Let $q \mid Q$ be two integers and let $\mathsf{ct} = (b, a) \in R_Q^2$ be a ciphertext. The remainder of $\mathsf{ct}$ by $q$ is defined as
$$\mathsf{Rem}_q(\mathsf{ct}) = ([b]_q, [a]_q) \in R^2.$$
The quotient of $\mathsf{ct}$ by $q$ is defined as
$$\mathsf{Quo}_q(\mathsf{ct}) = \mathsf{RS}_q(\mathsf{ct}) = \frac{\mathsf{ct} - \mathsf{Rem}_q(\mathsf{ct})}{q} \in R_{Q/q}^2.$$
Here:

  • $\mathsf{ct} = (b, a)$ is a CKKS ciphertext, where $b$ and $a$ are polynomials in $R_Q$.
  • $q$ is an integer divisor.
  • $[b]_q$ and $[a]_q$ denote the coefficient-wise modular reduction of the polynomials $b$ and $a$ modulo $q$, with coefficients taken in $(-q/2, q/2]$.
  • $\mathsf{Rem}_q(\mathsf{ct})$ thus produces a pair of polynomials (regarded as a ciphertext in $R^2$) with small coefficients (bounded by $q/2$).
  • $\mathsf{Quo}_q(\mathsf{ct})$ subtracts the remainder and then divides by $q$; this division is a rescaling by $q$, hence the notation $\mathsf{RS}_q(\mathsf{ct})$. The subtraction $\mathsf{ct} - \mathsf{Rem}_q(\mathsf{ct})$ is performed modulo $Q$, and the division by $q$ then reduces the modulus from $Q$ to $Q/q$, so the result is a ciphertext modulo $Q/q$. A coefficient-level sketch of $\mathsf{Rem}_q$ and $\mathsf{Quo}_q$ follows this list.
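As referenced above, here is a coefficient-level sketch of $\mathsf{Rem}_q$ and $\mathsf{Quo}_q$ acting on a plain list of integer coefficients. It ignores the ring structure, the ciphertext modulus, and keys; the helper names and the choice $q = 257$ are illustrative assumptions.

```python
def centered_mod(x: int, q: int) -> int:
    """Reduce x modulo q into the centered range (-q/2, q/2]."""
    r = x % q
    return r - q if r > q // 2 else r

def rem(poly: list[int], q: int) -> list[int]:
    """Coefficient-wise [.]_q, as in Rem_q."""
    return [centered_mod(c, q) for c in poly]

def quo(poly: list[int], q: int) -> list[int]:
    """(poly - [poly]_q) / q, as in Quo_q (an exact integer division)."""
    return [(c - centered_mod(c, q)) // q for c in poly]

q = 257
m = [1000, -4321, 12345]
# Each coefficient satisfies c = quotient * q + remainder.
assert all(hi * q + lo == c for hi, lo, c in zip(quo(m, q), rem(m, q), m))
```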

Theorem 3.2. Let $q \mid Q$ be two integers. Let $\mathsf{ct} \in R_Q^2$ and let $\mathsf{sk} = (1, s) \in R^2$ be a secret key with $s$ of Hamming weight $h$. Let $m = [\mathsf{ct} \cdot \mathsf{sk}]_Q \in R$ and write $m = \hat{m} \cdot q + \check{m}$ with $\check{m} = [m]_q \in R$ and $\hat{m} = (m - [m]_q)/q \in R$. We have:
$$\mathsf{Quo}_q(\mathsf{ct}) \cdot \mathsf{sk} = \hat{m} + I \pmod{Q/q}, \qquad \mathsf{Rem}_q(\mathsf{ct}) \cdot \mathsf{sk} = \check{m} - qI,$$
for some $I \in R$ satisfying $\|I\|_\infty \leq (h+2)/2$. Here:

  • $m$ is the underlying plaintext polynomial of $\mathsf{ct}$.
  • $\hat{m}$ is the ideal quotient polynomial and $\check{m}$ the ideal remainder polynomial.
  • The theorem states that $\mathsf{Quo}_q(\mathsf{ct})$ decrypts to approximately $\hat{m}$, and $\mathsf{Rem}_q(\mathsf{ct})$ decrypts to approximately $\check{m}$.
  • The term $I$ is a small error polynomial. The infinity norm $\|I\|_\infty$ bounds the maximum absolute value of its coefficients. Its small size ensures that the homomorphic Euclidean division is accurate. This error arises from the rescaling operation $\mathsf{RS}_q$ and the inherent noise in CKKS.

4.2.2. Pair Representation (Double-CKKS)

The Pair Representation (Double-CKKS) is a novel way to represent a ciphertext as a pair of ciphertexts, analogous to decomposing a number into high and low parts.

Definition 3.3. Let $Q_\ell$ be an element of the modulus chain (see Section 2.1) and let $q_{\mathrm{div}} \geq 2$ be an integer. For a ciphertext $\mathsf{ct} \in R_{Q_\ell'}^2$ where $Q_\ell' = Q_\ell \cdot q_{\mathrm{div}}$, the decomposition of $\mathsf{ct}$ is
$$\mathsf{DCP}_{q_{\mathrm{div}}}(\mathsf{ct}) = \left(\mathsf{Quo}_{q_{\mathrm{div}}}(\mathsf{ct}),\ \mathsf{Rem}_{q_{\mathrm{div}}}(\mathsf{ct})\right) \in R_{Q_\ell}^2 \times R_{Q_\ell}^2.$$
Conversely, the recombination of $(\hat{\mathsf{ct}}, \check{\mathsf{ct}}) \in R_{Q_\ell}^2 \times R_{Q_\ell}^2$ is
$$\mathsf{RCB}_{q_{\mathrm{div}}}(\hat{\mathsf{ct}}, \check{\mathsf{ct}}) = q_{\mathrm{div}} \cdot \hat{\mathsf{ct}} + \check{\mathsf{ct}} \in R_{Q_\ell}^2.$$
Here:

  • $Q_\ell$ is a modulus from the CKKS modulus chain.
  • $q_{\mathrm{div}}$ is the integer divisor used for decomposition.
  • $\mathsf{ct}$ is the original ciphertext, defined modulo $Q_\ell \cdot q_{\mathrm{div}}$.
  • DCP (Decompose Ciphertext Pair) takes a ciphertext $\mathsf{ct}$ and produces a pair $(\hat{\mathsf{ct}}, \check{\mathsf{ct}})$: $\hat{\mathsf{ct}}$ is the "high part" (quotient ciphertext) and $\check{\mathsf{ct}}$ is the "low part" (remainder ciphertext). Note that $\hat{\mathsf{ct}}$ is defined modulo $Q_\ell$, while $\check{\mathsf{ct}}$ initially lies in $R^2$ but can be viewed as an element of $R_{Q_\ell}^2$ when $Q_\ell > q_{\mathrm{div}}$.
  • RCB (Recombine Ciphertext Pair) takes a pair of ciphertexts $(\hat{\mathsf{ct}}, \check{\mathsf{ct}})$ and combines them back into a single ciphertext via multiplication by $q_{\mathrm{div}}$ and addition.

Lemma 3.4. For a pair of ciphertexts $(\hat{\mathsf{ct}}, \check{\mathsf{ct}}) \in R_{Q_\ell}^2 \times R_{Q_\ell}^2$ and a secret key $\mathsf{sk} \in R^2$, we have, modulo $Q_\ell$:
$$\mathsf{RCB}_{q_{\mathrm{div}}}(\hat{\mathsf{ct}}, \check{\mathsf{ct}}) \cdot \mathsf{sk} = (\hat{\mathsf{ct}} \cdot \mathsf{sk}) \cdot q_{\mathrm{div}} + (\check{\mathsf{ct}} \cdot \mathsf{sk}).$$
This lemma states that RCB reverses DCP on the underlying plaintexts, essentially reconstructing the combined plaintext from its high and low parts. A small coefficient-level sketch of DCP and RCB follows.
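The following coefficient-level sketch mirrors DCP and RCB from Definition 3.3 on plain integer coefficients, with no moduli, keys, or noise; the function names and the value of $q_{\mathrm{div}}$ are illustrative assumptions.

```python
def centered_mod(x: int, q: int) -> int:
    r = x % q
    return r - q if r > q // 2 else r

def dcp(ct: list[int], q_div: int) -> tuple[list[int], list[int]]:
    """Decompose into (high, low) = (Quo_{q_div}, Rem_{q_div}), coefficient-wise."""
    low = [centered_mod(c, q_div) for c in ct]
    high = [(c - l) // q_div for c, l in zip(ct, low)]
    return high, low

def rcb(ct_hat: list[int], ct_check: list[int], q_div: int) -> list[int]:
    """Recombine: q_div * high + low."""
    return [q_div * h + l for h, l in zip(ct_hat, ct_check)]

q_div = 1 << 23
ct = [123456789012, -987654321, 42]
hat, check = dcp(ct, q_div)
assert rcb(hat, check, q_div) == ct   # RCB inverts DCP exactly at this level
```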

4.2.3. Tuple Representation (Tuple-CKKS)

The Pair Representation is extended to a Tuple Representation with $t$ components. For an integer $e \geq 1$ (which corresponds to the tuple length $t$), a ciphertext $\mathsf{ct} \in R_{Q_\ell'}^2$ with $Q_\ell' = Q_\ell \cdot q_{\mathrm{div}}^{e-1}$ can be decomposed into an $e$-tuple.

The decomposition is performed recursively:

  1. $(\hat{\mathsf{ct}}_0, \check{\mathsf{ct}}_0) = \mathsf{DCP}_{q_{\mathrm{div}}^{e-1}}(\mathsf{ct})$

  2. For $1 \leq i \leq e-2$: $(\hat{\mathsf{ct}}_i, \check{\mathsf{ct}}_i) = \mathsf{DCP}_{q_{\mathrm{div}}^{e-i-1}}(\check{\mathsf{ct}}_{i-1})$

  3. Finally, $\hat{\mathsf{ct}}_{e-1} = \check{\mathsf{ct}}_{e-2}$.

    The decomposition of $\mathsf{ct}$ is then $\mathsf{DCP}_{q_{\mathrm{div}}}^e(\mathsf{ct}) = (\hat{\mathsf{ct}}_0, \ldots, \hat{\mathsf{ct}}_{e-2}, \hat{\mathsf{ct}}_{e-1}) \in (R_{Q_\ell}^2)^e$. Each component $\hat{\mathsf{ct}}_j$ is a ciphertext modulo $Q_\ell$.

Conversely, the recombination of $(\hat{\mathsf{ct}}_0, \ldots, \hat{\mathsf{ct}}_{e-1}) \in (R_{Q_\ell}^2)^e$ is
$$\mathsf{RCB}_{q_{\mathrm{div}}}^e(\hat{\mathsf{ct}}_0, \ldots, \hat{\mathsf{ct}}_{e-2}, \hat{\mathsf{ct}}_{e-1}) = \hat{\mathsf{ct}}_0 \cdot q_{\mathrm{div}}^{e-1} + \ldots + \hat{\mathsf{ct}}_{e-2} \cdot q_{\mathrm{div}} + \hat{\mathsf{ct}}_{e-1} \in R_{Q_\ell}^2.$$
This generalizes the base-$q_{\mathrm{div}}$ representation of polynomials to ciphertexts, effectively creating a $t$-tuple of ciphertexts, where $t$ is the tuple length. A coefficient-level sketch of the tuple decomposition appears below.
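Below is a coefficient-level sketch of the recursive $e$-tuple decomposition and its recombination, applied to a single integer for readability. The parameters and helper names are illustrative assumptions; the real components are ciphertexts over $R_{Q_\ell}$.

```python
def centered_mod(x: int, q: int) -> int:
    r = x % q
    return r - q if r > q // 2 else r

def tuple_decompose(c: int, q_div: int, e: int) -> list[int]:
    """Return [c_0, ..., c_{e-1}] with c = sum_j c_j * q_div^(e-1-j)."""
    parts = []
    rest = c
    for j in range(e - 1):
        power = q_div ** (e - 1 - j)
        low = centered_mod(rest, power)
        parts.append((rest - low) // power)   # quotient by q_div^(e-1-j)
        rest = low
    parts.append(rest)                        # last component is the remainder
    return parts

def tuple_recombine(parts: list[int], q_div: int) -> int:
    out = 0
    for p in parts:
        out = out * q_div + p                 # Horner evaluation in base q_div
    return out

q_div, e = 1 << 10, 3
c = 123456789
parts = tuple_decompose(c, q_div, e)
assert tuple_recombine(parts, q_div) == c
```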

4.2.4. Homomorphic Double-Precision Multiplication (Mult²)

The Mult² algorithm consists of three adapted operations: Tensor², Relin², and RS².

Definition 4.1 (Pair Tensor). Let $\mathsf{CT}_1 = (\hat{\mathsf{ct}}_1, \check{\mathsf{ct}}_1)$ and $\mathsf{CT}_2 = (\hat{\mathsf{ct}}_2, \check{\mathsf{ct}}_2) \in R_{Q_\ell}^2 \times R_{Q_\ell}^2$ be ciphertext pairs. The tensor of $\mathsf{CT}_1$ and $\mathsf{CT}_2$ is defined as
$$\mathsf{CT}_1 \otimes^2 \mathsf{CT}_2 = \left(\hat{\mathsf{ct}}_1 \otimes \hat{\mathsf{ct}}_2,\ \hat{\mathsf{ct}}_1 \otimes \check{\mathsf{ct}}_2 + \check{\mathsf{ct}}_1 \otimes \hat{\mathsf{ct}}_2\right) \in R_{Q_\ell}^3 \times R_{Q_\ell}^3.$$
Here:

  • $\mathsf{CT}_1$ and $\mathsf{CT}_2$ are input ciphertext pairs.
  • $\otimes$ denotes the standard CKKS Tensor operation.
  • The output is a pair of standard CKKS ciphertexts, each with 3 components after the Tensor operation.
  • Crucially, this operation discards the product of the low-part ciphertexts, $\check{\mathsf{ct}}_1 \otimes \check{\mathsf{ct}}_2$. This is the main source of the modulus consumption reduction: the least significant terms are ignored, introducing a controlled numerical error. (A plaintext-level analogue is sketched below.)
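The following plaintext-level sketch mimics Tensor² on a pair of (high, low) integers: it keeps the high-high and cross products, drops the low-low product, and then checks that the dropped contribution is exactly the product of the low parts. All numbers and names are illustrative assumptions; no encryption, relinearization, or rescaling is involved.

```python
q_div = 1 << 20

def pair_tensor(p1: tuple[int, int], p2: tuple[int, int]) -> tuple[int, int]:
    """Keep high*high and the cross terms; drop the low*low product."""
    (h1, l1), (h2, l2) = p1, p2
    return h1 * h2, h1 * l2 + l1 * h2

def recombine(p: tuple[int, int]) -> int:
    return q_div * p[0] + p[1]

p1 = (3_141_592, 65_358)      # (high, low) with |low| < q_div
p2 = (2_718_281, 82_845)
approx = recombine(pair_tensor(p1, p2))   # roughly recombine(p1)*recombine(p2)/q_div
exact = recombine(p1) * recombine(p2)
print(exact - q_div * approx)             # exactly l1 * l2, the dropped term
```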

Lemma 4.2. Let $\mathsf{CT}_i = (\hat{\mathsf{ct}}_i, \check{\mathsf{ct}}_i) \in R_{Q_\ell}^2 \times R_{Q_\ell}^2$ be a ciphertext pair with $\mathsf{RCB}_{q_{\mathrm{div}}}(\mathsf{CT}_i) = \mathsf{ct}_i$ for $i \in \{1, 2\}$, and let $\mathsf{sk} = (1, s) \in R^2$ be a secret key. Then, modulo $Q_\ell$:
$$(\mathsf{ct}_1 \otimes \mathsf{ct}_2) \cdot (1, s, s^2) = q_{\mathrm{div}} \cdot \left(\mathsf{RCB}_{q_{\mathrm{div}}}(\mathsf{CT}_1 \otimes^2 \mathsf{CT}_2)\right) \cdot (1, s, s^2) + (\check{\mathsf{ct}}_1 \cdot \mathsf{sk}) \cdot (\check{\mathsf{ct}}_2 \cdot \mathsf{sk}).$$
Now assume that $\|\mathsf{Dec}(\hat{\mathsf{ct}}_i)\|_\infty \leq \hat{M}$ and $\|\mathsf{Dec}(\check{\mathsf{ct}}_i)\|_\infty \leq \check{M}$ for all $i \in \{1, 2\}$ and for some $\hat{M}, \check{M}$ satisfying $N(\hat{M} q_{\mathrm{div}} + \check{M})^2 < Q_\ell/2$. Then we have:
$$\left\| \left[ \left(\mathsf{RCB}_{q_{\mathrm{div}}}(\mathsf{CT}_1 \otimes^2 \mathsf{CT}_2)\right) \cdot (1, s, s^2) \right]_{Q_\ell} - \frac{1}{q_{\mathrm{div}}} \left[ (\mathsf{ct}_1 \otimes \mathsf{ct}_2) \cdot (1, s, s^2) \right]_{Q_\ell} \right\|_\infty \leq \frac{N \check{M}^2}{q_{\mathrm{div}}}.$$
Here:

  • The first equation relates the standard Tensor product of the recombined ciphertexts to the Tensor² product of the pairs: Tensor² effectively computes the higher-order terms of the product, while the term $(\check{\mathsf{ct}}_1 \cdot \mathsf{sk}) \cdot (\check{\mathsf{ct}}_2 \cdot \mathsf{sk})$ (the product of the low parts) is discarded.
  • The second inequality bounds the error introduced by this discarding: Tensor² applied to $\mathsf{CT}_1$ and $\mathsf{CT}_2$ decrypts to approximately the same plaintext as the Tensor of $\mathsf{ct}_1$ and $\mathsf{ct}_2$ divided by $q_{\mathrm{div}}$. The error bound $N\check{M}^2/q_{\mathrm{div}}$ reflects the impact of dropping the $\check{\mathsf{ct}}_1 \otimes \check{\mathsf{ct}}_2$ term, which depends quadratically on the size $\check{M}$ of the low parts.

Definition 4.3 (Pair Relinearize). Let $\mathsf{CT} = (\hat{\mathsf{ct}}, \check{\mathsf{ct}}) \in R_{Q_\ell}^3 \times R_{Q_\ell}^3$ be a ciphertext pair (the output of Tensor²). The relinearization of $\mathsf{CT}$ is defined as
$$\mathsf{Relin}^2(\mathsf{CT}) = \mathsf{DCP}_{q_{\mathrm{div}}}(\mathsf{Relin}(q_{\mathrm{div}} \cdot \hat{\mathsf{ct}})) + (0, \mathsf{Relin}(\check{\mathsf{ct}})).$$
Here:

  • $\mathsf{Relin}$ is the standard CKKS relinearization operation.
  • Instead of relinearizing each component independently, which would cause significant error, Relin² first multiplies the high part $\hat{\mathsf{ct}}$ by $q_{\mathrm{div}}$, relinearizes it, and then decomposes the result with DCP. This "raises the modulus" of the high part before relinearization, while the low part $\check{\mathsf{ct}}$ is relinearized directly.
  • This approach avoids modulus consumption in the Relin step of Mult², because DCP is applied to a ciphertext that has already been scaled by $q_{\mathrm{div}}$, and the resulting parts are again defined modulo $Q_\ell$.

Lemma 4.4. Let $\mathsf{CT} = (\hat{\mathsf{ct}}, \check{\mathsf{ct}}) \in R_{Q_\ell}^3 \times R_{Q_\ell}^3$ be an output of Tensor² and let $\mathsf{sk} = (1, s) \in R^2$ be a secret key with $s$ of Hamming weight $h$. Then the quantity
$$\left[ \left(\mathsf{RCB}_{q_{\mathrm{div}}}(\mathsf{Relin}^2(\mathsf{CT}))\right) \cdot (1, s) - \left(\mathsf{RCB}_{q_{\mathrm{div}}}(\mathsf{CT})\right) \cdot (1, s, s^2) \right]_{Q_\ell}$$
has infinity norm $\iota \leq E_{\mathrm{Relin}} + h$. Now assume that $\left\|\left[\mathsf{RCB}_{q_{\mathrm{div}}}(\mathsf{CT}) \cdot (1, s, s^2)\right]_{Q_\ell}\right\|_\infty \leq M$ for some $M$ satisfying $2(M + E_{\mathrm{Relin}} + h) < Q_\ell/2$. Then the quantity
$$\left[ \left(\mathsf{RCB}_{q_{\mathrm{div}}}(\mathsf{Relin}^2(\mathsf{CT}))\right) \cdot (1, s) \right]_{Q_\ell} - \left[ \left(\mathsf{RCB}_{q_{\mathrm{div}}}(\mathsf{CT})\right) \cdot (1, s, s^2) \right]_{Q_\ell}$$
also has infinity norm $\iota \leq E_{\mathrm{Relin}} + h$. Here:

  • $E_{\mathrm{Relin}}$ is the error bound of standard CKKS relinearization.
  • This lemma shows that Relin² correctly transforms a ciphertext that decrypts under the extended key $(1, s, s^2)$ into one that decrypts under the standard secret key $(1, s)$, with a bounded error. The bound is similar to that of standard CKKS relinearization, indicating that Relin² does not introduce significant additional error.

Definition 4.5 (Pair Rescale). Let $\mathsf{CT} = (\hat{\mathsf{ct}}, \check{\mathsf{ct}}) \in R_{Q_\ell}^2 \times R_{Q_\ell}^2$ be a ciphertext pair and let $q_\ell = Q_\ell / Q_{\ell-1}$. The rescale of $\mathsf{CT}$ is defined as
$$\mathsf{RS}_{q_\ell}^2(\mathsf{CT}) = \left(\mathsf{RS}_{q_\ell}(\hat{\mathsf{ct}}),\ \mathsf{RS}_{q_\ell}(q_{\mathrm{div}} \cdot \hat{\mathsf{ct}} + \check{\mathsf{ct}}) - q_{\mathrm{div}} \cdot \mathsf{RS}_{q_\ell}(\hat{\mathsf{ct}})\right).$$
It belongs to $R_{Q_{\ell-1}}^2 \times R_{Q_{\ell-1}}^2$. Here:

  • $\mathsf{RS}_{q_\ell}$ is the standard CKKS rescale operation.
  • RS² uses rescaling in a way that preserves the pair representation. The key point is that the formula ensures $\mathsf{RCB}_{q_{\mathrm{div}}}(\mathsf{RS}_{q_\ell}^2(\mathsf{CT})) = \mathsf{RS}_{q_\ell}(\mathsf{RCB}_{q_{\mathrm{div}}}(\mathsf{CT}))$, i.e., RS² correctly rescales the recombined ciphertext, while consuming no more modulus than a standard RS (only $q_\ell$ is consumed, not an additional $q_{\mathrm{div}}$). A small numeric check of this identity follows.
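A small numeric check of the identity $\mathsf{RCB}(\mathsf{RS}^2(\mathsf{CT})) = \mathsf{RS}(\mathsf{RCB}(\mathsf{CT}))$ at the integer-coefficient level follows, using the centered-division form of rescaling from Definition 3.1 ($\mathsf{RS}_q = \mathsf{Quo}_q$). All values and names are illustrative assumptions; real ciphertexts carry moduli, keys, and noise.

```python
def centered_mod(x: int, q: int) -> int:
    r = x % q
    return r - q if r > q // 2 else r

def rs(c: int, q: int) -> int:
    """Rescale a single coefficient: (c - [c]_q) / q."""
    return (c - centered_mod(c, q)) // q

def rs2(hat: int, check: int, q: int, q_div: int) -> tuple[int, int]:
    """Pair rescale of Definition 4.5, on one coefficient of each part."""
    return rs(hat, q), rs(q_div * hat + check, q) - q_div * rs(hat, q)

q_ell, q_div = 1 << 38, 1 << 23
hat, check = 987654321987, -123456789
new_hat, new_check = rs2(hat, check, q_ell, q_div)
lhs = q_div * new_hat + new_check            # RCB(RS^2(CT))
rhs = rs(q_div * hat + check, q_ell)         # RS(RCB(CT))
assert lhs == rhs
```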

Lemma 4.6. Let $\mathsf{CT} \in (R_{Q_\ell}^2)^2$ be a ciphertext pair, let $q_\ell = Q_\ell / Q_{\ell-1}$, and let $\mathsf{sk} = (1, s) \in R^2$ be a secret key with $s$ of Hamming weight $h$. Then the quantity
$$\left[ \left(\mathsf{RCB}_{q_{\mathrm{div}}}(\mathsf{RS}_{q_\ell}^2(\mathsf{CT}))\right) \cdot (1, s) \right]_{Q_{\ell-1}} - \frac{1}{q_\ell} \left[ \left(\mathsf{RCB}_{q_{\mathrm{div}}}(\mathsf{CT})\right) \cdot (1, s) \right]_{Q_\ell}$$
has infinity norm $\leq (h+1)/2$. This lemma confirms the correctness of RS²: it rescales the underlying plaintext by $1/q_\ell$ with only a small, bounded error, similar to standard CKKS rescaling.

Definition 4.7 (Pair Multiply - Mult²). Multiplication of ciphertext pairs over $Q_\ell$ is defined as $\mathsf{Mult}^2 := \mathsf{RS}_{q_\ell}^2 \circ \mathsf{Relin}^2 \circ \mathsf{Tensor}^2$, where $q_\ell = Q_\ell / Q_{\ell-1}$. The result is a ciphertext pair over the modulus $Q_{\ell-1}$. This is the complete definition of the Mult² operation, combining the three adapted steps.

Theorem 4.8. Let $\mathsf{CT}_1 = (\hat{\mathsf{ct}}_1, \check{\mathsf{ct}}_1)$ and $\mathsf{CT}_2 = (\hat{\mathsf{ct}}_2, \check{\mathsf{ct}}_2) \in R_{Q_\ell}^2 \times R_{Q_\ell}^2$ be ciphertext pairs. Let $q_\ell = Q_\ell / Q_{\ell-1}$ and let $\mathsf{sk} = (1, s) \in R^2$ be a secret key with $s$ of Hamming weight $h$. Assume that $\|\mathsf{Dec}(\hat{\mathsf{ct}}_i)\|_\infty \leq \hat{M}$ and $\|\mathsf{Dec}(\check{\mathsf{ct}}_i)\|_\infty \leq \check{M}$ for all $i \in \{1, 2\}$ and for some $\hat{M}, \check{M}$ satisfying $N(\hat{M} q_{\mathrm{div}} + \check{M})^2 + E_{\mathrm{Relin}} + h < Q_\ell/2$. Then
$$\left[ \left(\mathsf{RCB}_{q_{\mathrm{div}}}(\mathsf{Mult}^2(\mathsf{CT}_1, \mathsf{CT}_2))\right) \cdot \mathsf{sk} \right]_{Q_{\ell-1}} - \frac{1}{q_\ell} \left[ \left(\mathsf{RCB}_{q_{\mathrm{div}}}(\mathsf{CT}_1) \cdot \mathsf{sk}\right) \cdot \left(\mathsf{RCB}_{q_{\mathrm{div}}}(\mathsf{CT}_2) \cdot \mathsf{sk}\right) \right]_{Q_\ell}$$
has infinity norm
$$\leq \left(\frac{N \check{M}^2}{q_{\mathrm{div}}} + E_{\mathrm{Relin}} + h \right) \cdot \frac{1}{q_\ell} + \frac{h+1}{2}.$$
Here:

  • This theorem provides the overall correctness and error bound for Mult².
  • The error term $\frac{N \check{M}^2}{q_{\mathrm{div}} q_\ell}$ (from Tensor²) is the main new component compared to standard CKKS multiplication, arising from discarding the product of the low parts.
  • The modulus consumption per Mult² operation is $q_\ell$, which is chosen to be smaller than the scaling factor $\Delta$ in standard CKKS (where $\Delta \approx q_\ell^2$). This means Mult² consumes roughly half the modulus per multiplication for the same effective scaling factor.

4.2.5. Bounding the Low Parts (for Mult²)

The error introduced by Tensor² depends quadratically on the norm of the low-part plaintext, $\check{M}$. Understanding and bounding the growth of $\check{M}$ is therefore critical for maintaining precision over sequential multiplications. The paper uses the canonical embedding to obtain tighter bounds.

Theorem 4.9 (Growth of the Low Part). Let $\mathsf{CT}_1 = (\hat{\mathsf{ct}}_1, \check{\mathsf{ct}}_1)$ and $\mathsf{CT}_2 = (\hat{\mathsf{ct}}_2, \check{\mathsf{ct}}_2) \in R_{Q_\ell}^2 \times R_{Q_\ell}^2$ be ciphertext pairs. Let $q_\ell = Q_\ell / Q_{\ell-1} \geq 2$ and let $\mathsf{sk} = (1, s) \in R^2$ be a secret key with $s$ of Hamming weight $h$. Suppose that $\|\mathsf{can} \circ \mathsf{Dec}(\hat{\mathsf{ct}}_i)\|_\infty \leq \hat{M}$ and $\|\mathsf{can} \circ \mathsf{Dec}(\check{\mathsf{ct}}_i)\|_\infty < \check{M}$ for $i \in \{1, 2\}$ and for some $\hat{M}, \check{M}$. Let $\mathsf{CT}_{\mathsf{Mult}} = \mathsf{Mult}^2(\mathsf{CT}_1, \mathsf{CT}_2) = (\hat{\mathsf{ct}}_{\mathsf{Mult}}, \check{\mathsf{ct}}_{\mathsf{Mult}})$. Then the following holds:
$$\|\mathsf{can} \circ \mathsf{Dec}(\check{\mathsf{ct}}_{\mathsf{Mult}})\|_\infty \leq \frac{2\hat{M}\check{M}}{q_\ell} + N\left(\frac{E_{\mathrm{Relin}}}{q_\ell} + (h+3)(q_{\mathrm{div}}+1)\right).$$
Here:

  • $\hat{M}$ and $\check{M}$ are upper bounds on the canonical embedding norms of the high and low parts, respectively.
  • The theorem bounds the norm of the low part of the output ciphertext pair after a Mult² operation.
  • The dominant term $2\hat{M}\check{M}/q_\ell$ suggests that if $\Delta \approx q_{\mathrm{div}} q_\ell$ and we set $\hat{M} \approx \Delta/q_{\mathrm{div}}$, then this term is approximately $2\check{M}$. This implies the low part grows by about 1 bit per multiplication.
  • If the low part grows too large, the error term from Tensor² ($N \check{M}^2 / (q_{\mathrm{div}} q_\ell)$) can become significant. To mitigate this, a recombine-and-decompose step ($\mathsf{DCP} \circ \mathsf{RCB}$) can be performed periodically to reset the low part to its initial small size, similar to a modular reduction. (A coefficient-level picture of this refresh is sketched below.)
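The recombine-and-decompose refresh can be pictured at the coefficient level as follows: a single coefficient whose low part has grown beyond $q_{\mathrm{div}}$ is recombined and split again, resetting the low part. All names and values are illustrative assumptions; the actual operation acts on ciphertext pairs and involves noise considerations.

```python
def centered_mod(x: int, q: int) -> int:
    r = x % q
    return r - q if r > q // 2 else r

def refresh(hat: int, check: int, q_div: int) -> tuple[int, int]:
    """RCB followed by DCP: same combined value, small low part again."""
    combined = q_div * hat + check          # RCB
    low = centered_mod(combined, q_div)     # DCP: new remainder
    return (combined - low) // q_div, low   # DCP: new quotient

q_div = 1 << 23
hat, check = 12345, 3 * q_div + 17          # low part has grown past q_div
new_hat, new_check = refresh(hat, check, q_div)
assert q_div * new_hat + new_check == q_div * hat + check
print(abs(new_check) <= q_div // 2)         # True: low part is small again
```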

4.2.6. Homomorphic Multi-Precision Multiplication (Multᵗ)

The Mult² approach is generalized to Multᵗ for any tuple length $t \geq 2$.

Definition 5.1 (Tuple Tensor). Let $\mathsf{CT}_1 = (\hat{\mathsf{ct}}_{1,0}, \ldots, \hat{\mathsf{ct}}_{1,t-1})$ and $\mathsf{CT}_2 = (\hat{\mathsf{ct}}_{2,0}, \ldots, \hat{\mathsf{ct}}_{2,t-1}) \in (R_{Q_\ell}^2)^t$ be ciphertext tuples. The tensor of $\mathsf{CT}_1$ and $\mathsf{CT}_2$ is defined by
$$\hat{\mathsf{ct}}_{\mathsf{ten},i} = \sum_{j=0}^i \hat{\mathsf{ct}}_{1,j} \otimes \hat{\mathsf{ct}}_{2,i-j}, \quad \text{for all } i \in \{0, \ldots, t-1\}.$$
The notation in the paper is slightly unclear here (the overall tuple structure of the output is left implicit), but the intent is that the $i$-th component of the output tuple $\mathsf{Tensor}^t(\mathsf{CT}_1, \mathsf{CT}_2)$ is $\hat{\mathsf{ct}}_{\mathsf{ten},i}$. The higher-order terms (products of components whose indices sum to $\geq t$) are discarded, as in Mult². This corresponds to multiplying two polynomials $\sum_j \hat{m}_{1,j} X^j$ and $\sum_j \hat{m}_{2,j} X^j$ and keeping only terms up to $X^{t-1}$ (see the sketch below).
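The truncated product underlying Tensorᵗ can be pictured with plain integer coefficient lists, as in the sketch below: multiply two length-$t$ lists as polynomials and keep only output indices below $t$. The inputs are arbitrary illustrative values; the real components are ciphertexts.

```python
def tuple_tensor(a: list[int], b: list[int]) -> list[int]:
    """Truncated polynomial product: keep only coefficients of X^0 .. X^(t-1)."""
    t = len(a)
    out = [0] * t
    for i in range(t):                    # output index i < t only
        for j in range(i + 1):
            out[i] += a[j] * b[i - j]     # index pairs summing to i
    return out                            # pairs with index sum >= t are dropped

a = [5, 3, 2]        # stands for 5 + 3X + 2X^2
b = [7, 1, 4]
print(tuple_tensor(a, b))   # [35, 26, 37]: only the coefficients of 1, X, X^2
```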

Lemma 5.2. Let $\mathsf{CT}_i = (\hat{\mathsf{ct}}_{i,0}, \ldots, \hat{\mathsf{ct}}_{i,t-1}) \in (R_{Q_\ell}^2)^t$ be a ciphertext tuple satisfying $\mathsf{RCB}_{q_{\mathrm{div}}}^t(\mathsf{CT}_i) = \mathsf{ct}_i$ for $i \in \{1, 2\}$, and let $\mathsf{sk} = (1, s) \in R^2$ be a secret key. Then, modulo $Q_\ell$:
$$(\mathsf{ct}_1 \otimes \mathsf{ct}_2) \cdot (1, s, s^2) = q_{\mathrm{div}}^{t-1} \cdot \left(\mathsf{RCB}_{q_{\mathrm{div}}}^t(\mathsf{CT}_1 \otimes^t \mathsf{CT}_2)\right) \cdot (1, s, s^2) + \sum_{i=1}^{t-1} \sum_{j=i}^{t-1} q_{\mathrm{div}}^{t-1-i} (\hat{\mathsf{ct}}_{1,j} \cdot \mathsf{sk}) \cdot (\hat{\mathsf{ct}}_{2,t-1-j} \cdot \mathsf{sk}).$$
Now assume that $\|\mathsf{Dec}(\hat{\mathsf{ct}}_{i,0})\|_\infty \leq \hat{M}$ and $\|\mathsf{Dec}(\hat{\mathsf{ct}}_{i,j})\|_\infty \leq \check{M}$ for all $i \in \{1, 2\}$, $j \in \{1, \ldots, t-1\}$, and for some $\hat{M}, \check{M}$ satisfying $N(\hat{M} q_{\mathrm{div}}^{t-1} + \check{M} q_{\mathrm{div}}^{t-2} + \ldots + \check{M})^2 < Q_\ell/2$. Then we have:
$$\left\| \left[ \mathsf{RCB}_{q_{\mathrm{div}}}^t(\mathsf{CT}_1 \otimes^t \mathsf{CT}_2) \cdot (1, s, s^2) \right]_{Q_\ell} - \frac{1}{q_{\mathrm{div}}^{t-1}} \left[ (\mathsf{ct}_1 \otimes \mathsf{ct}_2) \cdot (1, s, s^2) \right]_{Q_\ell} \right\|_\infty \leq \sum_{i=1}^{t-1} \frac{(t-i)N\check{M}^2}{q_{\mathrm{div}}^i}.$$
This lemma generalizes Lemma 4.2, showing that Tensorᵗ approximates the scaled product of the recombined plaintexts, with an error proportional to the sum of the discarded terms.

Definition 5.3 (Tuple Relinearize). Let $\mathsf{CT} = (\hat{\mathsf{ct}}_0, \ldots, \hat{\mathsf{ct}}_{t-1}) \in (R_{Q_\ell}^3)^t$ be a ciphertext tuple. The relinearization of $\mathsf{CT}$ is defined recursively as
$$\mathsf{Relin}^t(\mathsf{CT}) = \mathsf{DCP}_{q_{\mathrm{div}}}^t(\mathsf{Relin}(q_{\mathrm{div}}^{t-1} \cdot \hat{\mathsf{ct}}_0)) + (0, \mathsf{Relin}^{t-1}(\overline{\mathsf{CT}})),$$
where $\overline{\mathsf{CT}} = (\hat{\mathsf{ct}}_1, \ldots, \hat{\mathsf{ct}}_{t-1}) \in (R_{Q_\ell}^3)^{t-1}$. In this recursive definition, the highest-order component is scaled by $q_{\mathrm{div}}^{t-1}$, relinearized, and then decomposed into a $t$-tuple, while the remaining components are processed recursively.

Lemma 5.4. Let $\mathsf{CT} = (\hat{\mathsf{ct}}_0, \ldots, \hat{\mathsf{ct}}_{t-1}) \in (R_{Q_\ell}^3)^t$ be an output of Tensorᵗ and let $\mathsf{sk} = (1, s) \in R^2$ be a secret key with $s$ of Hamming weight $h$. Then the quantity
$$\left[ \left(\mathsf{RCB}_{q_{\mathrm{div}}}^t(\mathsf{Relin}^t(\mathsf{CT}))\right) \cdot (1, s) - \left(\mathsf{RCB}_{q_{\mathrm{div}}}^t(\mathsf{CT})\right) \cdot (1, s, s^2) \right]_{Q_\ell}$$
has infinity norm $\iota \leq E_{\mathrm{Relin}} + th/2$. Now assume that $\left\|\left[\mathsf{RCB}_{q_{\mathrm{div}}}^t(\mathsf{CT}) \cdot (1, s, s^2)\right]_{Q_\ell}\right\|_\infty \leq M$ for some $M$ satisfying $2(M + E_{\mathrm{Relin}} + th/2) < Q_\ell/2$. Then
$$\left[ \left(\mathsf{RCB}_{q_{\mathrm{div}}}^t(\mathsf{Relin}^t(\mathsf{CT}))\right) \cdot (1, s) \right]_{Q_\ell} - \left[ \left(\mathsf{RCB}_{q_{\mathrm{div}}}^t(\mathsf{CT})\right) \cdot (1, s, s^2) \right]_{Q_\ell}$$
also has infinity norm $\iota \leq E_{\mathrm{Relin}} + th/2$. This lemma extends the correctness of Relin² to Relinᵗ, with an error bound that grows linearly in $t$.

Definition 5.5 (Tuple Rescale). Let $\mathsf{CT} = (\hat{\mathsf{ct}}_0, \ldots, \hat{\mathsf{ct}}_{t-1}) \in (R_{Q_\ell}^2)^t$ be a ciphertext tuple and let $q_\ell = Q_\ell / Q_{\ell-1}$. The rescale of $\mathsf{CT}$ is defined as $\mathsf{RS}_{q_\ell}^t(\mathsf{CT}) = (\hat{\mathsf{ct}}_{\mathsf{rs},0}, \ldots, \hat{\mathsf{ct}}_{\mathsf{rs},t-1}) \in (R_{Q_{\ell-1}}^2)^t$ with $\hat{\mathsf{ct}}_{\mathsf{rs},0} = \mathsf{RS}_{q_\ell}(\hat{\mathsf{ct}}_0)$ and, for $i \in \{1, 2, \ldots, t-1\}$,
$$\hat{\mathsf{ct}}_{\mathsf{rs},i} = \mathsf{RS}_{q_\ell}(q_{\mathrm{div}}^i \cdot \hat{\mathsf{ct}}_0 + q_{\mathrm{div}}^{i-1} \cdot \hat{\mathsf{ct}}_1 + \ldots + \hat{\mathsf{ct}}_i) - q_{\mathrm{div}} \cdot \mathsf{RS}_{q_\ell}(q_{\mathrm{div}}^{i-1} \cdot \hat{\mathsf{ct}}_0 + q_{\mathrm{div}}^{i-2} \cdot \hat{\mathsf{ct}}_1 + \ldots + \hat{\mathsf{ct}}_{i-1}).$$
This definition ensures that
$$\mathsf{RCB}_{q_{\mathrm{div}}}^t(\mathsf{RS}_{q_\ell}^t(\mathsf{CT})) = \mathsf{RS}_{q_\ell}(\mathsf{RCB}_{q_{\mathrm{div}}}^t(\mathsf{CT})).$$
This generalizes RS² to RSᵗ, ensuring that the $t$-tuple is correctly rescaled while only $q_\ell$ is consumed from the modulus chain.

Lemma 5.6. Let $\mathsf{CT} \in (R_{Q_\ell}^2)^t$ be a ciphertext tuple and let $\mathsf{sk} = (1, s) \in R^2$ be a secret key with $s$ of Hamming weight $h$. Then the quantity
$$\left[ \left(\mathsf{RCB}_{q_{\mathrm{div}}}^t(\mathsf{RS}_{q_\ell}^t(\mathsf{CT}))\right) \cdot (1, s) \right]_{Q_{\ell-1}} - \frac{1}{q_\ell} \left[ \left(\mathsf{RCB}_{q_{\mathrm{div}}}^t(\mathsf{CT})\right) \cdot (1, s) \right]_{Q_\ell}$$
has infinity norm $\leq (h+1)/2$. This confirms the correctness of RSᵗ: it rescales the underlying plaintext with a small, bounded error.

Definition 5.7 (Tuple Multiply - Multᵗ). Multiplication of ciphertext tuples over $Q_\ell$ is defined as $\mathsf{Mult}^t := \mathsf{RS}_{q_\ell}^t \circ \mathsf{Relin}^t \circ \mathsf{Tensor}^t$, where $q_\ell = Q_\ell / Q_{\ell-1}$. The result is a ciphertext tuple over the modulus $Q_{\ell-1}$. This completes the definition of the Multᵗ operation.

Theorem 5.8. Let $\mathsf{CT}_i = (\hat{\mathsf{ct}}_{i,0}, \ldots, \hat{\mathsf{ct}}_{i,t-1}) \in (R_{Q_\ell}^2)^t$ be a ciphertext tuple for $i \in \{1, 2\}$. Let $q_\ell = Q_\ell / Q_{\ell-1}$ and let $\mathsf{sk} = (1, s) \in R^2$ be a secret key with $s$ of Hamming weight $h$. Assume that $\|\mathsf{Dec}(\hat{\mathsf{ct}}_{i,0})\|_\infty \leq \hat{M}$ and $\|\mathsf{Dec}(\hat{\mathsf{ct}}_{i,j})\|_\infty \leq \check{M}$ for $i \in \{1, 2\}$, $j \in \{1, \ldots, t-1\}$, and for some $\hat{M}, \check{M}$ satisfying $N(\hat{M} q_{\mathrm{div}}^{t-1} + \check{M} q_{\mathrm{div}}^{t-2} + \ldots + \check{M})^2 + E_{\mathrm{Relin}} + th/2 < Q_\ell/2$. Then
$$\left[ \left(\mathsf{RCB}_{q_{\mathrm{div}}}^t(\mathsf{Mult}^t(\mathsf{CT}_1, \mathsf{CT}_2))\right) \cdot \mathsf{sk} \right]_{Q_{\ell-1}} - \frac{1}{q_\ell} \left[ \left(\mathsf{RCB}_{q_{\mathrm{div}}}^t(\mathsf{CT}_1) \cdot \mathsf{sk}\right) \cdot \left(\mathsf{RCB}_{q_{\mathrm{div}}}^t(\mathsf{CT}_2) \cdot \mathsf{sk}\right) \right]_{Q_\ell}$$
has infinity norm
$$\leq \left(\sum_{i=1}^{t-1} \frac{(t-i)N\check{M}^2}{q_{\mathrm{div}}^i} + E_{\mathrm{Relin}} + \frac{t}{2}h\right) \cdot \frac{1}{q_\ell} + \frac{h+1}{2}.$$
This theorem provides the overall correctness and error bound for Multᵗ. The main new error term is $\frac{(t-1)N\check{M}^2}{q_{\mathrm{div}} q_\ell}$, which grows with $t$. For Multᵗ, the scaling factor $\Delta$ is chosen to be approximately $q_{\mathrm{div}}^{t-1} q_\ell$, while each Multᵗ consumes only $q_\ell$ of modulus; this enables $t$-multiple precision computation at a fraction of the modulus consumption of standard CKKS with such a large $\Delta$.

4.2.7. Bounding the Low Parts (for Multᵗ)

Similar to Mult², the error in Multᵗ depends on the norms of its low parts.

Theorem 5.9 (Growth of Low Parts). Let $\mathsf{CT}_i = (\hat{\mathsf{ct}}_{i,0}, \ldots, \hat{\mathsf{ct}}_{i,t-1}) \in (R_{Q_\ell}^2)^t$ be a ciphertext tuple for $i \in \{1, 2\}$. Let $q_\ell = Q_\ell / Q_{\ell-1}$ and let $\mathsf{sk} = (1, s) \in R^2$ be a secret key with $s$ of Hamming weight $h$. Suppose that $\|\mathsf{can} \circ \mathsf{Dec}(\hat{\mathsf{ct}}_{i,0})\|_\infty \leq \hat{M}$ and $\|\mathsf{can} \circ \mathsf{Dec}(\hat{\mathsf{ct}}_{i,j})\|_\infty < \check{M}$ for $i \in \{1, 2\}$, $j \in \{1, \ldots, t-1\}$, and for some $\hat{M}, \check{M}$. Let $\mathsf{CT}_{\mathsf{Mult}} = \mathsf{Mult}^t(\mathsf{CT}_1, \mathsf{CT}_2) = (\hat{\mathsf{ct}}_{\mathsf{Mult},0}, \ldots, \hat{\mathsf{ct}}_{\mathsf{Mult},t-1})$. Then the following holds for $j \in \{1, \ldots, t-1\}$:
$$\|\mathsf{can} \circ \mathsf{Dec}(\hat{\mathsf{ct}}_{\mathsf{Mult},j})\|_\infty \leq \frac{2\hat{M}\check{M} + (j-1)\check{M}^2}{q_\ell} + N\left(\frac{E_{\mathrm{Relin}}}{q_\ell} + j \cdot (h+3)(q_{\mathrm{div}}+1)\right).$$
Here:

  • This theorem bounds the growth of the $j$-th low part's norm. The main term is proportional to $\check{M}$, as for Mult², but the factor governing this growth increases with $t$.
  • The low parts grow by approximately $\log t$ bits after each multiplication. This growth needs to be managed (e.g., via recombine and decompose) to prevent errors from accumulating.

5. Experimental Setup

5.1. Datasets

The paper does not use traditional public datasets (like ImageNet or CIFAR-10) because it focuses on evaluating the fundamental cryptographic properties of the proposed Multᵗ scheme (error growth, multiplicative depth, precision). Instead, the experiments involve:

  • Single Ciphertext Repeated Squarings: The main method to evaluate error growth and multiplicative depth is performing repeated squaring operations on a single encrypted value. This simulates a deep multiplicative circuit with a single initial input.
  • Synthetic/Controlled Inputs: The nature of the experiments suggests using simple, controlled plaintext values to observe how errors accumulate and how the scheme's parameters affect computational capacity and precision.
  • Purpose: This setup is effective for validating the core cryptographic claims related to noise management, modulus consumption, and precision, rather than benchmarking performance on specific real-world applications.

5.2. Evaluation Metrics

The paper uses several metrics to evaluate the Multᵗ scheme, particularly in comparison to standard CKKS multiplication (Mult, i.e., $t=1$).

  • Error Magnitude ($\log_2 \|\cdot\|_\infty^{\mathrm{can}}$):
    1. Conceptual Definition: This measures the magnitude of the error in the decrypted plaintext, specifically the infinity norm of the canonical embedding of the error polynomial. It quantifies how much the computed homomorphic result deviates from the ideal unencrypted result. A smaller error magnitude indicates higher accuracy.
    2. Mathematical Formula: For an error polynomial $e \in R$, the canonical embedding maps it to a complex vector $\mathbf{e} = \mathsf{can}(e) \in \mathbb{C}^{N/2}$. The infinity norm is then $\|\mathbf{e}\|_\infty = \max_j |e_j|$, where the $e_j$ are the components of $\mathbf{e}$. The metric used is $\log_2 \|\mathbf{e}\|_\infty$.
    3. Symbol Explanation:
      • $e$: Error polynomial.
      • $\mathsf{can}$: Canonical embedding function.
      • $\mathbf{e}$: Complex vector resulting from the canonical embedding of $e$.
      • $N$: Ring degree.
      • $e_j$: $j$-th component (complex number) of the vector $\mathbf{e}$.
      • $\|\cdot\|_\infty$: Infinity norm (maximum absolute value).
      • $\log_2$: Base-2 logarithm, converting the error magnitude into bits.
  • Precision (bits):
    1. Conceptual Definition: Precision refers to the number of accurate bits retained in the plaintext values after homomorphic computations. It's related to the scaling factor and the noise level. Higher precision implies more significant digits can be reliably represented.
    2. Mathematical Formula: Not explicitly given as a formula in the paper, but typically computed as $\log_2(\Delta) - \log_2(\text{error magnitude})$, or simply evaluated from the error magnitude relative to the expected plaintext values. The paper reports precision in bits (e.g., -31.3 bits of precision).
    3. Symbol Explanation:
      • $\Delta$: Scaling factor.
      • error magnitude: The magnitude of the error in the decrypted value.
  • Multiplicative Depth:
    1. Conceptual Definition: The maximum number of sequential homomorphic multiplications that can be performed before bootstrapping is required or before the noise level becomes too high for reliable decryption. A higher multiplicative depth is desirable as it allows for more complex computations.
    2. Mathematical Formula: The paper provides an approximation formula for Mult² when error refreshing is used:
$$\frac{\log_2(Q)}{\log_2(\Delta) - (1-1/k) \cdot \log_2(q_{\mathrm{div}})}.$$
For the original Mult (no error refreshing, effectively $k=1$), this simplifies to $\log_2(Q) / \log_2(\Delta)$. (A small numeric illustration appears after this list.)
    3. Symbol Explanation:
      • $Q$: Total ciphertext modulus available.
      • $\Delta$: Scaling factor.
      • $k$: Number of sequential Mult² multiplications between two consecutive error refreshings ($\mathsf{DCP} \circ \mathsf{RCB}$).
      • $q_{\mathrm{div}}$: The divisor used in DCP.
  • Maximal Modulus ($\log_2(Q_L P)$):
    1. Conceptual Definition: The total bit-size of the largest modulus used in the CKKS scheme, combining the final modulus $Q_L$ and the auxiliary modulus $P$ used for key switching. This directly impacts the ring dimension needed for security and the overall memory footprint.
    2. Mathematical Formula: $\log_2(Q_L P)$.
    3. Symbol Explanation:
      • $Q_L$: The largest modulus in the modulus chain.
      • $P$: Auxiliary modulus used for key switching.
      • $\log_2$: Base-2 logarithm, representing the size in bits.
  • Latency ($T_{\mathrm{mult}}$, ms):
    1. Conceptual Definition: The time taken to perform a single homomorphic multiplication operation.
    2. Mathematical Formula: Measured in milliseconds (ms).
    3. Symbol Explanation: Not applicable (direct measurement).
  • Ciphertext Size (MB):
    1. Conceptual Definition: The memory occupied by an encrypted ciphertext, particularly at the highest modulus level. This impacts storage and communication costs.
    2. Mathematical Formula: Measured in megabytes (MB).
    3. Symbol Explanation: Not applicable (direct measurement).
  • Switching Key Size (MB):
    1. Conceptual Definition: The memory occupied by the switching keys (including relinearization keys and rotation keys), which are necessary for relinearization and rotation operations. Large switching keys can be a significant practical burden.
    2. Mathematical Formula: Measured in megabytes (MB).
    3. Symbol Explanation: Not applicable (direct measurement).
  • Number of NTTs (#NTT):
    1. Conceptual Definition: The count of Number Theoretic Transforms (NTTs) performed. NTTs are fundamental, computationally intensive operations in RLWE-based schemes. This metric indicates the computational complexity.
    2. Mathematical Formula: Count of 64-bit unit NTTs in dimension $N$.
    3. Symbol Explanation:
      • NTT: Number Theoretic Transform.
      • $N$: Ring degree.
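
The three quantitative metrics above (error in bits, precision, and approximate multiplicative depth) can be turned into a few lines of code. Below is a minimal, self-contained sketch of that bookkeeping, assuming a naive $O(N^2)$ evaluation of the canonical embedding (production libraries use FFT-based embeddings) and using purely illustrative parameter values; the function names are mine, not the paper's.

```python
import numpy as np

def canonical_embedding(coeffs):
    """Evaluate e(X) at one primitive 2N-th root of unity per conjugate pair.

    For R = Z[X]/(X^N + 1), the primitive 2N-th roots of unity are zeta^(2j+1)
    with zeta = exp(i*pi/N); taking j = 0, ..., N/2 - 1 picks one root per
    conjugate pair. Naive O(N^2) evaluation: fine for a sketch only.
    """
    N = len(coeffs)
    roots = np.exp(1j * np.pi * (2 * np.arange(N // 2) + 1) / N)
    eval_matrix = roots[:, None] ** np.arange(N)
    return eval_matrix @ np.asarray(coeffs, dtype=np.complex128)

def error_bits(error_coeffs):
    """Error metric: log2 of the infinity norm of the embedded error."""
    return float(np.log2(np.max(np.abs(canonical_embedding(error_coeffs)))))

def precision_bits(log2_delta, err_bits):
    """Precision metric: log2(Delta) minus the error magnitude in bits."""
    return log2_delta - err_bits

def approx_depth(log2_Q, log2_delta, log2_qdiv, k):
    """Depth metric: log2(Q) / (log2(Delta) - (1 - 1/k) * log2(q_div));
    k = 1 (ordinary Mult, no refreshing) removes the q_div term."""
    return log2_Q / (log2_delta - (1 - 1 / k) * log2_qdiv)

# Toy error polynomial in a degree-8 ring (coefficients are arbitrary).
e = np.array([1, -2, 0, 3, -1, 2, 0, -3])
eb = error_bits(e)
print(f"error = {eb:.1f} bits, precision at log2(Delta) = 50: {precision_bits(50, eb):.1f} bits")

# Illustrative depth estimates (hypothetical parameters, not the paper's tables):
print(f"ordinary Mult: {approx_depth(800, 61, 23, k=1):.1f} levels")
print(f"Mult^2, refresh every 6: {approx_depth(800, 61, 23, k=6):.1f} levels")
```

In an actual CKKS library the error coefficients would come from decrypting and subtracting the expected message, but the norm-in-bits accounting is the same.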

5.3. Baselines

The primary baseline for comparison throughout the paper is the:

  • Ordinary CKKS Multiplication Algorithm (Mult): This refers to the standard CKKS homomorphic multiplication procedure as described in Section 2.2 of the paper. It is equivalent to Multᵗ with $t=1$ (i.e., no decomposition is applied). This baseline is representative because it is the standard method that the proposed Multᵗ algorithm aims to improve upon. The paper uses it to demonstrate the advantages of Multᵗ in terms of modulus consumption, multiplicative depth, precision, and overall efficiency (latency, ciphertext size, key size).

    The paper also briefly compares Multᵗ to:

  • Rank-2 Module-CKKS: This is an alternative approach to reduce ring dimension for CKKS by using module-LWE formats. It serves as a baseline for comparing different modulus saving strategies beyond the standard CKKS scheme.

6. Results & Analysis

6.1. Core Results Analysis

The experimental results strongly validate the effectiveness of the proposed Multᵗ scheme, particularly Mult², in reducing modulus consumption, increasing homomorphic capacity (multiplicative depth), and enabling higher precision with smaller cryptographic parameters.

1. Error Growth Observation: Figure 1 illustrates the error growth during 8 repeated squarings for both standard CKKS (Mult, $t=1$) and Mult² ($t=2$). The figure shows that the error $e_1$ for Mult grows steadily, and that the error $e_2$ for Mult² initially follows $e_1$. However, the contribution of the discarded term $\check{\mathsf{ct}} \otimes \check{\mathsf{ct}}$ (the product of the low parts) grows faster, by approximately 1.7 bits per multiplication. Since $e_2$ (the total error for Mult²) is essentially the sum of $e_1$ and this discarded-product term, after about 6 multiplications $e_2$ starts to increase noticeably faster than $e_1$. This confirms the theoretical analysis that dropping the low-part product introduces a specific error component, which becomes dominant after several multiplications, and it justifies the "recombine and decompose" strategy that refreshes the low part to prevent excessive error accumulation. A plain-integer analogue of this effect is sketched below.
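
To build intuition for this behavior, here is a plain-integer analogue of the decomposition (illustrative only: the divisor and operand values are arbitrary, and this is not the ciphertext-level algorithm). Each operand is split by a weak Euclidean division, and the product of the two low parts is discarded, which mirrors the term whose homomorphic counterpart Figure 1 shows accumulating.

```python
def decompose(m, q_div):
    """Weak Euclidean division: m = m_hat * q_div + m_check with 0 <= m_check < q_div."""
    return divmod(m, q_div)

q_div = 1 << 23                              # arbitrary 23-bit divisor
m1, m2 = 0x1F3A5C7D9E2B, 0xB4D6F81A3C5       # two arbitrary ~45-bit operands

h1, l1 = decompose(m1, q_div)
h2, l2 = decompose(m2, q_div)

exact = m1 * m2
approx = (h1 * h2) * q_div**2 + (h1 * l2 + h2 * l1) * q_div  # low*low term dropped

dropped = exact - approx                     # equals l1 * l2
print(dropped.bit_length(), exact.bit_length())  # the dropped term sits ~40+ bits below the product
```

Each further multiplication in decomposed form contributes another such discarded term, which is why the corresponding error component in Figure 1 keeps growing until it is refreshed.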

The following are the results from Table 1 of the original paper:

| Multiplication algorithm | $N$ | $\log_2(Q_L P)$ | $\log_2 \Delta$ | $\log_2 q$ (Base) | $\log_2 q$ (Mult) | $\log_2 q$ (Div) | $\log_2 P$ |
|---|---|---|---|---|---|---|---|
| Mult ($t=1$) | $2^{15}$ | 610 | 61 | 61 | 61×8 | – | 61 |
| Mult² ($t=2$, new) | $2^{15}$ | 449 | 61 | 61 | 38×8 | 23 | 61 |

As the parameters in Table 1 show, for the same ring degree ($N=2^{15}$) and scaling factor ($\log_2 \Delta = 61$ bits), Mult² supports 8 multiplications with a significantly smaller maximal modulus ($\log_2(Q_L P) = 449$ bits) than Mult (610 bits), demonstrating the modulus-saving capability of Mult². The Mult primes are also smaller for Mult² (38 bits vs. 61 bits for Mult).
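
As a quick sanity check on this accounting (assuming $\log_2(Q_L P)$ is simply the sum of the bit-sizes of the Base, Mult, Div, and $P$ primes listed in the table), the Table 1 totals can be reproduced directly:

```python
# Sum of prime bit-sizes from Table 1 (Base + Mult primes + Div prime + P):
mult_total  = 61 + 61 * 8 + 61        # ordinary Mult: no Div prime             -> 610
mult2_total = 61 + 38 * 8 + 23 + 61   # Mult^2: 38-bit Mult primes + 23-bit Div -> 449
print(mult_total, mult2_total)        # 610 449, matching the log2(Q_L P) column
```

The same accounting reproduces the totals of Tables 2 and 3 below (for instance, 61 + 38×18 + 23×3 + 61 = 875 for Mult² in Table 2).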

2. Increased Homomorphic Capacity: The paper evaluates the maximum multiplicative depth for a fixed ring degree ($N=2^{15}$).

The following are the results from Table 2 of the original paper:

| Mult. algorithm | $N$ | $\log_2(Q_L P)$ | $\log_2 \Delta$ | $\log_2 q$ (Base) | $\log_2 q$ (Mult) | $\log_2 q$ (Div) | $\log_2 P$ |
|---|---|---|---|---|---|---|---|
| Mult ($t=1$) | $2^{15}$ | 855 | 57 | 57 | 57×13 | – | 57 |
| Mult² ($t=2$, new) | $2^{15}$ | 875 | 61 | 61 | 38×18 | 23×3 | 61 |
Table 2 shows that Mult² (with $t=2$) increases the multiplicative depth to 18 sequential multiplications, compared to only 13 for Mult (with $t=1$), while maintaining a similar security level (128 bits) and a comparable maximal modulus. Mult² manages the error growth by employing error refreshing (recombine and decompose) after 6 and 12 multiplication layers, which explains the "23×3" entry for the Div primes (one for the initial DCP and two for the refreshing steps). This demonstrates that Mult² enables much deeper circuits for the same ring degree.

3. Increased Precision with Smaller Parameters: This experiment focuses on achieving high precision (100-bit scaling factor) for 8 multiplication levels.

The following are the results from Table 3 of the original paper:

Table 3 combines performance figures and the corresponding parameter sets; the rows of the two sub-tables below match up (first row: Mult, second row: Mult²).

| Mult. algorithm | $N$ | dnum | $T_{\mathrm{mult}}$ | #NTT | ctxt size | key size |
|---|---|---|---|---|---|---|
| Mult ($t=1$) | $2^{16}$ | 9 | 270 ms | 290 | 14.8 MB | 73.7 MB |
| Mult² ($t=2$, new) | $2^{15}$ | 11 | 179 ms | 350 | 5.08 MB | 30.6 MB |

| Mult. algorithm | $\log_2(QP)$ | $\log_2 \Delta$ | $\log_2 q$ (Base) | $\log_2 q$ (Mult) | $\log_2 q$ (Div) | $\log_2 P$ |
|---|---|---|---|---|---|---|
| Mult ($t=1$) | 1,000 | 100 | (50×2) | (50×2)×8 | – | (50×2) |
| Mult² ($t=2$, new) | 680 | 100 | (50×2) | 60×8 | 40 | 60 |

Table 3 presents a remarkable improvement in parameter efficiency and performance. For 8 levels of multiplication and a 100-bit scaling factor:

  • Mult ($t=1$) requires ring degree $N=2^{16}$ and a maximal modulus of 1,000 bits.

  • Mult² ($t=2$) achieves the same with $N=2^{15}$ and a maximal modulus of only 680 bits.

    This reduction in parameters leads to significant practical benefits:

  • Multiplication Latency: Decreased from 270 ms to 179 ms (1.5x faster).

  • Ciphertext Size: Reduced from 14.8 MB to 5.08 MB (2.9x smaller).

  • Switching Key Size: Dramatically cut from 73.7 MB to 30.6 MB (2.4x smaller).

    These gains highlight that Mult² allows high-precision HE with much smaller and more efficient cryptographic parameters. The dnum (key-switching gadget rank) for Mult² is slightly higher (11 vs. 9) and it involves more NTTs in total (350 vs. 290), but the NTT dimension for Mult² is half that of Mult ($N=2^{15}$ vs. $N=2^{16}$). Since the cost of a single NTT grows quasi-linearly in $N$ (roughly $N \log N$), halving the ring degree makes each NTT more than twice as cheap, so Mult² is faster overall. A rough cost estimate based on this model is sketched below.
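
The following back-of-the-envelope estimate (my own rough model, ignoring constant factors and all non-NTT work) combines the #NTT counts from Table 3 with the usual $N \log N$ cost of a dimension-$N$ NTT:

```python
from math import log2

def relative_ntt_work(N, num_ntt):
    """Rough model: one dimension-N NTT costs about N * log2(N) butterfly operations."""
    return num_ntt * N * log2(N)

work_mult  = relative_ntt_work(2**16, 290)   # ordinary Mult, Table 3
work_mult2 = relative_ntt_work(2**15, 350)   # Mult^2, Table 3
print(f"{work_mult / work_mult2:.2f}x")      # ~1.77x, same direction as the measured 270 ms vs. 179 ms (~1.5x)
```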

6.2. Data Presentation (Tables)

The full parameter and performance tables (Tables 1, 2, and 3 of the original paper) are reproduced in Section 6.1 above, alongside their analysis.

6.3. Ablation Studies / Parameter Analysis

While the paper does not present explicit "ablation studies" in the traditional sense of removing components, it implicitly performs parameter analysis and demonstrates the necessity of certain mechanisms:

  • Role of $q_{\mathrm{div}}$ and $k$: The discussion of the "Growth of the Low Part" (Section 4.3) and the derived formula for $k$ (the number of sequential Mult² multiplications between error refreshings) constitute a form of parameter analysis. It explains how $q_{\mathrm{div}}$ (the divisor used for decomposition) must be chosen relative to $\Delta$ (the scaling factor) to manage the error term $N\check{M}^2/(q_{\mathrm{div}} q_\ell)$. For example, choosing $q_{\mathrm{div}} \approx \sqrt{\Delta}$ is suggested for optimal multiplicative depth improvement; a small numeric illustration of this trade-off appears at the end of this subsection.

  • Error Refreshing Strategy ($\mathsf{DCP} \circ \mathsf{RCB}$): The experimental results, particularly Figure 1, show that the error component from the low part grows significantly after about 6 multiplications. This empirical finding motivates the explicit use of recombine and decompose ($\mathsf{DCP} \circ \mathsf{RCB}$) at specific intervals (e.g., after layers 6 and 12 in the 18-depth experiment) to "reset" the low part and maintain precision. It also shows that, while Mult² inherently saves modulus, it requires an explicit strategy to control accumulated approximation errors over many multiplications.

  • Comparison of $t=1$ vs. $t=2$ Parameter Sets: The entire experimental section is a comparative parameter analysis between standard CKKS ($t=1$) and Double-CKKS ($t=2$). By designing parameter sets with similar security and precision targets, and observing the differences in modulus size, ring degree, multiplicative depth, and performance, the authors demonstrate the impact of their core innovation (increasing $t$).

    These analyses show how different choices of cryptographic parameters and the strategic use of error refreshing impact the performance and accuracy of Multᵗ, confirming the theoretical predictions and guiding practical implementation.
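
As a small numeric illustration of the $q_{\mathrm{div}}$ / $k$ trade-off mentioned in the first bullet above (the values below are chosen for illustration only; which $k$ is actually attainable for a given $q_{\mathrm{div}}$ is governed by the low-part error bound from Section 4.3), the effective modulus consumed per Mult² multiplication can be tabulated directly from the depth formula:

```python
def bits_consumed_per_mult(log2_delta, log2_qdiv, k):
    """Effective modulus consumed per Mult^2 multiplication when the low part
    is refreshed every k multiplications: log2(Delta) - (1 - 1/k) * log2(q_div)."""
    return log2_delta - (1 - 1 / k) * log2_qdiv

# Illustrative (log2(q_div), k) pairs only, with log2(Delta) = 61:
for log2_qdiv, k in [(23, 6), (30, 4), (30, 8)]:
    print(f"log2(q_div) = {log2_qdiv}, k = {k}: "
          f"{bits_consumed_per_mult(61, log2_qdiv, k):.1f} bits per multiplication")
```

Intuitively, a larger $q_{\mathrm{div}}$ saves more modulus per multiplication but enlarges the discarded low part, so it tends to force more frequent refreshing (a smaller $k$).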

7. Conclusion & Reflections

7.1. Conclusion Summary

This paper introduces Multᵗ, a novel homomorphic multiplication method for the CKKS scheme that significantly reduces modulus consumption by decomposing ciphertexts into $t$ components. For Mult² (double precision), modulus consumption is almost halved. This is achieved through a weak form of homomorphic Euclidean division and carefully redesigned Tensor, Relinearization, and Rescaling operations that selectively discard low-significance terms. The primary benefits are the ability to evaluate deeper circuits without increasing the ciphertext dimension or the bootstrapping frequency, and higher precision with smaller cryptographic parameters. The experimental results showcase substantial gains: Mult² increased the multiplicative depth from 13 to 18 and allowed 8 sequential multiplications with a 100-bit scaling factor using a smaller ring degree ($N=2^{15}$) and a much smaller maximal modulus (680 bits vs. 1,000 bits for standard CKKS), leading to 1.5x faster multiplication and significantly smaller ciphertext and key sizes.

7.2. Limitations & Future Work

The authors acknowledge several limitations and propose future research directions:

  • Implementation of Multᵗ for $t > 2$: The current implementation focuses on Double-CKKS ($t=2$). The authors note that implementing Tuple-CKKS for $t > 2$ is non-trivial, particularly concerning arithmetic modulo $q_{\mathrm{div}}^{t-1}$ and potential modifications to the RNS representation. They leave the experimental aspects and the efficiency assessment for larger $t$ as future work.
  • Tuple Bootstrapping: While the theoretical components for tuple-CKKS bootstrapping exist, an efficient implementation is still needed. This would require careful management of error growth in homomorphic linear transformations and handling different scaling factors during bootstrapping. This is also left as future work.

7.3. Personal Insights & Critique

The paper presents a clever and highly impactful innovation. The core idea of applying multi-precision arithmetic principles to homomorphic ciphertexts is intuitive yet challenging to execute homomorphically. The introduction of homomorphic Euclidean division is particularly elegant, providing a foundational primitive for this approach.

Key Strengths:

  • Direct Impact on Modulus Consumption: Unlike many prior works that optimize auxiliary operations, this paper directly tackles modulus consumption in the fundamental multiplication operation. This is a critical step towards more practical FHE.
  • Parameter Reduction: The ability to reduce the ring dimension ($N$) and the maximal modulus ($Q_L P$) for a given security level and depth/precision target is a major win. A smaller $N$ and smaller moduli directly translate to faster computation and a reduced memory footprint, which are crucial for real-world FHE deployments.
  • Theoretical Rigor and Experimental Validation: The detailed error analysis (especially low part growth) combined with concrete experimental results provides strong support for the proposed schemes.
  • Beginner-Friendly Explanation: The paper does an excellent job of motivating the problem and explaining the core concepts (like integer decomposition) before diving into the homomorphic adaptations, making it accessible.

Potential Issues/Areas for Improvement:

  • Error Management Complexity: While the "recombine and decompose" strategy effectively manages low-part error growth, it adds complexity to circuit design. Determining the optimal $k$ (the number of multiplications between refreshings) can be non-trivial and application-dependent.
  • Overhead for Larger $t$: As $t$ increases, the number of Tensor calls and other operations in Multᵗ grows ($t(t+1)/2$ Tensor calls). While the authors argue that the asymptotic cost remains similar thanks to smaller moduli and ring dimensions, the constant factors might become significant for very large $t$, potentially limiting the practical benefit for extremely high precision. The lack of experimental results for $t > 2$ leaves this open.
  • Integration with Bootstrapping: The current work primarily focuses on SHE aspects. Fully realizing the potential of Multᵗ for FHE requires an efficient "tuple bootstrapping" mechanism. If this becomes overly complex or inefficient, it could limit the overall practical gains for arbitrary-depth circuits.

Transferability/Future Value: The concept of homomorphic Euclidean division and multiple-precision arithmetic could be a foundational technique applicable to other HE schemes (e.g., BFV) or for optimizing other complex homomorphic operations where bit-decomposition or precise error management is critical. This paper represents a significant step forward in making CKKS more efficient and practical for a wider range of high-precision and deep-circuit applications.
