Homomorphic Multiple Precision Multiplication for CKKS and Reduced Modulus Consumption
TL;DR Summary
The paper introduces Mult², which decomposes CKKS ciphertexts for homomorphic weak Euclidean division, halving modulus consumption and enabling multiple precision multiplication, deeper circuit evaluation, and reduced bootstrapping without increasing parameters.
Abstract
Homomorphic Encryption (HE) schemes such as BGV, BFV, and CKKS consume some ciphertext modulus for each multiplication. Bootstrapping (BTS) restores the modulus and allows homomorphic computation to continue, but it is time-consuming and requires a significant amount of modulus. For these reasons, decreasing modulus consumption is a crucial topic for BGV, BFV, and CKKS, on which numerous studies have been conducted. We propose a novel method, called Mult², to perform ciphertext multiplication in the CKKS scheme with lower modulus consumption. Mult² relies on a new decomposition of a ciphertext into a pair of ciphertexts that homomorphically performs a weak form of Euclidean division. It multiplies two ciphertexts in decomposed formats with hom…
In-depth Reading
1. Bibliographic Information
1.1. Title
Homomorphic Multiple Precision Multiplication for CKKS and Reduced Modulus Consumption
1.2. Authors
- Jung Hee Cheon (CryptoLab Inc., Seoul National University, Seoul, Republic of Korea)
- Wonhee Cho (Seoul National University, Seoul, Republic of Korea)
- Jaehyung Kim (CryptoLab Inc., Seoul, Republic of Korea)
- Damien Stehlé (CryptoLab Inc., Lyon, France)
1.3. Journal/Conference
The paper is available on ePrint.iacr.org, which is a recognized preprint archive for cryptography research. Papers on ePrint are typically submitted to conferences or journals for peer review. The specific venue for its official publication is not mentioned in the provided text.
1.4. Publication Year
2023
1.5. Abstract
This paper introduces a novel method called Mult² for ciphertext multiplication within the CKKS homomorphic encryption scheme. The core innovation involves decomposing a ciphertext into a pair of ciphertexts, which then homomorphically execute a weak form of Euclidean division. This decomposition facilitates homomorphic double precision multiplication, yielding results approximately equivalent to standard CKKS multiplication but consuming almost half the ciphertext modulus. The method is extended to Multᵗ for any t ≥ 2, allowing decomposition into t components for multiple precision arithmetic (referred to as double-CKKS and tuple-CKKS). The proposed algorithms enable the evaluation of deeper computational circuits without increasing ciphertext dimension or the frequency of costly bootstrapping operations, or, conversely, reduce the number of bootstrapping operations for a given circuit depth. They also allow for increased precision without requiring larger cryptographic parameters. For instance, Mult² can handle 8 sequential multiplications with a 100-bit scaling factor using only a 680-bit ciphertext modulus, a feat unachievable with conventional CKKS multiplication. Experimental results and theoretical analysis support the efficiency and improved modulus consumption of these schemes.
1.6. Original Source Link
https://eprint.iacr.org/2023/1788.pdf (Status: Preprint)
2. Executive Summary
2.1. Background & Motivation
The CKKS (Cheon-Kim-Kim-Song) homomorphic encryption scheme is widely used for computations on real and complex numbers, crucial for applications like machine learning. A fundamental challenge in CKKS and other homomorphic encryption schemes (BGV, BFV) is modulus consumption. Each homomorphic multiplication operation reduces the available ciphertext modulus. When the modulus becomes too small, further multiplications are impossible, and bootstrapping is required to refresh the modulus. Bootstrapping, however, is computationally expensive and itself consumes significant modulus. This scarcity of modulus is a major bottleneck, often forcing practitioners to use larger ciphertext dimensions or perform more frequent, costly bootstrapping operations. The core problem the paper addresses is this high modulus consumption in CKKS multiplication. Solving this is critical for enabling deeper computational circuits and increasing the practical applicability of FHE. The paper's innovative idea is to reduce modulus consumption by designing a "multiple precision" arithmetic for ciphertexts, inspired by how multi-precision arithmetic works for integers.
2.2. Main Contributions / Findings
The paper's primary contributions are:
- Novel Mult² Algorithm: Introduction of Mult², a new homomorphic multiplication method for CKKS that significantly reduces modulus consumption (by almost half for Mult²) compared to standard CKKS multiplication. This is achieved by homomorphically decomposing a ciphertext into a pair and performing "double precision" multiplication, which discards low-significance terms to save modulus.
- Homomorphic Euclidean Division: The development of a "weak form" of homomorphic Euclidean division that enables the decomposition of a ciphertext into quotient and remainder ciphertexts. This is a foundational building block for the multiple-precision approach.
- Extension to Multᵗ (Tuple-CKKS): Generalization of Mult² to Multᵗ for any t ≥ 2, allowing a ciphertext to be decomposed into t components, facilitating multiple precision arithmetic (referred to as tuple-CKKS).
- Increased Homomorphic Capacity: The proposed schemes enable the evaluation of deeper circuits without increasing ciphertext dimension or the frequency of bootstrapping, or, equivalently, reduce the number of bootstrapping operations for the same circuit depth. For example, Double-CKKS (Mult²) increased the multiplicative depth from 13 to 18 for a given parameter set.
- Increased Precision with Smaller Parameters: The methods allow for higher precision in computations without requiring an increase in cryptographic parameters (e.g., ring dimension). Mult² enabled 8 sequential multiplications with a 100-bit scaling factor using a 680-bit ciphertext modulus and N = 2^15, which was impossible with standard CKKS, which requires N = 2^16 and a 1,000-bit modulus. This translates to substantial performance gains: 1.5x faster multiplication, 2.9x smaller ciphertexts, and 2.4x smaller switching keys.
- Theoretical Analysis and Experimental Validation: The paper provides detailed theoretical analysis (e.g., error growth bounds, low-part growth) and experimental results that support the efficiency and improved modulus consumption of the proposed schemes.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To understand this paper, a foundational understanding of homomorphic encryption, particularly the CKKS scheme, and related cryptographic and mathematical concepts is essential.
- Homomorphic Encryption (HE): HE is a form of encryption that allows computations to be performed on encrypted data without decrypting it first. The results of these computations are encrypted and, when decrypted, match the results of the same operations performed on the unencrypted data. This enables privacy-preserving data processing.
  - Somewhat Homomorphic Encryption (SHE): SHE schemes support a limited number of homomorphic operations (e.g., additions and a few multiplications) before noise accumulation makes further computation impossible.
  - Fully Homomorphic Encryption (FHE): FHE schemes are SHE schemes augmented with a bootstrapping mechanism. Bootstrapping refreshes a ciphertext, reducing its noise level and allowing an unlimited number of operations, thus achieving "full" homomorphic capabilities.
- CKKS Scheme: The Cheon-Kim-Kim-Song (CKKS) scheme [12] is an SHE scheme designed for approximate arithmetic on real or complex numbers. Unlike the integer-based HE schemes (BGV, BFV), CKKS inherently introduces small approximation errors, making it suitable for error-tolerant applications such as machine learning.
  - Modulus Consumption: In CKKS, a ciphertext is represented modulo a large integer Q. Each multiplication reduces this effective modulus, so the ciphertext subsequently operates modulo a smaller Q' < Q. This reduction is referred to as modulus consumption. Once the modulus becomes too small, the noise in the ciphertext can no longer be managed, and decryption becomes unreliable.
  - Scaling Factor (Δ): CKKS represents real numbers as fixed-point integers. Messages (real numbers) are multiplied by a large scaling factor Δ and rounded to integers before encryption; Δ determines the precision of the fixed-point representation. During homomorphic multiplication the scaling factor effectively squares to Δ², so a rescaling operation is needed to bring it back to roughly Δ, keeping plaintext sizes and precision manageable.
  - Bootstrapping: As explained for FHE, bootstrapping in CKKS is a procedure that refreshes a noisy ciphertext and increases its modulus, allowing further homomorphic operations. It involves homomorphically evaluating a polynomial approximation of the modular reduction function. It is computationally intensive and consumes a significant amount of modulus itself.
- Ring Learning With Errors (RLWE): CKKS is based on the Ring Learning With Errors (RLWE) problem, a hard mathematical problem believed to be resistant to quantum attacks. In RLWE-based schemes, computations take place over polynomial rings.
  - Ring ($R_Q$): The underlying algebraic structure is typically the polynomial ring $R = \mathbb{Z}[X]/(X^N + 1)$ with $N$ a power of two; $R_Q$ denotes this ring with coefficients reduced modulo $Q$. Elements of this ring are polynomials of degree less than $N$.
  - Canonical Embedding: The canonical embedding (can) maps polynomials in $R$ to vectors of complex numbers. Specifically, for $m \in R$, can maps $m(X)$ to the vector of its evaluations at specific roots of unity. This mapping is crucial for understanding norms and error growth in CKKS, as it relates polynomial coefficients to the actual complex numbers being approximated. The infinity norm of can(m) is often a tighter error bound than the infinity norm of the coefficients of $m$.
- Residue Number System (RNS): RNS is a number representation that allows large-integer arithmetic to be performed in parallel over smaller moduli. In CKKS, the large modulus $Q$ is typically a product of many smaller primes (e.g., $Q = q_0 q_1 \cdots q_L$). Operations modulo $Q$ are then performed component-wise modulo each $q_i$, which is much more efficient.
- CKKS Operations (Tensor, Relin, RS):
  - Tensor (homomorphic multiplication before relinearization): When two ciphertexts $\mathsf{ct}_1$ and $\mathsf{ct}_2$ are multiplied, their components are multiplied pairwise, producing a ciphertext with more components (e.g., 3 instead of 2 elements of $R_Q$). This increases the ciphertext dimension and prepares it for relinearization.
  - Relinearization (Relin): After Tensor, the ciphertext dimension has grown. Relinearization is a key-switching procedure that brings the dimension back to the original (e.g., from 3 elements to 2), making the result compatible with further operations. It uses a relinearization key (rlk) and involves an auxiliary modulus $P$.
  - Rescaling (RS): After relinearization, the plaintext inside the ciphertext is scaled by roughly $\Delta^2$. Rescaling divides the ciphertext by a chosen prime $q$ (a factor of the current modulus $Q$), bringing the plaintext scale back to roughly $\Delta$ and reducing the ciphertext modulus from $Q$ to $Q/q$. This step consumes one prime from the modulus chain.
3.2. Previous Works
The paper contextualizes its contribution by referencing several prior works focused on modulus consumption and CKKS improvements.
- Gadget Decomposition [3, 5, 6, 13, 20, 21]: These techniques aim to optimize key-switching operations (used in relinearization and bootstrapping) by decomposing secret keys or ciphertexts into smaller "gadget" components.
  - Bit decomposition [5, 6]: decomposes values into individual bits.
  - Digit decomposition [13]: decomposes values into larger base digits.
  - RNS-based decomposition [3, 20, 21]: leverages RNS properties for decomposition.
  - Limitation: While effective for key switching, gadget decomposition typically slows down key switching by a factor equal to the gadget rank (dnum) and increases key size by the same factor. The paper highlights that libraries such as SEAL use large dnum values, leading to very large switching-key sizes (e.g., 142.6 MB for large parameter sets). The current paper's approach instead targets modulus consumption in the multiplication itself rather than in key switching.
- Modulus Consumption in Specific Operations [2, 7, 8, 21, 24, 31-33]: Other works have focused on reducing modulus consumption in specific, computationally intensive operations such as linear transformations and bootstrapping.
  - Limitation: These studies typically address large-scale computations such as bootstrapping but do not directly improve the homomorphic multiplication operation itself, which is the focus of the current paper.
- Reducing the Ring Dimension (Module-LWE) [6, 39]: An alternative way to save resources is to reduce the ring dimension N. Module-LWE variants of CKKS, such as those proposed in [6] and [39], use ciphertexts over modules rather than rings, which can allow a smaller N while maintaining security.
  - Comparison with Module-CKKS (e.g., rank 2): The paper compares Mult² with rank-2 module-CKKS. Module-CKKS keeps the same modulus as CKKS but uses 3 ring elements per ciphertext (vs. 2 for standard CKKS). Mult² halves the maximum modulus bit-size and uses 4 ring elements (a pair of 2-element ciphertexts). The paper argues that Mult² results in 33% smaller ciphertexts and lower computational cost (2 relinearizations vs. 3 for module-CKKS). This advantage grows further for tuple-CKKS (Multᵗ).
3.3. Technological Evolution
The field of Homomorphic Encryption has seen dramatic improvements since Gentry's foundational work in 2009 [17].
- First Generation (Gentry's Scheme): Pioneering but impractical due to high computational costs.
- Second Generation (BGV, BFV, CKKS): Significant performance improvements made SHE practical for certain tasks. The BGV [6] and BFV [3] schemes emerged for integer arithmetic, while CKKS [12] specifically addressed approximate real/complex number arithmetic, opening doors for applications in machine learning and data science.
- Efficiency Improvements: Subsequent research focused on optimizing various aspects:
  - RNS techniques [11, 18, 20, 21] for faster modular arithmetic.
  - Improved bootstrapping algorithms [8, 10, 21, 31-33] to extend SHE to FHE more efficiently.
  - Key-switching optimizations (e.g., gadget decomposition [3, 5, 6, 20, 21]).
- Current Research: The focus is on making FHE schemes more practical by reducing computational overhead and parameter sizes, and by managing noise and modulus consumption more effectively. This paper fits into this lineage by directly tackling the modulus consumption of the core multiplication operation itself, an area where prior work mainly focused on key switching or bootstrapping.
3.4. Differentiation Analysis
The core difference and innovation of this paper's approach, especially Mult², compared to existing methods lie in its direct attack on modulus consumption during the homomorphic multiplication step itself.
- Focus on Multiplication, not just Key Switching or Bootstrapping: Most prior works on modulus saving concentrated on optimizing key switching (e.g., gadget decomposition) or bootstrapping (e.g., improved EvalMod, sine-series approximation). While these are important, they do not fundamentally alter how modulus is consumed during the tensor-product and rescaling phases of a multiplication. Mult² introduces a new paradigm for the multiplication operation itself.
- Homomorphic Euclidean Division as a Core Primitive: The concept of homomorphically decomposing a ciphertext into quotient and remainder parts is novel and forms the basis of the multiple-precision arithmetic. This allows treating parts of the plaintext (high-order and low-order bits) separately and discarding less significant terms.
- Reduced Modulus Consumption per Multiplication: By dropping the product of the low parts during the Tensor step of Mult², the algorithm consumes modulus at a rate equivalent to multiplying smaller bit-size multiplicands. This is the key mechanism behind "almost halving" the modulus consumption of Mult².
- Double/Multiple Precision Arithmetic: The paper's approach directly enables double-CKKS and tuple-CKKS, which are analogous to double-precision or multiple-precision floating-point arithmetic in traditional computing. This allows managing precision and error growth more granularly, something not directly addressed by gadget decomposition or bootstrapping improvements.
- Parameter Efficiency: The reduction in modulus consumption directly translates to a smaller maximal modulus (QP) and, consequently, a smaller ring dimension (N) needed to achieve a given security level and multiplicative depth. This leads to substantial gains in latency, ciphertext size, and switching-key size, as demonstrated in the experimental results. This is a more holistic efficiency gain compared to solutions that only reduce one aspect (e.g., key size for gadget decomposition often comes with a slowdown).
4. Methodology
4.1. Principles
The core idea behind Mult² (and its generalization Multᵗ) is to adapt the concept of multiple-precision arithmetic from classical integer or fixed-point computation to the homomorphic encryption setting. In traditional arithmetic, to multiply two large numbers, each can be decomposed into smaller bit-size pieces (e.g., high-order and low-order parts). The multiplication is then performed on these pieces, and the results are combined. For fixed-point arithmetic, less significant terms may be discarded if they fall below the desired precision threshold.
Specifically, two $2k$-bit integers $a$ and $b$ can each be decomposed into two $k$-bit pieces, $a = a_1 \cdot 2^k + a_0$ and $b = b_1 \cdot 2^k + b_0$. Their product is

$a \cdot b = a_1 b_1 \cdot 2^{2k} + (a_1 b_0 + a_0 b_1) \cdot 2^k + a_0 b_0.$

If only a precision of $2^k$ (relative to the leading term) is needed, the last term $a_0 b_0$ can be dropped. This reduces a $2k$-bit multiplication to at most four $k$-bit multiplications, and only the two significant output pieces need to be kept, so the number of "pieces" stays constant after multiplication.
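A quick numerical check of this decomposition idea, in plain Python (the operands are arbitrary examples, not from the paper):

```python
# Double-precision integer multiplication with the low*low term dropped.
k = 32
mask = (1 << k) - 1

a = 0xDEADBEEFCAFEBABE          # a 2k-bit operand
b = 0x0123456789ABCDEF

a1, a0 = a >> k, a & mask        # high and low k-bit pieces
b1, b0 = b >> k, b & mask

exact = a * b
approx = (a1 * b1 << (2 * k)) + ((a1 * b0 + a0 * b1) << k)  # drop a0*b0

assert exact - approx == a0 * b0          # the discarded term is exactly a0*b0
assert a0 * b0 < 1 << (2 * k)             # and it stays below 2^(2k)
print(f"relative error ~ 2^{(exact - approx).bit_length() - exact.bit_length()}")
```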
The challenge in homomorphic encryption is twofold:
1. Homomorphic Decomposition: How to homomorphically decompose a ciphertext (whose underlying plaintext encodes a large value) into smaller pieces (ciphertexts representing the high and low parts). Directly encrypting pre-decomposed parts doesn't work due to noise accumulation.
2. Managing the Number of Pieces: How to perform multiplication on these decomposed ciphertexts without the number of pieces continually growing, while correctly managing noise and modulus consumption.

The paper addresses these by introducing homomorphic Euclidean division to decompose ciphertexts, and by carefully redesigning the Tensor, Relinearization, and Rescaling operations for these decomposed ciphertexts to control error growth and modulus consumption.
4.2. Core Methodology In-depth (Layer by Layer)
4.2.1. Homomorphic Euclidean Division
The concept of Euclidean division is extended from integers to polynomials (which are the form of plaintexts in CKKS). For a polynomial $m \in R$ and an integer $q$, Euclidean division gives a quotient and a remainder:

$m = q \cdot \lfloor m/q \rfloor + [m]_q,$

where $\lfloor m/q \rfloor$ is the quotient and $[m]_q$ is the remainder, both taken coefficient-wise.
The paper defines Euclidean division directly on ciphertexts by applying it component-wise.
Definition 3.1 (Ciphertext Euclidean Division). Let $q$ and $Q$ be two integers with $q$ dividing $Q$. Let $\mathsf{ct} = (b, a) \in R_Q^2$ be a ciphertext. The remainder of $\mathsf{ct}$ by $q$ is defined as $[\mathsf{ct}]_q := ([b]_q, [a]_q)$, and the quotient of $\mathsf{ct}$ by $q$ is defined as $\lfloor \mathsf{ct}/q \rfloor := \mathrm{RS}_q(\mathsf{ct}) = ((b - [b]_q)/q,\ (a - [a]_q)/q) \in R_{Q/q}^2$. Here:
- $\mathsf{ct} = (b, a)$ is a CKKS ciphertext, where $b$ and $a$ are polynomials in $R_Q$.
- $q$ is an integer divisor.
- $[b]_q$ and $[a]_q$ denote the coefficient-wise modular reduction of the polynomials $b$ and $a$ modulo $q$, with the coefficients taken in a centered interval of length $q$.
- $[\mathsf{ct}]_q$ thus produces a pair of polynomials with small coefficients (bounded by $q/2$), which can be regarded as a ciphertext in $R_q^2$.
- $\lfloor \mathsf{ct}/q \rfloor$ subtracts the remainder and then divides by $q$. This division is exactly rescaling by $q$, hence the notation $\mathrm{RS}_q$. The subtraction is performed modulo $Q$, and the division by $q$ reduces the modulus from $Q$ to $Q/q$, so the result is a ciphertext modulo $Q/q$.
Theorem 3.2 (informal; the quantitative bound is given in the paper). Let $q$ and $Q$ be two integers with $q$ dividing $Q$, let $\mathsf{ct} \in R_Q^2$ be a ciphertext, and let $\mathsf{sk}$ be a secret key whose secret $s$ has Hamming weight $h$. Write the underlying plaintext as $m = q \cdot \hat{m} + \check{m}$ with $\hat{m} = \lfloor m/q \rfloor$ and $\check{m} = [m]_q$. Then $\lfloor \mathsf{ct}/q \rfloor$ decrypts to $\hat{m} + e$ and $[\mathsf{ct}]_q$ decrypts to approximately $\check{m}$, for some error $e$ with small infinity norm $\lVert e \rVert_\infty$. Here:
- $m$ is the underlying plaintext polynomial of $\mathsf{ct}$.
- $\hat{m}$ is the ideal quotient polynomial, and $\check{m}$ is the ideal remainder polynomial.
- The theorem states that $\lfloor \mathsf{ct}/q \rfloor$ decrypts to approximately $\hat{m}$, and $[\mathsf{ct}]_q$ decrypts to approximately $\check{m}$.
- The term $e$ is a small error polynomial; the infinity norm $\lVert e \rVert_\infty$ bounds the maximum absolute value of its coefficients. Its small size ensures that the homomorphic Euclidean division is accurate. This error arises from the rescaling operation and the inherent noise of CKKS.
4.2.2. Pair Representation (Double-CKKS)
The Pair Representation (Double-CKKS) is a novel way to represent a ciphertext as a pair of ciphertexts, analogous to decomposing a number into high and low parts.
Definition 3.3. Let $Q$ be an element of the modulus chain (see Section 2.1) and let $q$ be an integer dividing $Q$. For a ciphertext $\mathsf{ct} \in R_Q^2$, the decomposition of $\mathsf{ct}$ is $\mathrm{DCP}(\mathsf{ct}) := (\hat{\mathsf{ct}}, \check{\mathsf{ct}}) = (\lfloor \mathsf{ct}/q \rfloor, [\mathsf{ct}]_q)$. Conversely, the recombination of a pair $(\hat{\mathsf{ct}}, \check{\mathsf{ct}})$ is $\mathrm{RCB}(\hat{\mathsf{ct}}, \check{\mathsf{ct}}) := q \cdot \hat{\mathsf{ct}} + \check{\mathsf{ct}} \bmod Q$. Here:
- $Q$ is a modulus from the CKKS modulus chain.
- $q$ is the integer divisor used for the decomposition.
- $\mathsf{ct}$ is the original ciphertext, defined modulo $Q$.
- DCP (Decompose Ciphertext Pair) takes a ciphertext and produces a pair $(\hat{\mathsf{ct}}, \check{\mathsf{ct}})$: $\hat{\mathsf{ct}}$ is the "high part" (quotient ciphertext) and $\check{\mathsf{ct}}$ is the "low part" (remainder ciphertext). Note that $\hat{\mathsf{ct}}$ lives modulo $Q/q$, while $\check{\mathsf{ct}}$ initially has small coefficients (modulo $q$) but can also be viewed modulo $Q/q$ when $q \le Q/q$.
- RCB (Recombine Ciphertext Pair) takes a pair of ciphertexts and combines them back into a single ciphertext modulo $Q$ via multiplication by $q$ and addition.
Lemma 3.4.
For a pair of ciphertexts $(\hat{\mathsf{ct}}, \check{\mathsf{ct}})$ and a secret key $\mathsf{sk}$, we have, modulo $Q$: $\mathrm{Dec}_{\mathsf{sk}}(\mathrm{RCB}(\hat{\mathsf{ct}}, \check{\mathsf{ct}})) = q \cdot \mathrm{Dec}_{\mathsf{sk}}(\hat{\mathsf{ct}}) + \mathrm{Dec}_{\mathsf{sk}}(\check{\mathsf{ct}})$.
This lemma states that RCB reverses DCP at the level of the underlying plaintexts, essentially reconstructing the combined plaintext from its high and low parts.
4.2.3. Tuple Representation (Tuple-CKKS)
The Pair Representation is extended to a Tuple Representation with $t$ components. For an integer $t \ge 2$ (the tuple length), a ciphertext $\mathsf{ct} \in R_Q^2$ can be decomposed into a $t$-tuple.
The decomposition is performed recursively: the remainder of the current ciphertext modulo $q$ gives the next (less significant) component, the quotient $\lfloor \cdot / q \rfloor$ is carried to the following step, and after $t-1$ steps the remaining quotient forms the most significant component. The decomposition of $\mathsf{ct}$ is then the $t$-tuple $(\mathsf{ct}^{(0)}, \dots, \mathsf{ct}^{(t-1)})$, with $\mathsf{ct}^{(0)}$ the most significant component; each component is a ciphertext over a modulus of the chain.
Conversely, the recombination is $\mathrm{RCB}(\mathsf{ct}^{(0)}, \dots, \mathsf{ct}^{(t-1)}) := \sum_{i=0}^{t-1} q^{\,t-1-i} \cdot \mathsf{ct}^{(i)}$.
This generalizes the base-$q$ representation of integers (or polynomials) to ciphertexts, effectively creating a $t$-tuple of ciphertexts, where $t$ is the tuple length.
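At the level of plain integers, the tuple representation is just a base-$q$ digit expansion, as in this toy sketch (the most-significant-first ordering and the recombination rule above are the assumptions being illustrated; this is not the paper's code):

```python
# Base-q digit decomposition and recombination for plain integers.
def decompose(x, q, t):
    """Split an integer into t base-q 'digits', most significant first."""
    digits = []
    for _ in range(t - 1):
        r = ((x + q // 2) % q) - q // 2   # centered remainder in [-q/2, q/2)
        digits.append(r)
        x = (x - r) // q                  # exact: x - r is a multiple of q
    digits.append(x)                      # remaining quotient = most significant part
    return digits[::-1]

def recombine(digits, q):
    acc = digits[0]
    for d in digits[1:]:
        acc = q * acc + d                 # acc = sum_i q^(t-1-i) * digits[i]
    return acc

q, t = 2**10, 3
for value in (123_456_789, -987_654_321):
    assert recombine(decompose(value, q, t), q) == value
```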
4.2.4. Homomorphic Double-Precision Multiplication (Mult²)
The Mult² algorithm consists of three adapted operations: Tensor², Relin², and RS².
Definition 4.1 (Pair Tensor). Let $\mathsf{CT}_1 = (\hat{\mathsf{ct}}_1, \check{\mathsf{ct}}_1)$ and $\mathsf{CT}_2 = (\hat{\mathsf{ct}}_2, \check{\mathsf{ct}}_2)$ be ciphertext pairs. The tensor of $\mathsf{CT}_1$ and $\mathsf{CT}_2$ is defined as $\mathsf{Tensor}^2(\mathsf{CT}_1, \mathsf{CT}_2) := \big(\mathsf{Tensor}(\hat{\mathsf{ct}}_1, \hat{\mathsf{ct}}_2),\ \mathsf{Tensor}(\hat{\mathsf{ct}}_1, \check{\mathsf{ct}}_2) + \mathsf{Tensor}(\check{\mathsf{ct}}_1, \hat{\mathsf{ct}}_2)\big)$. Here:
- $\mathsf{CT}_1$ and $\mathsf{CT}_2$ are the input ciphertext pairs.
- Tensor denotes the standard CKKS Tensor operation.
- The output is a pair of extended CKKS ciphertexts, each with 3 components after the Tensor operation.
- Crucially, this operation discards the product of the low-part ciphertexts, $\mathsf{Tensor}(\check{\mathsf{ct}}_1, \check{\mathsf{ct}}_2)$. This is the main source of the modulus-consumption reduction: the least significant terms are ignored, which introduces a small, controlled numerical error.
Lemma 4.2 (informal). Let $\mathsf{CT}_i = (\hat{\mathsf{ct}}_i, \check{\mathsf{ct}}_i)$ be ciphertext pairs with $\mathsf{ct}_i = \mathrm{RCB}(\mathsf{CT}_i)$ for $i = 1, 2$, and let $\mathsf{sk}$ be a secret key. Then, modulo $Q$, the decryption of $\mathsf{Tensor}(\mathsf{ct}_1, \mathsf{ct}_2)$ equals $q$ times the recombined decryption of $\mathsf{Tensor}^2(\mathsf{CT}_1, \mathsf{CT}_2)$ plus the decryption of the discarded low-low product $\mathsf{Tensor}(\check{\mathsf{ct}}_1, \check{\mathsf{ct}}_2)$. Moreover, if the plaintexts and low parts satisfy suitable bounds, the discarded contribution has small infinity norm. Here:
- The first equation relates the standard Tensor product of the recombined ciphertexts to the Tensor² product of the pairs. It shows that Tensor² effectively computes the higher-order terms of the product, while the term $\mathsf{Tensor}(\check{\mathsf{ct}}_1, \check{\mathsf{ct}}_2)$ (the product of the low parts) is discarded.
- The second inequality bounds the error introduced by this discarding. It shows that Tensor² applied to $\mathsf{CT}_1$ and $\mathsf{CT}_2$ decrypts to approximately the same plaintext as $\mathsf{Tensor}(\mathsf{ct}_1, \mathsf{ct}_2)$ divided by $q$. The error bound highlights the impact of dropping the low-low term, which depends quadratically on the size of the low parts.
Definition 4.3 (Pair Relinearize).
Let $(\hat{\mathsf{ct}}, \check{\mathsf{ct}})$ be a ciphertext pair (an output of Tensor²). Its relinearization Relin² is obtained by multiplying the high part by $q$, applying the standard relinearization, decomposing the result with DCP, and combining it with the directly relinearized low part.
Here:
- Relin is the standard CKKS relinearization operation.
- Instead of simply relinearizing each component independently, which would cause significant error, Relin² first multiplies the high part $\hat{\mathsf{ct}}$ by $q$, relinearizes it, and then decomposes the result with DCP. This "raises the modulus" of the high part before relinearization; the low part is relinearized directly.
- This approach avoids extra modulus consumption in the Relin step of Mult², because DCP is applied to a ciphertext that has already been scaled by $q$, and the resulting parts remain defined modulo the current working modulus.
Lemma 4.4 (informal).
Let $(\hat{\mathsf{ct}}, \check{\mathsf{ct}})$ be an output of Tensor² and let $\mathsf{sk}$ be a secret key whose secret $s$ has Hamming weight $h$. Then the difference between the recombined plaintext before and after Relin² (i.e., the relinearization error) has small infinity norm, and the same holds under the additional boundedness assumption on the low part. Here:
- The bound matches, up to small factors, the error bound of standard CKKS relinearization.
- The lemma shows that Relin² correctly transforms a ciphertext that decrypts under an extended key into one that decrypts under the standard secret key $\mathsf{sk}$, with a bounded error. Since the bound is similar to that of standard CKKS relinearization, Relin² does not introduce significant additional error.
Definition 4.5 (Pair Rescale). Let $(\hat{\mathsf{ct}}, \check{\mathsf{ct}})$ be a ciphertext pair and let $q'$ be the prime to rescale by. The rescale $\mathrm{RS}^2(\hat{\mathsf{ct}}, \check{\mathsf{ct}})$ applies the standard rescaling to the two components in a coordinated way so that recombination and rescaling commute; the result is a pair over the modulus $Q/q'$. Here:
- RS is the standard CKKS rescale operation.
- RS² uses rescaling so as to maintain the pair representation: the recombination of $\mathrm{RS}^2(\hat{\mathsf{ct}}, \check{\mathsf{ct}})$ equals (up to small error) $\mathrm{RS}_{q'}(\mathrm{RCB}(\hat{\mathsf{ct}}, \check{\mathsf{ct}}))$. This means RS² correctly rescales the recombined plaintext while consuming no more modulus than a standard RS (only $q'$ is consumed, not an additional factor).
Lemma 4.6 (informal).
Let $(\hat{\mathsf{ct}}, \check{\mathsf{ct}})$ be a ciphertext pair, let $q'$ be the rescaling prime, and let $\mathsf{sk}$ be a secret key whose secret $s$ has Hamming weight $h$. Then the difference between $\mathrm{RS}_{q'}$ applied to the recombined plaintext and the recombined plaintext of $\mathrm{RS}^2(\hat{\mathsf{ct}}, \check{\mathsf{ct}})$ has small infinity norm.
This lemma confirms the correctness of RS²: it rescales the underlying plaintext by $q'$ with only a small, bounded error, similar to standard CKKS rescaling.
Definition 4.7 (Pair Multiply - Mult²).
We define the multiplication of ciphertext pairs over $R_Q$ as $\mathsf{Mult}^2 := \mathrm{RS}^2 \circ \mathsf{Relin}^2 \circ \mathsf{Tensor}^2$, where the rescaling prime is the modulus consumed by the multiplication. The result is a ciphertext pair over the next (smaller) modulus of the chain.
This is the complete definition of the Mult² operation, combining the three adapted steps.
Theorem 4.8 (informal). Let $\mathsf{CT}_1$ and $\mathsf{CT}_2$ be ciphertext pairs, let $\mathsf{sk}$ be a secret key whose secret $s$ has Hamming weight $h$, and assume the underlying plaintexts and low parts are suitably bounded. Then the difference between the recombined decryption of $\mathsf{Mult}^2(\mathsf{CT}_1, \mathsf{CT}_2)$ and the appropriately scaled product of the underlying plaintexts has small infinity norm. Here:
- This theorem provides the overall correctness and error bound for Mult².
- The error term coming from the discarded low-low product (introduced by Tensor²) is the main new component compared to standard CKKS multiplication.
- The modulus consumed per Mult² operation is the rescaling prime $q_{\mathrm{mult}}$, which is chosen smaller than the scaling factor $\Delta$ used by standard CKKS (where the rescaling prime satisfies $q \approx \Delta$). As a result, Mult² consumes roughly half the modulus per multiplication for the same effective scaling factor.
4.2.5. Bounding the Low Parts (for Mult²)
The error introduced by Tensor² depends quadratically on the norm of the low-part plaintext $\check{m}$. Thus, understanding and bounding the growth of the low part is critical for maintaining precision over sequential multiplications. The paper uses the canonical embedding for tighter bounds.
Theorem 4.9 (Growth of the Low Part, informal). Let $\mathsf{CT}_1$ and $\mathsf{CT}_2$ be ciphertext pairs, let $\mathsf{sk}$ be a secret key whose secret $s$ has Hamming weight $h$, and suppose the canonical-embedding norms of the high and low parts are bounded by $B$ and $\check{B}$, respectively. Then the theorem bounds the canonical-embedding norm of the low part of $\mathsf{Mult}^2(\mathsf{CT}_1, \mathsf{CT}_2)$ in terms of $B$, $\check{B}$, $q$, and $\Delta$. Here:
- $B$ and $\check{B}$ are upper bounds on the canonical-embedding norms of the high and low parts, respectively.
- The theorem bounds the norm of the low part of the output ciphertext pair after a Mult² operation.
- The dominant term implies that, for typical choices of $q$ and $\Delta$, the low part grows by roughly a constant number of bits per multiplication (about 1.7 bits per multiplication in the experiments reported in Figure 1).
- If the low part grows too large, the error term from Tensor² (which is quadratic in the low parts) can become significant. To mitigate this, a recombine-and-decompose step (DCP ∘ RCB) can be performed periodically to reset the low part to its initial small size, similar to a modular reduction; a toy illustration follows below.
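The refresh can be pictured at the integer level as recombining and re-splitting, as in this toy Python sketch (plain integers standing in for ciphertexts; the values are illustrative):

```python
# Toy "error refresh": after several multiplications the low part has grown;
# recombining (RCB) and re-decomposing (DCP) resets it below q/2.
q = 2**23

def rcb(hi, lo):
    return q * hi + lo

def dcp(x):
    lo = ((x + q // 2) % q) - q // 2   # centered remainder in [-q/2, q/2)
    return (x - lo) // q, lo

hi, lo = 41, 6_000_000                 # low part has drifted above q/2
hi2, lo2 = dcp(rcb(hi, lo))            # refresh

assert rcb(hi2, lo2) == rcb(hi, lo)    # same underlying value
assert abs(lo2) <= q // 2              # low part is small again
print(hi2, lo2)
```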
4.2.6. Homomorphic Multi-Precision Multiplication (Multᵗ)
The Mult² approach is generalized to Multᵗ for any tuple length t ≥ 2.
Definition 5.1 (Tuple Tensor).
Let $\mathsf{CT}_1 = (\mathsf{ct}_1^{(0)}, \dots, \mathsf{ct}_1^{(t-1)})$ and $\mathsf{CT}_2 = (\mathsf{ct}_2^{(0)}, \dots, \mathsf{ct}_2^{(t-1)})$ be ciphertext tuples, with component 0 the most significant. The $i$-th component of $\mathsf{Tensor}^t(\mathsf{CT}_1, \mathsf{CT}_2)$ is $\sum_{j+k=i} \mathsf{Tensor}(\mathsf{ct}_1^{(j)}, \mathsf{ct}_2^{(k)})$ for $0 \le i < t$.
The notation used in the paper is slightly unclear in this section (the overall tuple structure of the output is left implicit), but the intent is that the products whose indices sum to $t$ or more — the least significant contributions in this indexing — are discarded, just as the low-low product is discarded in Mult². This corresponds to multiplying two degree-$(t-1)$ polynomials and keeping only the terms of degree up to $t-1$.
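A plaintext-level Python sketch of this truncated product (plain numbers stand in for ciphertext components; the most-significant-first indexing is the assumption carried over from the tuple representation above):

```python
# Truncated "tensor" of two length-t tuples: out[i] = sum_{j+k=i} ct1[j]*ct2[k].
def tuple_tensor(ct1, ct2):
    t = len(ct1)
    out = [0] * t
    for j in range(t):
        for k in range(t):
            if j + k < t:                 # drop the less significant products
                out[j + k] += ct1[j] * ct2[k]
    return out

# Example with t = 3: only 6 of the 9 pairwise products are kept.
print(tuple_tensor([1, 2, 3], [4, 5, 6]))   # -> [4, 13, 28]
```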
Lemma 5.2 (informal). Let $\mathsf{CT}_1$ and $\mathsf{CT}_2$ be ciphertext tuples with suitably bounded components and let $\mathsf{sk}$ be a secret key. Then, modulo the working modulus, $\mathsf{Tensor}^t(\mathsf{CT}_1, \mathsf{CT}_2)$ decrypts (after recombination) to approximately the scaled product of the recombined plaintexts, with an error proportional to the sum of the discarded terms. This generalizes Lemma 4.2.
Definition 5.3 (Tuple Relinearize).
Let $\mathsf{CT} = (\mathsf{ct}^{(0)}, \dots, \mathsf{ct}^{(t-1)})$ be a ciphertext tuple. Its relinearization $\mathsf{Relin}^t(\mathsf{CT})$ is defined recursively: the highest-order component is multiplied by an appropriate power of $q$, relinearized with the standard Relin, and decomposed into a $t$-tuple, which is then combined with the recursive relinearization of the remaining components.
This is a recursive definition, where the highest order component is processed with a large scaling factor, relinearized, and then decomposed into a t-tuple. The rest of the tuple is processed recursively.
Lemma 5.4 (informal).
Let $\mathsf{CT}$ be an output of $\mathsf{Tensor}^t$ and let $\mathsf{sk}$ be a secret key whose secret $s$ has Hamming weight $h$. Then the error introduced by $\mathsf{Relin}^t$ has small infinity norm, and the same holds under the additional boundedness assumption on the components.
This lemma extends the correctness of Relin² to Relinᵗ, showing that the error bound grows only linearly with $t$.
Definition 5.5 (Tuple Rescale).
Let $\mathsf{CT} = (\mathsf{ct}^{(0)}, \dots, \mathsf{ct}^{(t-1)})$ be a ciphertext tuple and let $q'$ be the rescaling prime. The rescale $\mathrm{RS}^t(\mathsf{CT})$ applies the standard rescaling to the components recursively, in a coordinated way analogous to RS².
This definition ensures that the recombined plaintext of $\mathrm{RS}^t(\mathsf{CT})$ equals (up to small error) the standard rescaling of the recombined plaintext of $\mathsf{CT}$. It generalizes RS² to RSᵗ, ensuring that the $t$-tuple is correctly rescaled while consuming only the single prime $q'$ from the modulus chain.
Lemma 5.6 (informal). Let $\mathsf{CT}$ be a ciphertext tuple and let $\mathsf{sk}$ be a secret key whose secret $s$ has Hamming weight $h$. Then the rescaling error of $\mathrm{RS}^t$ has small infinity norm. This confirms the correctness of RSᵗ: it rescales the underlying plaintext with a small, bounded error.
Definition 5.7 (Tuple Multiply - Multᵗ).
We define the multiplication of ciphertext tuples over $R_Q$ as $\mathsf{Mult}^t := \mathrm{RS}^t \circ \mathsf{Relin}^t \circ \mathsf{Tensor}^t$, where the rescaling prime is the modulus consumed by the multiplication. The result is a ciphertext tuple over the next (smaller) modulus of the chain.
This completes the definition of the Multᵗ operation.
Theorem 5.8 (informal).
Let $\mathsf{CT}_1$ and $\mathsf{CT}_2$ be ciphertext tuples, let $\mathsf{sk}$ be a secret key whose secret $s$ has Hamming weight $h$, and assume the underlying plaintexts and their components are suitably bounded. Then the difference between the recombined decryption of $\mathsf{Mult}^t(\mathsf{CT}_1, \mathsf{CT}_2)$ and the appropriately scaled product of the underlying plaintexts has small infinity norm.
This theorem provides the overall correctness and error bound for Multᵗ. The main new error term comes from the discarded low-order products and grows with $t$. For a given effective scaling factor, the modulus consumed per multiplication shrinks as $t$ grows, which allows $t$-multiple-precision computations without a correspondingly larger modulus budget, in contrast to standard CKKS, which would need a much larger scaling factor (and hence much larger modulus consumption) for the same precision.
4.2.7. Bounding the Low Parts (for Multᵗ)
Similar to Mult², the error of Multᵗ depends on the norms of its low parts.
Theorem 5.9 (Growth of the Low Parts, informal). Let $\mathsf{CT}_1$ and $\mathsf{CT}_2$ be ciphertext tuples, let $\mathsf{sk}$ be a secret key whose secret $s$ has Hamming weight $h$, and suppose the canonical-embedding norms of the components are suitably bounded. Then the theorem bounds the norm of each lower-order component of $\mathsf{Mult}^t(\mathsf{CT}_1, \mathsf{CT}_2)$. Here:
- The theorem describes the growth of the norm of the $i$-th low part. The main term behaves similarly to the Mult² case, but the scale of this growth increases with the index $i$.
- The low parts grow by a bounded number of bits after each multiplication. This growth needs to be managed (e.g., via recombine and decompose) to prevent errors from accumulating.
5. Experimental Setup
5.1. Datasets
The paper does not use traditional public datasets (like ImageNet or CIFAR-10) because it focuses on evaluating the fundamental cryptographic properties of the proposed scheme (error growth, multiplicative depth, precision). Instead, the experiments involve:
- Single Ciphertext Repeated Squarings: The main method to evaluate error growth and multiplicative depth is performing repeated squaring operations on a single encrypted value. This simulates a deep multiplicative circuit with a single initial input.
- Synthetic/Controlled Inputs: The nature of the experiments suggests using simple, controlled plaintext values to observe how errors accumulate and how the scheme's parameters affect computational capacity and precision.
- Purpose: This setup is effective for validating the core cryptographic claims related to noise management, modulus consumption, and precision, rather than benchmarking performance on specific real-world applications.
5.2. Evaluation Metrics
The paper uses several metrics to evaluate the scheme, particularly in comparison to standard CKKS multiplication (Mult, i.e., Multᵗ with t = 1).
- Error Magnitude ($\log_2 \lVert \mathrm{can}(e) \rVert_\infty$):
  - Conceptual Definition: This measures the magnitude of the error in the decrypted plaintext, specifically the infinity norm of the canonical embedding of the error polynomial. It quantifies how much the computed homomorphic result deviates from the ideal unencrypted result; a smaller error magnitude indicates higher accuracy.
  - Mathematical Formula: For an error polynomial $e$, the canonical embedding maps it to a complex vector $z = \mathrm{can}(e)$. The infinity norm is $\lVert \mathrm{can}(e) \rVert_\infty = \max_j |z_j|$, where the $z_j$ are the components of $z$. The reported metric is $\log_2 \lVert \mathrm{can}(e) \rVert_\infty$.
  - Symbol Explanation:
    - $e$: error polynomial.
    - $\mathrm{can}$: canonical embedding function.
    - $z$: complex vector resulting from the canonical embedding of $e$.
    - $N$: ring degree.
    - $z_j$: $j$-th component (complex number) of the vector $z$.
    - $\lVert \cdot \rVert_\infty$: infinity norm (maximum absolute value).
    - $\log_2$: base-2 logarithm, converting the error magnitude into bits.
- Precision (bits):
  - Conceptual Definition: Precision refers to the number of accurate bits retained in the plaintext values after homomorphic computations. It is related to the scaling factor and the noise level; higher precision means more significant digits can be reliably represented.
  - Mathematical Formula: Not given explicitly in the paper, but typically evaluated as $\log_2(\Delta) - \log_2(\lVert e \rVert)$, i.e., the bit-size of the scaling factor minus that of the error, or simply judged from the error magnitude relative to the expected plaintext values. The paper reports precision in bits (e.g., -31.3 bits of precision).
  - Symbol Explanation:
    - $\Delta$: scaling factor.
    - $\lVert e \rVert$: the magnitude of the error in the decrypted value.
- Multiplicative Depth:
  - Conceptual Definition: The maximum number of sequential homomorphic multiplications that can be performed before bootstrapping is required, or before the noise level becomes too high for reliable decryption. A higher multiplicative depth is desirable as it allows more complex computations.
  - Mathematical Formula: The paper provides an approximation for the depth achievable with Mult² when error refreshing is used: $ \frac{\log_2(Q)}{\log_2(\Delta) - (1-1/k) \cdot \log_2(q_{\mathrm{div}})} $. For the original Mult (no decomposition, effectively $q_{\mathrm{div}} = 1$), this simplifies to $\log_2(Q)/\log_2(\Delta)$. (A numerical sanity check of this formula is sketched right after this list of metrics.)
  - Symbol Explanation:
    - $Q$: total ciphertext modulus available.
    - $\Delta$: scaling factor.
    - $k$: number of sequential Mult² multiplications between two consecutive error refreshings (DCP ∘ RCB).
    - $q_{\mathrm{div}}$: the divisor used in DCP.
- Maximal Modulus ($\log_2(Q_L P)$):
  - Conceptual Definition: The total bit-size of the largest modulus used in the CKKS scheme, combining the largest ciphertext modulus $Q_L$ and the auxiliary modulus $P$ used for key switching. This directly impacts the ring dimension needed for security and the overall memory footprint.
  - Mathematical Formula: $\log_2(Q_L P) = \log_2(Q_L) + \log_2(P)$.
  - Symbol Explanation:
    - $Q_L$: the largest modulus in the modulus chain.
    - $P$: auxiliary modulus used for key switching.
    - $\log_2$: base-2 logarithm, representing the size in bits.
- Latency ($T_{\mathrm{mult}}$, ms):
  - Conceptual Definition: The time taken to perform a single homomorphic multiplication operation.
  - Mathematical Formula: Measured in milliseconds (ms).
  - Symbol Explanation: Not applicable (direct measurement).
- Ciphertext Size (MB):
  - Conceptual Definition: The memory occupied by an encrypted ciphertext, particularly at the highest modulus level. This impacts storage and communication costs.
  - Mathematical Formula: Measured in megabytes (MB).
  - Symbol Explanation: Not applicable (direct measurement).
- Switching Key Size (MB):
  - Conceptual Definition: The memory occupied by the switching keys (including relinearization keys and rotation keys), which are necessary for relinearization and rotation operations. Large switching keys can be a significant practical burden.
  - Mathematical Formula: Measured in megabytes (MB).
  - Symbol Explanation: Not applicable (direct measurement).
- Number of NTTs (#NTT):
  - Conceptual Definition: The count of Number Theoretic Transforms (NTTs) performed. NTTs are fundamental, computationally intensive operations in RLWE-based schemes, so this metric indicates computational complexity.
  - Mathematical Formula: Count of 64-bit unit NTTs in dimension $N$.
  - Symbol Explanation:
    - NTT: Number Theoretic Transform.
    - $N$: ring degree.
5.3. Baselines
The primary baseline for comparison throughout the paper is:
- Ordinary CKKS Multiplication Algorithm (Mult): This refers to the standard CKKS homomorphic multiplication procedure as described in Section 2.2 of the paper. It is equivalent to Multᵗ with t = 1 (i.e., no decomposition is applied). This baseline is representative because it is the standard method the proposed algorithms aim to improve upon. The paper uses it to demonstrate the advantages of Mult² in terms of modulus consumption, multiplicative depth, precision, and overall efficiency (latency, ciphertext size, key size).

The paper also briefly compares against:
- Rank-2 Module-CKKS: An alternative approach that reduces the ring dimension of CKKS by using module-LWE formats. It serves as a baseline for comparing different modulus-saving strategies beyond the standard CKKS scheme.
6. Results & Analysis
6.1. Core Results Analysis
The experimental results strongly validate the effectiveness of the proposed scheme, particularly Mult², in reducing modulus consumption, increasing homomorphic capacity (multiplicative depth), and enabling higher precision with smaller cryptographic parameters.
1. Error Growth Observation:
Figure 1 illustrates the error growth during 8 repeated squarings for both standard CKKS (Mult, t = 1) and Mult² (t = 2).
The figure shows that the error of Mult grows steadily, and the error of Mult² initially follows it closely. However, the discarded term $\check{\mathsf{ct}}_1 \otimes \check{\mathsf{ct}}_2$ (the product of the low parts) grows faster, by approximately 1.7 bits per multiplication, and the total Mult² error is essentially the sum of the Mult-like error and this discarded term. Consequently, after about 6 multiplications, the Mult² error starts to increase at an accelerated pace compared to Mult. This confirms the theoretical analysis that dropping the low-part product introduces a specific error component, which becomes dominant after several multiplications, and it justifies the need for a "recombine and decompose" strategy to refresh the low part and prevent excessive error accumulation.
The following are the results from Table 1 of the original paper:
| Multiplication algorithm | N | log2(QLP) | log2 Δ | log2 q (Base) | log2 q (Mult) | log2 q (Div) | log2 P |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mult (t=1) | 2^15 | 610 | 61 | 61 | 61×8 | - | 61 |
| Mult² (t=2, new) | 2^15 | 449 | 61 | 61 | 38×8 | 23 | 61 |
As can be seen from the parameters in Table 1, for the same ring degree (N = 2^15) and scaling factor (61 bits), Mult² achieves 8 multiplications with a significantly smaller maximal modulus (449 bits) compared to Mult (610 bits). This demonstrates the modulus-saving capability of Mult². The Mult primes are also smaller for Mult² (38 bits vs. 61 bits for Mult).
2. Increased Homomorphic Capacity:
The paper evaluates the maximum multiplicative depth for a fixed ring degree (N = 2^15).
The following are the results from Table 2 of the original paper:
| Mult. algorithm | N | log2(QLP) | log2 Δ | log2 q (Base) | log2 q (Mult) | log2 q (Div) | log2 P |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mult (t=1) | 2^15 | 855 | 57 | 57 | 57×13 | - | 57 |
| Mult² (t=2, new) | 2^15 | 875 | 61 | 61 | 38×18 | 23×3 | 61 |

With a comparable maximal modulus, Mult² supports 18 sequential multiplications (using three additional 23-bit Div primes for the decompositions) versus 13 for standard Mult, i.e., the multiplicative depth increases from 13 to 18.
3. Increased Precision with Smaller Parameters:
This experiment focuses on achieving high precision (100-bit scaling factor) for 8 multiplication levels.
The following are the results from Table 3 of the original paper:
| Mult. algorithm | N | log2(QP) | log2 Δ | log2 q (Base) | log2 q (Mult) | log2 q (Div) | log2 P | dnum | T_mult | #NTT | ctxt size | key size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mult (t=1) | 2^16 | 1,000 | 100 | 50×2 | (50×2)×8 | - | 50×2 | 9 | 270 ms | 290 | 14.8 MB | 73.7 MB |
| Mult² (t=2, new) | 2^15 | 680 | 100 | 50×2 | 60×8 | 40 | 60 | 11 | 179 ms | 350 | 5.08 MB | 30.6 MB |
Table 3 presents a remarkable improvement in parameter efficiency and performance. For 8 levels of multiplication and a 100-bit scaling factor:
- Mult (t=1) requires ring degree N = 2^16 and a maximal modulus of 1,000 bits.
- Mult² (t=2) achieves the same with N = 2^15 and a maximal modulus of only 680 bits.

This reduction in parameters leads to significant practical benefits:

- Multiplication latency: decreased from 270 ms to 179 ms (1.5x faster).
- Ciphertext size: reduced from 14.8 MB to 5.08 MB (2.9x smaller).
- Switching key size: dramatically cut from 73.7 MB to 30.6 MB (2.4x smaller).

These gains highlight that Mult² achieves high-precision HE with much smaller and more efficient cryptographic parameters. The dnum (key-switching gadget rank) for Mult² is slightly higher (11 vs. 9), and it involves more NTTs in total (350 vs. 290), but the NTT dimension for Mult² is half (N = 2^15) that of Mult (N = 2^16), so each NTT is substantially cheaper and Mult² is faster overall.
6.2. Data Presentation (Tables)
The following are the results from Table 1 of the original paper:
| Multiplication algorithm | N | log2(QLP) | log2 Δ | log2 q (Base) | log2 q (Mult) | log2 q (Div) | log2 P |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mult (t=1) | 2^15 | 610 | 61 | 61 | 61×8 | - | 61 |
| Mult² (t=2, new) | 2^15 | 449 | 61 | 61 | 38×8 | 23 | 61 |
The following are the results from Table 2 of the original paper:
| Mult. algorithm | N | log2(QLP) | log2 Δ | log2 q (Base) | log2 q (Mult) | log2 q (Div) | log2 P |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mult (t=1) | 2^15 | 855 | 57 | 57 | 57×13 | - | 57 |
| Mult² (t=2, new) | 2^15 | 875 | 61 | 61 | 38×18 | 23×3 | 61 |
The following are the results from Table 3 of the original paper:
| Mult. algorithm | N | log2(QP) | log2 Δ | log2 q (Base) | log2 q (Mult) | log2 q (Div) | log2 P | dnum | T_mult | #NTT | ctxt size | key size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mult (t=1) | 2^16 | 1,000 | 100 | 50×2 | (50×2)×8 | - | 50×2 | 9 | 270 ms | 290 | 14.8 MB | 73.7 MB |
| Mult² (t=2, new) | 2^15 | 680 | 100 | 50×2 | 60×8 | 40 | 60 | 11 | 179 ms | 350 | 5.08 MB | 30.6 MB |
6.3. Ablation Studies / Parameter Analysis
While the paper does not present explicit "ablation studies" in the traditional sense of removing components, it implicitly performs parameter analysis and demonstrates the necessity of certain mechanisms:
- Role of q_div and k: The discussion on the "Growth of the Low Part" (Section 4.3) and the derived formula for k (the number of sequential Mult² multiplications between error refreshings) constitute a form of parameter analysis. It explains how q_div (the divisor used for decomposition) must be chosen relative to Δ (the scaling factor) to keep the error term from the discarded low-low product under control, and the paper suggests a concrete choice of q_div for the best multiplicative-depth improvement.
- Error Refreshing Strategy (DCP ∘ RCB): The experimental results, particularly Figure 1, show that the error component coming from the low part grows significantly after about 6 multiplications. This empirical finding motivates the explicit use of recombine-and-decompose (DCP ∘ RCB) at specific intervals (e.g., after 6 and 12 layers in the 18-depth experiment) to "reset" the low part and maintain precision. It demonstrates that while Mult² inherently saves modulus, it requires an explicit strategy to control accumulated approximation errors over many multiplications.
- Comparison of t = 1 vs. t = 2 Parameter Sets: The entire experimental section is a comparative parameter analysis between standard CKKS (t = 1) and Double-CKKS (t = 2). By designing parameter sets with similar security and precision targets, and observing the differences in modulus size, ring degree, multiplicative depth, and performance metrics, the authors effectively demonstrate the impact of their core innovation (increasing t).

These analyses show how different choices of cryptographic parameters and the strategic use of error refreshing impact the performance and accuracy of Mult², confirming the theoretical predictions and guiding practical implementation.
7. Conclusion & Reflections
7.1. Conclusion Summary
This paper introduces Multᵗ, a novel homomorphic multiplication method for the CKKS scheme that significantly reduces modulus consumption by decomposing ciphertexts into components. For Mult² (double precision), modulus consumption is almost halved. This is achieved through a weak form of homomorphic Euclidean division and carefully redesigned Tensor, Relinearization, and Rescaling operations that selectively discard low-significance terms. The primary benefits are enabling deeper computational circuits without increasing ciphertext dimension or bootstrapping frequency, and achieving higher precision with smaller cryptographic parameters. Experimental results showcase substantial gains: Mult² increased the multiplicative depth from 13 to 18 and allowed 8 sequential multiplications with a 100-bit scaling factor using a smaller ring degree (N = 2^15) and a much smaller maximal modulus (680 bits vs. 1,000 bits for standard CKKS), leading to 1.5x faster multiplication and significantly smaller ciphertext and key sizes.
7.2. Limitations & Future Work
The authors acknowledge several limitations and propose future research directions:
- Implementation of Multᵗ for t ≥ 3: The current implementation focuses on Double-CKKS (t = 2). The authors note that implementing Tuple-CKKS for t ≥ 3 is non-trivial, particularly concerning arithmetic modulo powers of the divisor and potential modifications to the RNS representation. They leave the experimental aspects and efficiency assessment for larger t as future work.
- Tuple Bootstrapping: While the theoretical components for tuple-CKKS bootstrapping exist, an efficient implementation is still needed. It would require careful management of error growth in homomorphic linear transformations and handling of different scaling factors during bootstrapping. This is also left as future work.
7.3. Personal Insights & Critique
The paper presents a clever and highly impactful innovation. The core idea of applying multi-precision arithmetic principles to homomorphic ciphertexts is intuitive yet challenging to execute homomorphically. The introduction of homomorphic Euclidean division is particularly elegant, providing a foundational primitive for this approach.
Key Strengths:
- Direct Impact on Modulus Consumption: Unlike many prior works that optimize auxiliary operations, this paper directly tackles modulus consumption in the fundamental multiplication operation. This is a critical step toward more practical FHE.
- Parameter Reduction: The ability to reduce the ring dimension N and the maximal modulus QP for a given security level and depth/precision target is a major win. Smaller N and moduli directly translate to faster computation and a reduced memory footprint, which are crucial for real-world FHE deployments.
- Theoretical Rigor and Experimental Validation: The detailed error analysis (especially of low-part growth) combined with concrete experimental results provides strong support for the proposed schemes.
- Beginner-Friendly Explanation: The paper does an excellent job of motivating the problem and explaining the core concepts (like integer decomposition) before diving into the homomorphic adaptations, making it accessible.
Potential Issues/Areas for Improvement:
- Error Management Complexity: While the "recombine and decompose" strategy effectively manages low-part error growth, it adds complexity to circuit design. Determining the optimal k (the number of multiplications between refreshings) can be non-trivial and application-dependent.
- Overhead for Larger t: As t increases, the number of Tensor calls and other operations in Multᵗ grows roughly quadratically in t. While the authors argue that the asymptotic cost is similar thanks to smaller moduli and ring dimensions, the constant factors might become significant for very large t, potentially limiting the practical benefit for extremely high precision. The lack of experimental results for t ≥ 3 leaves this open.
- Integration with Bootstrapping: The current work primarily addresses the SHE setting. Fully realizing the potential of Multᵗ for FHE requires an efficient "tuple bootstrapping" mechanism. If that turns out to be overly complex or inefficient, it could limit the overall practical gains for arbitrary-depth circuits.
Transferability/Future Value:
The concept of homomorphic Euclidean division and multiple-precision arithmetic could be a foundational technique applicable to other HE schemes (e.g., BFV) or for optimizing other complex homomorphic operations where bit-decomposition or precise error management is critical. This paper represents a significant step forward in making CKKS more efficient and practical for a wider range of high-precision and deep-circuit applications.