CutQC: Using Small Quantum Computers for Large Quantum Circuit Evaluations
TL;DR Summary
CutQC partitions large quantum circuits into smaller subcircuits executed on small quantum devices, using classical postprocessing to reconstruct outputs, enabling efficient evaluation of circuits beyond current NISQ limits with improved fidelity.
Abstract
Quantum computing (QC) is a new paradigm offering the potential of exponential speedups over classical computing for certain computational problems. Each additional qubit doubles the size of the computational state space available to a QC algorithm. This exponential scaling underlies QC’s power, but today’s Noisy Intermediate-Scale Quantum (NISQ) devices face significant engineering challenges in scalability. The set of quantum circuits that can be reliably run on NISQ devices is limited by their noisy operations and low qubit counts. This paper introduces CutQC, a scalable hybrid computing approach that combines classical computers and quantum computers to enable evaluation of quantum circuits that cannot be run on classical or quantum computers alone. CutQC cuts large quantum circuits into smaller subcircuits, allowing them to be executed on smaller quantum devices. Classical postprocessing can then reconstruct the output of the original circuit. This approach offers significant runtime speedup compared with the only viable current alternative—purely classical simulations—and demonstrates evaluation of quantum circuits that are larger than the limit of QC or classical simulation. Furthermore, in real-system runs, CutQC achieves much higher quantum circuit evaluation fidelity using small prototype quantum computers than the state-of-the-art large NISQ devices achieve. Overall, this hybrid approach allows users to leverage classical and quantum computing resources to evaluate quantum programs far beyond the reach of either one alone.
English Analysis
1. Bibliographic Information
- Title: CutQC: Using Small Quantum Computers for Large Quantum Circuit Evaluations
- Authors: Wei Tang, Teague Tomesh, Martin Suchara, Jeffrey Larson, and Margaret Martonosi.
- Affiliations: The authors are from the Department of Computer Science at Princeton University and the Mathematics and Computer Science Division at Argonne National Laboratory, USA. This indicates a collaboration between a leading academic institution in computer science and a major national research laboratory, combining expertise in computer architecture, quantum computing, and high-performance computing.
- Journal/Conference: The paper was published in the Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '21). ASPLOS is a premier conference in computer architecture, known for its rigorous standards and focus on the intersection of hardware, software, and systems. Publication here signifies the work's importance to the computer systems community, framing quantum computing as a systems-level challenge.
- Publication Year: 2021
- Abstract: The abstract introduces the core problem: current Noisy Intermediate-Scale Quantum (NISQ) computers are limited by low qubit counts and high error rates, restricting the size and reliability of quantum circuits they can run. The paper proposes CutQC, a hybrid quantum-classical method. CutQC partitions large quantum circuits into smaller subcircuits that can be executed on smaller quantum devices. The results from these subcircuits are then combined on a classical computer to reconstruct the output of the original large circuit. The authors claim this approach provides significant speedups over classical simulation, enables the evaluation of circuits beyond the reach of either quantum or classical computers alone, and achieves higher fidelity by using smaller, more reliable quantum devices.
- Original Source Link: /files/papers/68f851995bc53cc775b2202a/paper.pdf. This is a formally published paper from the ASPLOS '21 conference.
2. Executive Summary
- Background & Motivation (Why):
- Core Problem: The promise of quantum computing is hindered by the practical limitations of current hardware. Today's NISQ devices are small (tens to a few hundred qubits) and error-prone ("noisy"). As the number of qubits on a device increases, the accumulated noise often leads to a rapid decline in the quality (fidelity) of the results, making larger devices paradoxically less useful for certain computations.
- Existing Gaps: There exists a critical gap: many quantum circuits of scientific interest are too large to run on any existing quantum computer, yet they are also too complex to be simulated efficiently on even the most powerful classical supercomputers. Classical state-vector simulation cost scales exponentially with the number of qubits, quickly becoming intractable.
- Innovation: CutQC introduces a practical, automated, and scalable solution to bridge this gap. Instead of waiting for perfect, large-scale quantum computers, CutQC proposes a "divide and conquer" strategy. It leverages the strengths of both computing paradigms: using small, higher-fidelity quantum computers as co-processors to handle the uniquely quantum parts of a problem, and using scalable classical computers for orchestrating the workflow and reconstructing the final solution.
- Main Contributions / Findings (What):
- Extending Quantum Computing's Reach: CutQC allows the execution of quantum circuits that are significantly larger than the physical qubit count of the available quantum hardware. The paper demonstrates running circuits of up to 100 qubits.
- Improving Execution Fidelity: By running smaller circuit fragments on smaller, more reliable quantum devices, CutQC achieves higher-quality results than running the original (albeit smaller) circuit on a larger, noisier device. The paper reports an average fidelity improvement (measured by $\chi^2$ loss reduction) of 21% to 47%.
- Accelerating Quantum Simulation: Compared to the only viable alternative for large circuits (purely classical simulation), CutQC offers dramatic runtime speedups, ranging from 60X to 8600X across different benchmark circuits. This makes the evaluation of previously intractable circuits feasible.
- An End-to-End Automated Framework: The paper presents a complete toolchain, including an automatic MIP Cut Searcher to find optimal partition points and two scalable post-processing algorithms (Full Definition and Dynamic Definition queries) to handle the classical reconstruction.
3. Prerequisite Knowledge & Related Work
- Foundational Concepts:
- Quantum Computing (QC): A computing paradigm that uses principles of quantum mechanics. Its fundamental unit is the qubit, which can exist in a combination (superposition) of 0 and 1 states simultaneously. Multiple qubits can be linked through entanglement, creating a complex computational state space that grows exponentially ($2^n$ states for $n$ qubits), which is the source of quantum computing's potential power.
- Quantum Circuit: The standard model for writing quantum programs. It consists of a sequence of quantum gates (operations) applied to qubits over time, analogous to a classical logic circuit.
- Noisy Intermediate-Scale Quantum (NISQ) Devices: This term, coined by John Preskill, describes the current era of quantum computers. They are "intermediate-scale" (50-1000 qubits) and "noisy" (prone to errors from environmental factors, imperfect gate operations, and short qubit lifetimes).
- Fidelity: A measure of success for a quantum computation. It quantifies how close the measured output probability distribution is to the ideal, error-free theoretical result. A fidelity of 100% means a perfect outcome.
- State-Vector Simulation: A classical method to simulate a quantum circuit. It involves storing the entire quantum state as a vector of complex numbers and applying matrix operations for each gate. This method is exact but requires memory and computation time that grow exponentially, becoming infeasible around 45-50 qubits.
- Pauli Matrices: A set of four matrices (I, X, Y, Z) that are fundamental in quantum mechanics. They form a basis, meaning any single-qubit operation can be expressed as a linear combination of them. They are central to the theory of circuit cutting.
- Kronecker Product ($\otimes$): A mathematical operation that combines two matrices into a larger block matrix. In quantum mechanics, it is used to describe the combined state space of two or more separate quantum systems. In CutQC, it is the key operation for stitching the results of subcircuits back together.
- Differentiation: While the underlying theory of decomposing quantum operations was known in theoretical physics, CutQC's primary innovation is making it a practical and automated engineering solution. Prior work was largely theoretical. CutQC distinguishes itself by:
- Automatic Cut Finding: It introduces a Mixed-Integer Programming (MIP) formulation to automatically find the optimal cuts that minimize the immense classical post-processing cost. This transforms an ad-hoc process into a systematic optimization problem.
- Scalable Post-processing: It develops concrete algorithms (FD and DD queries) and engineering optimizations (parallelism, greedy ordering) to handle the classical reconstruction step, which is the main bottleneck. The Dynamic Definition method is particularly novel, as it avoids the need to ever construct the full state vector, enabling sampling from circuits far beyond the classical memory limit.
- Real-World Demonstration: The paper validates the entire framework on real IBM quantum hardware, proving both the extension of circuit size and, crucially, the improvement in result fidelity.
4. Methodology (Core Technology & Implementation)
The core technology of CutQC is quantum circuit cutting. This process involves breaking a large circuit into smaller, independent subcircuits, executing them on small quantum computers, and then classically reconstructing the full circuit's output.
4.1 Principles: The Theory of Cutting a Qubit Wire
The ability to "cut" a wire stems from the mathematical principle that an identity operation can be inserted anywhere in a circuit without changing its function. This identity can then be decomposed into a sum of terms. CutQC
uses a decomposition based on the Pauli matrix basis (I, X, Y, Z
).
Cutting a single wire between two parts of a circuit (an "upstream" part and a "downstream" part) is equivalent to replacing the wire with a sum over four specific combinations of measurement and state preparation. As shown in Figure 3, for each of the four Pauli basis elements, the upstream circuit part is appended with a measurement in that basis, and the downstream part is prepended with a corresponding state preparation.
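For reference, the single-qubit identity behind this measure-and-prepare replacement is the standard Pauli expansion of a quantum state; a minimal statement of it, in density-matrix notation (an addition for illustration, not taken from the article above), is:

$$\rho \;=\; \tfrac{1}{2}\Big(\mathrm{Tr}(I\rho)\,I + \mathrm{Tr}(X\rho)\,X + \mathrm{Tr}(Y\rho)\,Y + \mathrm{Tr}(Z\rho)\,Z\Big)$$

Measuring the upstream qubit supplies the $\mathrm{Tr}(O\rho)$ factors, while preparing eigenstates of the corresponding operator on the downstream qubit reproduces each operator term, which is why one cut expands into four measure-and-prepare combinations.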
The image is a schematic from the paper showing how CutQC cuts a large quantum circuit into several smaller subcircuits that fit on small quantum computers. It depicts quantum gate operations and measurement bases, along with subcircuit labels, their corresponding measurement bases, and rotation gates.
- Figure 3: Procedure to cut one qubit wire. This diagram illustrates the core mechanism. A single qubit wire connecting an upstream vertex and a downstream vertex is cut, generating multiple new, smaller circuits. The upstream fragment (ending at the cut) is prepared in three variations, corresponding to measurements in the I (which is measured the same way as Z), X, and Y bases. The downstream fragment (starting at the cut) is prepared in four variations, corresponding to initializations in the states $|0\rangle$, $|1\rangle$, $|+\rangle$, and $|+i\rangle$. The final result is reconstructed by summing the four Pauli-basis terms, each a Kronecker product of the corresponding upstream and downstream subcircuit outputs.
The number of subcircuit executions scales with the number of cuts. If $K$ wires are cut, the reconstruction requires evaluating $4^K$ combinations of subcircuit results, which are then summed up classically. This exponential cost in $K$ is the primary overhead that CutQC seeks to minimize.
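To make the bookkeeping concrete, here is a minimal NumPy sketch of the reconstruction for a single cut. The dictionaries, coefficient values, and function name are illustrative stand-ins, not CutQC's actual API.

```python
import numpy as np

def reconstruct_one_cut(upstream, downstream, coeff):
    """Sum the four Pauli-basis terms produced by a single wire cut.

    upstream[b] / downstream[b]: probability vectors from the subcircuit
    variant associated with Pauli basis b; coeff[b]: the weight of that
    term in the decomposition.  Names are illustrative, not CutQC's API.
    """
    total = None
    for b in ("I", "X", "Y", "Z"):
        term = coeff[b] * np.kron(upstream[b], downstream[b])
        total = term if total is None else total + term
    return total
```

With $K$ cuts, the same loop runs over all $4^K$ tuples of basis labels (e.g., via `itertools.product`), which is exactly the exponential overhead described above.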
4.2 Framework Overview and Automated Cut Finding
CutQC is an end-to-end framework, as depicted in Figure 5.
The image is a schematic of the CutQC framework, showing the workflow from an input quantum circuit to the final full-circuit evaluation: the MIP Cut Searcher finds optimal cuts, the circuit is split into subcircuits, the subcircuits are executed on quantum devices, and the Reconstructor recovers the original circuit's probability distribution.
- Figure 5: Framework overview of CutQC. The user provides a large quantum circuit. The MIP Cut Searcher takes this circuit and user constraints (e.g., max 15 qubits per subcircuit) and finds the optimal locations to cut. This produces a set of smaller subcircuits, which are then executed on available small quantum devices. Finally, the Reconstructor takes the outputs (probability distributions) from all subcircuit runs and classically computes the final probability distribution for the original, large circuit.
The MIP Cut Searcher (Section 4.1): This is the "brain" of CutQC. It formulates the problem of finding the best cuts as a Mixed-Integer Program (MIP), a standard type of optimization problem.
- Goal: To partition the circuit's graph of multi-qubit gates into clusters (subcircuits) such that:
- Each subcircuit is small enough to fit on the target quantum device.
- The number of cuts ($K$) between subcircuits is minimized to reduce the classical reconstruction cost.
- Variables:
- $y_{v,c}$: A binary variable that is 1 if gate $v$ is assigned to subcircuit $c$, and 0 otherwise.
- $x_{e,c}$: A binary variable that is 1 if wire $e$ is cut by subcircuit $c$.
- Constraints: The model includes constraints to ensure every gate is in exactly one subcircuit, the size of each subcircuit (counting both its original qubits and the new qubits introduced by cuts) does not exceed the qubit limit of the target device, and the cutting variables are consistent with the gate assignments.
- Objective Function: The objective is to minimize the estimated classical reconstruction time, which is dominated by the number of floating-point multiplications. The cost function is:
$$\text{Cost} \;=\; 4^{K}\sum_{c=2}^{n_C}\;\prod_{i=1}^{c} 2^{d_i}$$
- $K$: The total number of cuts. The $4^K$ term reflects the exponential cost of cutting.
- $n_C$: The number of subcircuits.
- $d_i$: The number of output qubits from subcircuit $i$ that contribute to the final result. The product term estimates the size of the intermediate vectors during the sequential Kronecker product reconstruction. (A simplified cut-search sketch follows below.)
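As a concrete, heavily simplified illustration of casting cut finding as a MIP, the sketch below partitions a toy gate graph into two clusters while minimizing the number of cut wires under a crude size budget. It uses the open-source PuLP modeling library; the toy graph, variable names, and the simplified objective (just the cut count, rather than the paper's full reconstruction-cost model) are assumptions for illustration.

```python
import pulp

vertices = ["g1", "g2", "g3", "g4"]                                # multi-qubit gates
edges = [("g1", "g2"), ("g2", "g3"), ("g3", "g4"), ("g1", "g3")]   # qubit wires
qubits_touched = {"g1": 2, "g2": 2, "g3": 2, "g4": 2}              # crude size proxy
clusters = [0, 1]
max_qubits_per_cluster = 5

prob = pulp.LpProblem("cut_search", pulp.LpMinimize)

# y[v][c] = 1 if gate v is assigned to cluster c.
y = pulp.LpVariable.dicts("y", (vertices, clusters), cat="Binary")
# x[i] = 1 if edge i is cut (its endpoints land in different clusters).
x = pulp.LpVariable.dicts("x", range(len(edges)), cat="Binary")

# Every gate belongs to exactly one cluster.
for v in vertices:
    prob += pulp.lpSum(y[v][c] for c in clusters) == 1

# An edge is cut whenever its endpoints disagree on their cluster.
for i, (u, v) in enumerate(edges):
    for c in clusters:
        prob += x[i] >= y[u][c] - y[v][c]
        prob += x[i] >= y[v][c] - y[u][c]

# Crude cluster-size budget (the real model also counts qubits added by cuts).
for c in clusters:
    prob += (
        pulp.lpSum(qubits_touched[v] * y[v][c] for v in vertices)
        <= max_qubits_per_cluster
    )

# Simplified objective: minimize the number of cuts K.  CutQC's actual
# objective models the full classical reconstruction cost instead.
prob += pulp.lpSum(x[i] for i in range(len(edges)))
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("number of cuts:", int(pulp.value(prob.objective)))
```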
4.3 Classical Postprocessing and Reconstruction
Once the subcircuits are executed, the results must be classically combined. CutQC offers two methods for this.
1. Full Definition (FD) Query (Section 4.2): This method aims to reconstruct the entire probability distribution of the original $n$-qubit circuit (all $2^n$ probability values). This is only feasible for moderately sized circuits (up to ~35 qubits). The authors introduce several optimizations to make this tractable (a brief code sketch of two of them follows the list):
- Greedy Subcircuit Order: The reconstruction involves a series of Kronecker products. By processing smaller subcircuits first, the size of intermediate state vectors is kept minimal for as long as possible.
- Early Termination: If a subcircuit execution yields an all-zero probability vector (due to sampling noise or the circuit's nature), any reconstruction term involving it will be zero and can be skipped.
- Parallel Processing: The reconstruction terms are independent and can be calculated in parallel on a classical compute cluster, allowing for near-perfect scaling with more nodes.
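A minimal sketch of how the first two optimizations might look in code, assuming subcircuit outputs arrive as NumPy probability vectors; this is not the paper's implementation, and the real reconstructor must also track the qubit permutation that reordering introduces.

```python
import numpy as np
from functools import reduce

def kron_term(prob_vectors):
    """Compute one reconstruction term as a Kronecker product of subcircuit
    probability vectors, using early termination and a smallest-first
    (greedy) multiplication order."""
    # Early termination: an all-zero factor makes the whole term zero.
    if any(not np.any(v) for v in prob_vectors):
        return None
    # Greedy order: combine the smallest vectors first so the intermediate
    # products stay as small as possible for as long as possible.
    return reduce(np.kron, sorted(prob_vectors, key=len))
```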
2. Dynamic Definition (DD) Query (Section 4.3): This is a key innovation for handling circuits that are too large for FD query. Instead of computing all probabilities, DD query computes a "binned" or "blurred" version of the distribution.
- Algorithm 1: Dynamic Definition: The algorithm works recursively.
- In the first step, a small subset of the total qubits is chosen as active; the rest are merged.
- The reconstruction is performed only over the active qubits, producing a low-resolution distribution where each output "bin" corresponds to a state of the active qubits and contains the summed probability of all states within that bin.
- For the next recursion, the algorithm "zooms in" on the bin with the highest probability. It fixes the state of some previously active qubits and makes some of the previously merged qubits the new active qubits.
- This process is repeated to progressively refine the probability distribution in regions of interest (a toy illustration of this zoom-in loop appears at the end of this subsection).
- Use Cases:
- For algorithms with sparse output (like Grover's search), DD can efficiently pinpoint the few high-probability "solution" states.
- For algorithms with dense output (like random circuits), DD can be used to efficiently sample the distribution and build an approximate landscape without ever storing the full state vector.
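The zoom-in loop can be illustrated with a short, self-contained sketch. This is not CutQC's Algorithm 1: the `prefix_probability` black box stands in for the cut-circuit reconstruction over the fixed and active qubits, and the toy example only chases the single heaviest bin, matching the sparse-output use case.

```python
from itertools import product

def dynamic_definition(prefix_probability, n_qubits, active_per_step=2):
    """Repeatedly fix the highest-probability assignment of a few 'active'
    qubits at a time until every qubit is fixed (toy zoom-in loop)."""
    fixed = ""                                   # bits decided so far
    while len(fixed) < n_qubits:
        n_active = min(active_per_step, n_qubits - len(fixed))
        # One "bin" per assignment of the active qubits; the merged qubits
        # are summed over inside prefix_probability.
        bins = {bits: prefix_probability(fixed + "".join(bits))
                for bits in product("01", repeat=n_active)}
        best = max(bins, key=bins.get)           # zoom in on the heaviest bin
        fixed += "".join(best)
    return fixed

# Toy stand-in for the reconstruction: all probability sits on |1011>.
def toy_prefix_probability(prefix, solution="1011"):
    return 1.0 if solution.startswith(prefix) else 0.0

print(dynamic_definition(toy_prefix_probability, 4))   # prints: 1011
```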
5. Experimental Setup
- Backends:
- Classical: A compute cluster with 16 nodes, each with an Intel Xeon CPU and 256 GB of memory, used for post-processing and classical simulations.
- Quantum: Real-world experiments were run on IBM's 5-qubit Bogota and 20-qubit Johannesburg quantum computers.
- Simulation: Ideal, noiseless state-vector simulation was used to get the ground-truth results for calculating fidelity and for experiments where running on real hardware was not the focus (e.g., testing post-processing scalability).
- Evaluation Metrics:
- Runtime: The wall-clock time required for classical post-processing.
- $\chi^2$ (Chi-squared) Loss: A statistical measure to quantify the difference between a measured probability distribution and an ideal one. A lower value signifies a better, more faithful execution.
- Conceptual Definition: It computes a normalized sum of squared differences between the observed probabilities ($q_i$) and the expected/ideal probabilities ($p_i$). It is more sensitive than simple fidelity, especially to errors in low-probability states.
- Mathematical Formula:
$$\chi^2 = \sum_{i=0}^{2^n - 1} \frac{(q_i - p_i)^2}{q_i + p_i}$$
- Symbol Explanation:
- $n$: The number of qubits in the circuit.
- $q_i$: The measured probability of the $i$-th output state from the real hardware execution (either direct or via CutQC).
- $p_i$: The ideal probability of the $i$-th output state, obtained from noiseless state-vector simulation.
- The sum is over all $2^n$ possible output states. (A short helper implementing this metric appears after this list.)
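A small helper giving a direct NumPy reading of the formula above; the function name and the zero-bin handling are added conveniences, not taken from the paper.

```python
import numpy as np

def chi_square_loss(measured, ideal, eps=1e-12):
    """Chi-squared distance between a measured and an ideal probability
    distribution, following the formula above; bins where both
    distributions are (numerically) zero are skipped."""
    measured = np.asarray(measured, dtype=float)
    ideal = np.asarray(ideal, dtype=float)
    denom = measured + ideal
    mask = denom > eps
    return float(np.sum((measured[mask] - ideal[mask]) ** 2 / denom[mask]))
```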
- Baselines:
- Purely Classical Simulation: Running a full state-vector simulation of the circuit using IBM's Qiskit. This serves as the runtime baseline.
- Purely Quantum Execution: Running the uncut circuit directly on a large NISQ device (e.g., the 20-qubit Johannesburg). This serves as the fidelity baseline.
- Benchmarks: The study uses six diverse and representative quantum circuits:
- Supremacy: A 2-D random circuit, known to have a dense output distribution.
- Approximate Quantum Fourier Transform (AQFT): A key subroutine in many quantum algorithms.
- Grover's Search: An algorithm for unstructured search with a sparse output.
- Bernstein-Vazirani (BV): A simple algorithm that also produces a single solution state.
- Adder: A basic quantum arithmetic circuit.
- Hardware Efficient Ansatz (HWEA): A type of circuit used in near-term variational algorithms.
6. Results & Analysis
6.1 Full Definition (FD) Query: Speedup and Scalability
Figure 6 shows the core performance of the FD query mode.
The image is a chart showing CutQC executions targeting 10-, 15-, 20-, and 25-qubit quantum computers; the horizontal axis is circuit size and the vertical axis is postprocessing runtime on a log scale. In most cases, CutQC's postprocessing runs tens to thousands of times faster than conventional classical simulation.
- Figure 6: Postprocessing runtime for FD query. This set of plots shows the runtime of CutQC's post-processing for various benchmarks, circuit sizes, and target quantum device sizes (10, 15, 20, 25 qubits). The key takeaway is that the runtime (y-axis, log scale) is almost always orders of magnitude lower than the corresponding classical simulation time (represented by the top line in each plot).
- Key Findings:
- Massive Speedup: CutQC provides a 60X to 8600X speedup over classical simulation.
- Circuit Dependence: The runtime varies significantly by benchmark. Circuits that are more "globally" entangled and harder to cut (like Supremacy and AQFT) incur higher post-processing costs.
- Benefit of Larger Devices: Using a larger quantum computer (e.g., 25-qubit vs. 15-qubit) generally reduces the number of cuts needed, lowering the post-processing runtime. However, the returns diminish once the device is "large enough" for an efficient partition.
6.2 Dynamic Definition (DD) Query: Evaluating Intractable Circuits
DD query is designed for circuits too large for FD. The paper first illustrates its mechanism on small circuits.
The image is the chart for Figure 7, showing how the probability distribution changes across recursion levels when CutQC recursively executes a 4-qubit BV circuit on a 3-qubit machine. As recursion deepens, probability concentrates on the output state |1111⟩, reaching 1 at the fourth recursion, which identifies it as the solution state.
- Figure 7: DD query on a 4-qubit BV circuit. This demonstrates how DD query finds a single "solution" state. The algorithm recursively "zooms in" on the part of the state space containing the solution. In recursion 1, it determines the solution must start with 1. By recursion 4, it has pinpointed the exact solution state |1111⟩ without ever computing the full 16-state probability vector.
The image is a chart showing, for 16- to 24-qubit benchmark circuits executed with CutQC on a 15-qubit machine, how the $\chi^2$ value decreases and the cumulative runtime increases with DD recursion depth, across the different circuit types (BV, HWEA, Supremacy). The parenthesized $\chi^2$ values indicate goodness of fit; lower is better.
- Figure 9: Performance of DD query on medium-sized circuits. This figure shows that as the number of DD recursions increases, the $\chi^2$ loss (solid lines, left axis) steadily decreases, meaning the reconstructed distribution gets closer to the ideal ground truth. Crucially, the cumulative runtime (dotted lines, right axis) remains extremely low compared to a full classical simulation.
The most significant result for DD is its application to circuits far beyond the classical limit.
The image is a chart showing CutQC postprocessing runtime versus full-circuit size when executing different circuits on 30-, 40-, 50-, and 60-qubit quantum computers. The vertical axis is time in seconds and the horizontal axis is circuit size, comparing several benchmark algorithms.
- Figure 10: DD query for circuits up to 100 qubits. This plot shows the projected runtime for a single DD recursion to evaluate circuits from 30 to 100 qubits, using hypothetical future devices of 30-60 qubits. The runtime remains in the range of seconds to minutes, demonstrating a clear path to evaluating circuits that are completely intractable for classical simulators.
6.3 Fidelity Improvement
A key claim of CutQC is that it can improve results by using smaller, better-quality quantum computers. Figure 1 sets the motivation by showing how fidelity plummets on larger devices.
The image is a bar chart showing the fidelity (probability of the correct answer) of running the Bernstein-Vazirani algorithm on different quantum devices. Fidelity drops rapidly as device size grows, and the 53-qubit Rochester device produces almost no valid results.
- Figure 1: Fidelity degradation on larger IBM devices. This bar chart shows that for the same algorithm (Bernstein-Vazirani), the probability of getting the correct answer drops dramatically as the device size increases. The 53-qubit Rochester device fails to produce any meaningful result.
The paper then shows that CutQC can reverse this trend. The following figure (Figure 11 in the paper) quantifies this improvement.
The image is a bar chart showing, for different input circuit sizes, the percentage reduction in $\chi^2$ loss that CutQC achieves across the benchmarks; higher values indicate better results.
- Figure 11: Fidelity improvement with CutQC. This chart compares the $\chi^2$ loss of running circuits directly on the 20-qubit Johannesburg device versus using CutQC with the 5-qubit Bogota device. A positive percentage reduction means CutQC produced a result closer to the ideal one. For most benchmarks, CutQC significantly improves the result quality, achieving an average reduction in $\chi^2$ loss between 21% and 47%. This is a powerful demonstration that for NISQ hardware, "smaller can be better" when combined with a smart hybrid approach.
6.4 Parallel Scalability
The paper demonstrates that the classical post-processing step scales efficiently.
The image is a chart showing the CutQC postprocessing speedup for a supremacy_grid circuit mapped onto the 15-qubit IBM Melbourne device. The blue line is the CutQC speedup and the black line is ideal linear scaling, showing that runtime scales well as the number of compute nodes increases.
- Figure 12: Postprocessing scalability. This plot shows the speedup of the reconstruction step as the number of classical compute nodes increases. The blue line (CutQC speedup) closely follows the black line (perfect linear scaling), indicating that the workload is highly parallelizable and that adding more classical resources directly translates to faster results.
7. Conclusion & Reflections
- Conclusion Summary: The paper successfully presents CutQC, a powerful and practical hybrid quantum-classical framework. It convincingly demonstrates that by intelligently partitioning large quantum circuits to run on smaller, higher-fidelity quantum devices, one can (1) evaluate circuits far beyond the size limits of current hardware, (2) achieve significant speedups over purely classical simulation, and (3) obtain higher-quality, less noisy results than by using larger, more error-prone quantum computers directly. CutQC provides a tangible pathway to extracting value from NISQ-era hardware.
- Limitations & Future Work:
- Exponential Scaling with Cuts: The primary limitation is the scaling of the classical post-processing cost with the number of cuts, which grows as $4^K$. The method is most effective for circuits that have a structure allowing for a small number of cuts. For highly interconnected or "all-to-all" circuits, finding a low-cut partition may be impossible, rendering the method impractical.
- Subcircuit Fidelity: The final reconstructed fidelity is fundamentally limited by the fidelity of the small quantum device used for the subcircuits. While better than larger devices, these are still noisy, and errors can accumulate during reconstruction.
- MIP Solver Complexity: While the MIP solver was fast for the tested benchmarks, its performance on much larger and more complex circuit topologies is an open question.
- Personal Insights & Critique:
- Pragmatic Impact: CutQC is an excellent example of systems-level thinking applied to quantum computing. It shifts the focus from a purely hardware-centric race for more qubits to a more holistic approach of co-designing hardware and software to maximize the utility of existing resources. It is one of the most practical and impactful techniques to emerge for the NISQ era.
- A New Execution Model: This work helps establish a new execution model for quantum computing: a hybrid model where quantum devices act as specialized accelerators or co-processors within a larger classical high-performance computing workflow.
- Open Questions: The work opens up several interesting research directions. Could this method be combined with quantum error mitigation techniques applied at the subcircuit level for even better fidelity? How does the choice of decomposition basis (beyond Pauli) affect the final result's robustness to noise? Can the MIP model be extended to be "noise-aware," preferring to cut across connections that are known to be particularly noisy on a given device? CutQC provides a strong foundation for this future exploration.