Twilight: Adaptive Attention Sparsity with Hierarchical Top-$p$ Pruning
TL;DR Summary
The study introduces the `Twilight` framework that employs `top-p` sampling for adaptive attention sparsity to accelerate long-context LLMs. Results demonstrate up to 98% token pruning, achieving a 15.4x speedup in self-attention and 3.9x in end-to-end per-token latency without compromising accuracy.
Abstract
Leveraging attention sparsity to accelerate long-context large language models (LLMs) has been a hot research topic. However, current algorithms such as sparse attention or key-value (KV) cache compression tend to use a fixed budget, which presents a significant challenge during deployment because it fails to account for the dynamic nature of real-world scenarios, where the optimal balance between accuracy and efficiency can vary greatly. In this paper, we find that borrowing top-p sampling (nucleus sampling) to sparse attention can surprisingly achieve adaptive budgeting. Based on this, we propose Twilight, a framework to bring adaptive sparsity to any existing sparse attention algorithm without sacrificing their accuracy. Empirical results show that Twilight can adaptively prune at most 98% of redundant tokens, leading to a 15.4x acceleration in self-attention operations and a 3.9x acceleration in end-to-end per-token latency in long-context LLM decoding.
In-depth Reading
English Analysis
1. Bibliographic Information
1.1. Title
Twilight: Adaptive Attention Sparsity with Hierarchical Top-$p$ Pruning
1.2. Authors
Chaofan Lin, Jiaming Tang, Shuo Yang, Hanshuo Wang, Tian Tang, Boyu Tian, Ion Stoica, Song Han, Mingyu Gao.
Affiliations: Tsinghua University, Massachusetts Institute of Technology, University of California, Berkeley.
1.3. Journal/Conference
This paper was published as a preprint on arXiv; the listed publication timestamp is 2025-02-04T23:26:10 (UTC). ArXiv is a widely recognized platform for sharing academic research prior to, or in parallel with, formal peer review and publication in journals or conference proceedings. It allows for rapid dissemination of scientific findings.
1.4. Publication Year
2025
1.5. Abstract
The paper addresses the challenge of accelerating long-context large language models (LLMs) by leveraging attention sparsity. Existing sparse attention algorithms and key-value (KV) cache compression methods typically rely on a fixed budget for token selection, which is problematic in real-world deployments due to the dynamic nature of accuracy-efficiency trade-offs. The authors propose that adopting the concept of top-p sampling (also known as nucleus sampling), originally used in LLM generation, can surprisingly achieve adaptive budgeting for sparse attention. Based on this insight, they introduce Twilight, a framework designed to bring adaptive sparsity to any existing sparse attention algorithm without compromising accuracy. Empirical results demonstrate that Twilight can adaptively prune up to 98% of redundant tokens, leading to a 15.4x acceleration in self-attention operations and a 3.9x acceleration in end-to-end per token latency during long context LLM decoding.
1.6. Original Source Link
https://arxiv.org/abs/2502.02770
1.7. PDF Link
https://arxiv.org/pdf/2502.02770v5.pdf
2. Executive Summary
2.1. Background & Motivation
The core problem the paper aims to solve is the inefficiency of long-context large language models (LLMs), particularly during the decoding stage, due to the high computational and memory costs associated with the attention mechanism. Specifically, the key-value (KV) cache grows rapidly with longer token sequences, leading to significant latency overheads from memory accesses and increased GPU memory consumption.
This problem is highly important in the current field because LLMs with long-context capabilities are crucial for a wide range of advanced applications, including retrieval-based tasks, document summarization, code generation, video language models (VLMs), and large reasoning models that require extensive chain-of-thought (CoT) reasoning. The increasing demand for models supporting context windows up to 1 million or 10 million tokens further highlights the necessity of efficient long-context processing.
A significant challenge in prior research on attention sparsity (or KV cache sparsity) is the reliance on a fixed budget for selecting "critical tokens." This top-k approach presents two major issues:
- Dynamic Optimal Budget: The ideal number of tokens to select (the budget B) varies dynamically at runtime. Different attention heads exhibit different attention weight distributions: some are "focused" (peaked, requiring fewer tokens), while others are "diffuse" (flat, requiring more tokens). A fixed budget leads to either over-selection (wasting computation) or under-selection (losing accuracy).
- Algorithm-Specific Over-selection: Existing sparse attention algorithms require different degrees of over-selection to compensate for inaccuracies in estimating token importance, necessitating tedious offline calibration for each algorithm.

The paper's entry point or innovative idea is to draw an analogy between this top-k budget selection dilemma in sparse attention and a similar problem previously encountered in LLM sampling. Just as top-p sampling (also known as nucleus sampling) addresses the limitations of top-k sampling in generating diverse and coherent text, the authors propose that applying top-p principles to sparse attention can enable efficient and adaptive budget decisions by directly imposing a threshold on the cumulative sum of attention weights.
2.2. Main Contributions / Findings
The primary contributions of this paper are:
- Identification of the top-k budget dilemma: The paper thoroughly investigates the critical challenge of selecting an optimal, fixed budget in top-k sparse attention due to the dynamic nature of attention weight distributions.
- Introduction of top-p sampling for sparse attention: It proposes using top-p sampling to dynamically determine the KV cache budget at runtime, offering a more intrinsic and adaptive approach compared to top-k.
- Proposal of the Twilight framework: The authors introduce Twilight, a hierarchical KV cache pruning framework with a Select-then-Prune architecture. This framework enhances any existing sparse attention algorithm with adaptive budget selection capabilities without sacrificing accuracy, positioning it as an optimizer for existing methods rather than a new sparse attention algorithm itself.
- Efficient kernel implementations: Twilight includes optimized kernel implementations for its SpGEMV (Sparse General Matrix-Vector Multiplication) operations with 4-bit quantization of the Key cache and an efficient top-p mechanism via binary search, along with load balancing for head dynamism.

The key conclusions and findings are:

- Twilight can adaptively prune up to 98% of redundant tokens with nearly no accuracy loss across both long- and medium-context benchmarks.
- It achieves significant efficiency improvements:
  - Up to 15.8x speedup for the self-attention operator compared to FlashAttention2.
  - 1.4x speedup over state-of-the-art sparse attention mechanisms (e.g., Quest).
  - 3.9x acceleration in end-to-end per-token latency in long-context LLM decoding compared to FlashInfer.
  - 1.35x speedup in end-to-end decoding compared to Quest without Twilight.
- Twilight offers a more reasonable and tunable hyperparameter (the threshold $p$) compared to the highly sensitive fixed budget in top-k methods.
- The hierarchical architecture and 4-bit quantization of the K cache contribute to its efficiency, especially in memory-bound scenarios.

These findings address the problem of dynamic budget selection in sparse attention, significantly improving the efficiency of long-context LLMs while maintaining accuracy, making their deployment more practical and cost-effective.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To understand the paper, several fundamental concepts related to Large Language Models (LLMs) and their attention mechanisms are essential:
- Large Language Models (LLMs): LLMs are deep learning models, typically based on the Transformer architecture, trained on vast amounts of text data to understand, generate, and process human language. They have billions of parameters and excel at various Natural Language Processing (NLP) tasks. The paper focuses on long-context LLMs, which are models designed to process and reason over very long sequences of text (e.g., tens of thousands to millions of tokens), crucial for tasks like summarizing long documents or complex reasoning.
- Attention Mechanism: The attention mechanism is a core component of Transformers. It allows the model to weigh the importance of different parts of the input sequence when processing each token. Instead of processing a fixed-size window of text, attention enables the model to "attend" to relevant parts of the entire input, regardless of their position. This is particularly important for long contexts where dependencies can span many tokens. The fundamental self-attention calculation for a single head is often represented as: $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$ Where:
  - $Q$ (Query), $K$ (Key), $V$ (Value) are matrices representing linear projections of the input embedding (or previous layer's output). Each row corresponds to a token in the sequence.
  - $Q \in \mathbb{R}^{n \times d_k}$, $K \in \mathbb{R}^{n \times d_k}$, $V \in \mathbb{R}^{n \times d_v}$.
  - $n$ is the sequence length (number of tokens).
  - $d_k$ is the dimension of the key and query vectors.
  - $d_v$ is the dimension of the value vectors.
  - $K^T$ is the transpose of the key matrix.
  - The product $QK^T$ calculates attention scores (or logits), representing how much each token in the sequence should attend to every other token.
  - $\sqrt{d_k}$ is a scaling factor to prevent the dot products from becoming too large, which can push the softmax function into regions with very small gradients.
  - softmax is an activation function that converts the attention scores into probability distributions, ensuring they sum to 1.
  - The final multiplication with $V$ produces a weighted sum of value vectors, where the weights are the attention probabilities.
- KV Cache: During the decoding stage of LLMs (when generating text token by token), the Key (K) and Value (V) vectors for previously generated tokens are stored. This stored collection is called the KV cache. When a new token is generated, its query vector is used to attend to all previously computed K and V vectors in the cache. Storing them avoids recomputing them for every new token, significantly speeding up generation. However, the size of the KV cache grows linearly with the sequence length, leading to substantial memory consumption and memory-bandwidth bottlenecks for long-context LLMs.
- Attention Sparsity / KV Cache Sparsity: This refers to techniques that reduce the computational and memory cost of the attention mechanism by only computing attention over a subset of the KV cache (i.e., a subset of past tokens). The idea is that not all past tokens are equally important for predicting the next token; only a few "critical tokens" (often called "heavy hitters") contribute significantly. By identifying and using only these critical tokens, the overall computation and memory access can be greatly reduced.
- Top-k Sampling (in LLM generation context): In LLM text generation, after a model predicts a probability distribution over the entire vocabulary for the next token, top-k sampling involves selecting the next token only from the $k$ most probable tokens. This helps to make the generation process less repetitive than always picking the most probable token, but it uses a fixed $k$ regardless of the probability distribution's shape.
- Top-p Sampling / Nucleus Sampling (in LLM generation context): Also used in LLM text generation, top-p sampling addresses the limitation of top-k. Instead of fixing $k$, it selects the smallest set of tokens whose cumulative probability exceeds a threshold $p$. This makes the sampling process adaptive: if the probability distribution is sharp (few tokens have high probability), top-p will select a small $k$; if it's flat (many tokens have similar, lower probabilities), top-p will select a larger $k$. This approach leads to more diverse and contextually appropriate generations. Twilight borrows this adaptive concept for attention sparsity. A minimal selection sketch contrasting top-k and top-p follows this list.
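To make the contrast concrete, here is a minimal sketch (not from the paper) of the two selection rules applied to a toy probability vector, written in PyTorch; the function names are illustrative:

```python
import torch

def select_top_k(probs: torch.Tensor, k: int) -> torch.Tensor:
    # Keep a fixed number of the most probable entries, regardless of the shape of the distribution.
    return torch.topk(probs, k).indices

def select_top_p(probs: torch.Tensor, p: float) -> torch.Tensor:
    # Keep the smallest prefix (by descending probability) whose cumulative mass reaches p.
    sorted_probs, order = torch.sort(probs, descending=True)
    cum = torch.cumsum(sorted_probs, dim=0)
    cutoff = int(torch.searchsorted(cum, p).item()) + 1  # first position where cum >= p
    return order[:cutoff]

probs = torch.tensor([0.55, 0.25, 0.10, 0.05, 0.03, 0.02])
print(select_top_k(probs, 4))    # always 4 tokens, no matter how peaked the distribution is
print(select_top_p(probs, 0.9))  # 3 tokens here, since 0.55 + 0.25 + 0.10 already reaches 0.9
```

The same adaptivity is what Twilight transfers from output sampling to the selection of critical tokens inside attention.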
3.2. Previous Works
The paper categorizes previous works into several groups, highlighting the landscape of research on KV cache optimization and sparse attention:
- Top-k Sparse Attention: These methods aim to select a fixed number of critical tokens.
  - KV Cache Compression (token dropping/eviction): These approaches decide which tokens to evict from the KV cache in a static or query-agnostic manner. Examples include H2O [8], StreamingLLM [17], and SnapKV [18]. They physically remove non-critical tokens from memory.
  - Token Selection (retaining all tokens, selecting for computation): These methods keep all tokens in GPU memory but selectively compute attention only with critical tokens. Examples include SparQ [19], Quest [9], Double Sparsity (DS) [12], and HShare [20]. More advanced algorithms like RetrievalAttention [21] and PQCache [22] focus on better estimating token criticality. Some, like NSA [23] and MoBA [24], explore trainable sparse attention. The common thread among all these is their reliance on top-k selection, which Twilight aims to improve.
- Dynamic Budget: Recent studies have shown that the optimal KV cache budget varies significantly across different layers [25, 26], attention heads [27, 10, 28], and prompts/tasks [29]. These works often focus on one specific aspect of dynamism. Twilight argues that the root cause of this dynamism is the varying distributions of attention weights.
- Non-top-k Sparse Attention: MagicPIG [30] uses locality-sensitive hash (LSH) sampling to estimate attention weights instead of dropping tokens. It requires complex algorithm-system co-design. SampleAttention [31] also features adaptive sparsity but is primarily focused on the prefill stage. Tactic [32] is a concurrent work that also explores top-p sparsity but uses function fitting to estimate weight distributions, which can sometimes overestimate the budget.
- Other KV Cache Optimizations: These methods complement sparsification by addressing different aspects of KV cache management. Quantization [33, 34, 35, 36]: reducing the precision of KV cache elements (e.g., from FP16 to INT8 or INT4) to save memory and bandwidth. Linear Attention [37, 38]: approximating the softmax operation to reduce the quadratic complexity of attention to linear. Memory-efficient Attention [39, 40, 41]: techniques like FlashAttention and SageAttention optimize the underlying attention computation kernels to reduce memory access and improve throughput. Twilight is orthogonal to these methods and can be combined with them.
3.3. Technological Evolution
The evolution of LLM efficiency has moved from full attention (where all tokens attend to all others, leading to quadratic complexity in sequence length) to sparse attention (which reduces this by selecting a subset of tokens). Early sparse attention methods often relied on heuristic-based token dropping or fixed-budget top-k selection. While these methods provided significant gains, they introduced a new challenge: how to optimally choose the fixed budget. The observation that attention weight distributions are highly dynamic across different parts of the model and different inputs led to research on dynamic budgeting, but often focusing on specific dimensions like layer-wise or head-wise changes.
This paper's work (Twilight) fits into the technological timeline by addressing the fundamental limitation of fixed-budget top-k sparse attention. It proposes a more general and adaptive solution by borrowing the concept of top-p sampling (successfully used in LLM output generation) and applying it to the internal attention mechanism. This represents a shift from static, fixed-size pruning to dynamic, probability-mass-based pruning, directly tackling the root cause of budget dynamism (varying attention weight distributions).
3.4. Differentiation Analysis
Compared to the main methods in related work, Twilight introduces several core differences and innovations:
- Adaptive Budgeting via Top-p: The most significant innovation is the direct application of top-p sampling to determine the KV cache budget. Unlike top-k methods (e.g., Quest, DS, H2O, StreamingLLM), which select a fixed number of tokens, Twilight dynamically adjusts the number of selected tokens to accumulate a certain attention weight probability mass. This intrinsically adapts to focused versus diffuse attention distributions (as shown in Figure 3), preventing both over-selection and under-selection that plague top-k methods.
- Framework for Existing Algorithms: Instead of proposing a completely new sparse attention algorithm, Twilight is designed as an optimizer framework for existing methods. Its Select-then-Prune architecture allows it to be integrated with any existing sparse attention algorithm (e.g., Quest, DS), enhancing them with adaptive budget selection capabilities without requiring a complete redesign. This makes Twilight highly versatile and potentially future-proof.
- Improved Accuracy-Efficiency Trade-off: By dynamically pruning redundant tokens based on attention weight contribution, Twilight achieves higher efficiency (significant speedups) while maintaining or even improving accuracy compared to the base algorithms operating with fixed budgets. Top-p provides a theoretical upper bound on error, allowing for more precise control over the accuracy-efficiency trade-off.
- Precision Requirement Consideration: The paper explicitly considers the precision requirements for attention weight estimation for top-p, finding that 4-bit quantization strikes a suitable balance, unlike top-k methods that might push to extremely low 1-2 bits or full attention which uses 8-bit or FP16.
- System-Algorithm Co-design: Twilight emphasizes efficient kernel implementations, including SpGEMV with a 4-bit quantized Key cache and a binary-search-based top-p kernel, which is crucial for practical deployment on GPUs. This contrasts with some non-top-k methods like MagicPIG that also require algorithm-system co-design but use different underlying mechanisms.

In essence, Twilight innovates by providing a general, adaptive, and theoretically grounded solution to the persistent problem of budget selection in sparse attention, leveraging a proven concept from LLM sampling to enhance existing and future sparse attention techniques.
4. Methodology
4.1. Principles
The core idea behind Twilight is to address the limitations of fixed-budget top-k sparse attention by borrowing the adaptive nature of top-p sampling (also known as nucleus sampling), a technique widely used in LLM text generation. The intuition is that the goal of sparse attention is not merely to select a fixed number of tokens, but rather to identify the minimum set of tokens whose cumulative attention weights contribute significantly to the overall attention output, thereby approximating the full attention computation with minimal error.
The theoretical basis stems from the observation that the error introduced by sparse attention can be bounded by the sum of the attention weights of the unselected tokens. Therefore, by ensuring that the selected tokens collectively account for a sufficient proportion ($p$) of the total attention weight, the error can be controlled. This approach naturally adapts to the dynamic attention weight distributions observed across different attention heads, layers, and queries (as illustrated in Figure 3). If attention weights are focused on a few tokens, top-p will select a small number of tokens. If they are diffuse, top-p will select more tokens until the cumulative probability threshold is met. This adaptive behavior is crucial for balancing accuracy and efficiency in dynamic LLM inference scenarios.
4.2. Core Methodology In-depth (Layer by Layer)
4.2.1. Problem Formulation of Sparse Attention
The paper begins by formally defining sparse attention during the decoding phase.
Given a query vector $q \in \mathbb{R}^{1 \times d}$ and a KV cache $K, V \in \mathbb{R}^{n \times d}$, where $d$ is the head dimension and $n$ is the context length.

Sparse attention calculates an approximate output $\tilde{o}$ by selecting a subset of indices $S \subseteq \{1, \dots, n\}$.

The formula for sparse attention is:
$$\tilde{o} = \mathrm{softmax}\left(\frac{qK^T}{\sqrt{d}}\right) M_S V = w\, M_S V$$
Where:

- $\tilde{o}$ is the approximate output vector of dimensions $1 \times d$.
- $\mathrm{softmax}\left(\frac{qK^T}{\sqrt{d}}\right)$ computes the attention weights for all tokens based on the dot product similarity between the query and keys, scaled by $\sqrt{d}$, and then normalized by softmax.
- $w = (w_1, \dots, w_n)$ represents the full attention weights vector (the softmax output), where $w_i$ is the attention weight for the $i$-th token.
- $M_S$ is a diagonal matrix of size $n \times n$. Its diagonal elements are 1 for indices $i \in S$ and 0 otherwise. This effectively masks out non-critical tokens.
- $V$ is the value matrix from the KV cache.

The goal is to minimize the error between the full attention output $o$ and the sparse attention output $\tilde{o}$. The full attention output is:
$$o = \mathrm{softmax}\left(\frac{qK^T}{\sqrt{d}}\right) V = wV$$
The error can be bounded using the sub-multiplicative property of the Frobenius norm:
$$\|\tilde{o} - o\| = \|w (M_S - I) V\| \le \|w (M_S - I)\| \cdot \|V\|_F$$
Where:

- $\tilde{o} - o$ is the error in the output vector.
- $I$ is the $n \times n$ identity matrix (a diagonal matrix of all ones). The term $M_S - I$ is a diagonal matrix whose elements are 0 for $i \in S$ and $-1$ for $i \notin S$; essentially, it represents the portion of attention weights that are not selected.
- $\|\cdot\|$ denotes a vector norm.
- $\|\cdot\|_F$ denotes the Frobenius norm of a matrix.

Since the distribution of $V$ is generally smooth, $\|V\|_F$ can be considered a constant. Therefore, the objective simplifies to minimizing $\sum_{i \notin S} w_i$, which is equivalent to maximizing $\sum_{i \in S} w_i$. This means selecting a subset of tokens whose attention weights sum to the largest possible value.
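As a sanity check of this bound (a toy example of my own, not the paper's experiment), the following PyTorch snippet compares the sparse output against full attention and verifies that the error stays below $(1 - \sum_{i \in S} w_i)\,\|V\|_F$:

```python
import torch

torch.manual_seed(0)
n, d = 64, 16                                # context length, head dimension (toy sizes)
q = torch.randn(1, d)
K = torch.randn(n, d)
V = torch.randn(n, d)

w = torch.softmax(q @ K.T / d**0.5, dim=-1)  # full attention weights, shape [1, n]
o_full = w @ V                               # full attention output o = w V

S = torch.topk(w.squeeze(0), k=8).indices    # any index subset; top-8 used here
m = torch.zeros(n)
m[S] = 1.0                                   # diagonal of the selection mask M_S
o_sparse = (w * m) @ V                       # sparse output \tilde{o} = w M_S V

err = torch.linalg.norm(o_full - o_sparse)
bound = (1.0 - w.squeeze(0)[S].sum()) * torch.linalg.norm(V)  # (unselected mass) * ||V||_F
print(float(err), float(bound))              # err <= bound
```

The gap between the error and the bound shrinks as the selected set captures more of the attention mass, which is exactly the quantity top-p controls.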
4.2.2. Oracle Top-k Sparse Attention
If the size of the subset $S$ is fixed to a budget $B$, this leads to Oracle Top-k Sparse Attention:
$$S^{*} = \arg\max_{S} \sum_{i \in S} w_i \quad \text{s.t.} \quad |S| = B$$
Where:

- $S$ is the set of selected token indices.
- $|S|$ is the size of the set $S$.
- $B$ is the predefined fixed budget (number of tokens to select).

This represents the theoretical upper bound for top-k methods.
4.2.3. Oracle Top-p Sparse Attention
The paper argues that the fixed budget in top-k is problematic because attention weight distributions are dynamic (Figure 3). Some distributions are peaked (focused attention), where a few tokens have high weights, while others are flat (diffuse attention), where many tokens have similar, lower weights.
The following figure (Figure 3 from the original paper) shows the diverse distributions observed in attention weights, illustrating "flat" (diffuse) and "peaked" (focused) distributions.
The figure is a chart showing the diverse distributions of attention weights. The left panel shows a "flat" distribution (diffuse attention) with weights close to uniform; the middle panel shows a "peaked" distribution (focused attention) with the weight concentrated on tokens at the two ends; the right panel overlays the two distributions, making the difference evident.
To address this, Twilight introduces Oracle Top-p Sparse Attention, inspired by nucleus sampling. Instead of fixing the number of tokens, it fixes a cumulative probability threshold $p$:
$$S^{*} = \arg\min_{S} |S| \quad \text{s.t.} \quad \sum_{i \in S} w_i \ge p$$
Where:

- $S$ is the set of selected token indices.
- $|S|$ is the size of the set $S$.
- $p$ is the cumulative attention weight threshold (e.g., 0.9).

This means selecting the minimum number of tokens whose attention weights sum up to at least $p$. This approach ensures that the error is bounded by $(1 - p)\,\|V\|_F$, directly linking the threshold to the theoretical error. Figure 4 demonstrates how top-p can select a much smaller budget (97 tokens for $p = 0.8$) to achieve an error comparable to a large fixed budget (1024 tokens).
The following figure (Figure 4 from the original paper) shows cumulative attention scores of different budget selections in one example attention head.
The figure is a chart showing the cumulative attention scores of different budget selections in one attention head. The x-axis is the token position and the y-axis is the cumulative attention score; the annotated points correspond to the attention scores at specific positions, e.g., (97, 0.80) and (1024, 0.87).
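To see why the oracle top-p budget adapts on its own, here is a small, hedged illustration (not from the paper): the same threshold $p$ yields very different budgets for a peaked versus a flat weight distribution. The logit scales below are arbitrary and only control how focused the synthetic distribution is.

```python
import torch

def top_p_budget(weights: torch.Tensor, p: float) -> int:
    """Size of the smallest set of tokens whose cumulative attention mass reaches p."""
    sorted_w, _ = torch.sort(weights, descending=True)
    cum = torch.cumsum(sorted_w, dim=0)
    return int(torch.searchsorted(cum, p).item()) + 1

n = 4096
logits_peaked = torch.randn(n) * 4.0   # high-variance logits -> focused attention
logits_flat = torch.randn(n) * 0.5     # low-variance logits  -> diffuse attention

for name, logits in [("focused", logits_peaked), ("diffuse", logits_flat)]:
    w = torch.softmax(logits, dim=0)
    print(name, top_p_budget(w, p=0.9))  # same p, very different budgets
```

A fixed top-k budget would either waste work on the focused head or lose mass on the diffuse one; the top-p rule sizes the set per head and per query.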
4.2.4. Twilight Architecture: Hierarchical Pruning with Select-then-Prune
Twilight is designed as a framework to enhance existing sparse attention algorithms. It employs a two-step, hierarchical pruning process, dubbed the Select-then-Prune architecture.
The following figure (Figure 5 from the original paper) illustrates the Twilight architecture.
The figure is a schematic of the self-attention computation in the Twilight architecture. The process has three steps: first, the Token Selector uses a base algorithm to select critical tokens under a conservative budget; next, the Twilight Pruner prunes the selected token subset via a top-p threshold; finally, the pruned token indices are passed to a sparse attention kernel to perform the attention computation.
The self-attention computation proceeds in three steps:
- Token Selector: This component is a black-box representation of an existing base sparse attention algorithm (e.g., Quest, DS). It is configured with a conservative budget, meaning it initially selects a relatively large subset of critical tokens. This step ensures that important tokens are not prematurely discarded and provides a pool of candidates for further refinement.
- Twilight Pruner: This is the core innovation. It takes the subset of tokens selected by the Token Selector and further prunes them. It applies top-p thresholding, retaining only the minimum number of tokens whose attention weights sum exceeds the specified threshold $p$. This step adaptively determines the final budget for each query/head.
- Sparse Attention Kernel: The final sparse attention computation is performed only on the indices of the tokens retained by the Twilight Pruner. This computation is optimized for sparsity and leverages the reduced number of tokens. A minimal sketch of this three-step flow is given after this list.
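Below is a minimal, hedged sketch of the Select-then-Prune flow in PyTorch. It is a functional illustration only: the names `select_then_prune`, `K_approx`, and `conservative_k` are my own, the selector here is a plain approximate top-k standing in for a base algorithm such as Quest, and the pruner estimates weights in full precision rather than with Twilight's 4-bit SpGEMV kernel.

```python
import torch

def select_then_prune(q, K_approx, K, V, conservative_k=256, p=0.9):
    """Sketch of Select-then-Prune: conservative selection, top-p pruning, sparse attention."""
    d = q.shape[-1]

    # Step 1 (Token Selector): score tokens cheaply with an approximate key cache
    # and keep a conservative candidate pool.
    approx_scores = (q @ K_approx.T).squeeze(0)
    cand = torch.topk(approx_scores, k=min(conservative_k, K.shape[0])).indices

    # Step 2 (Twilight Pruner): estimate attention weights on the candidates only,
    # then keep the smallest subset whose renormalized mass reaches p.
    w = torch.softmax((q @ K[cand].T / d**0.5).squeeze(0), dim=-1)
    order = torch.argsort(w, descending=True)
    cum = torch.cumsum(w[order], dim=0)
    keep = order[: int(torch.searchsorted(cum, p).item()) + 1]
    kept = cand[keep]

    # Step 3 (Sparse Attention Kernel): exact attention restricted to the kept indices.
    w_final = torch.softmax(q @ K[kept].T / d**0.5, dim=-1)
    return w_final @ V[kept], kept

# Toy usage: using K itself as the "approximate" cache for simplicity.
d, n = 64, 8192
q, K, V = torch.randn(1, d), torch.randn(n, d), torch.randn(n, d)
out, kept = select_then_prune(q, K, K, V)
```

The point of this structure is that the base selector only needs to be conservative; the final, adaptive budget is decided by the top-p pruner.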
4.2.5. Efficient Kernel Implementations
To make Twilight practical and efficient, the authors develop specialized kernel implementations.
4.2.5.1. Efficient SpGEMV with 4-bit Quantization of Key Cache
Estimating the attention weights $w$ without loading the full KV cache is critical. This involves computing a Sparse General Matrix-Vector Multiplication (SpGEMV). Since loading the Key cache is often memory-bound, Twilight reduces memory access cost by quantizing it into lower precision.
The paper empirically finds that 4-bit precision for K cache quantization offers the best balance between accuracy and efficiency for top-p, lying between the 1-2 bits used by some top-k methods and 8-bit or FP16 for full attention. 2-bit quantization leads to significant attention weight sum drops, while 4-bit and 8-bit maintain stability (Figure 6).
The following figure (Figure 6 from the original paper) shows the sums of normalized attention weights for selected tokens under different quantization precisions, for a fixed top-$p$ threshold.
The figure is a chart showing the sums of attention weights of the selected tokens under different quantization precisions. The x-axis is the layer index and the y-axis is the attention weight sum; the data points correspond to 2-bit, 4-bit, and 8-bit quantization, showing the trend across layers.
The implementation details from Appendix B.1 include:
- Calculation Process: Adapts FlashInfer [47]'s attention decoding kernel. It involves asynchronously loading and dequantizing the INT4 K cache from global memory to shared memory using cp.async. A two-stage pipeline overlaps data loading with computation. FP16 is used for the dequantized K cache to balance accuracy and computation cost.
- Dequantization: Uses per-head, dynamic KV quantization. An FP16 scale and zero-point are stored for each head. Dequantization is performed on-the-fly using parameters inspired by QServe [61] and a fast algorithm from [62] (NVIDIA's FasterTransformer [63]) using custom PTX assembly instructions for INT4-to-FP16 conversion.
- Bit-packing: INT4 K elements are packed into a uint8_t buffer (two 4-bit elements per 8-bit byte). A fixed offset is added to each INT4 element before packing to convert it to an unsigned value. Address calculation is remapped for 4-bit granularity by halving the effective byte offset. A simplified quantize-and-pack sketch follows this list.
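The following is a simplified per-head INT4 quantize/dequantize and bit-packing sketch in PyTorch (illustrative only; Twilight's actual kernel does this in CUDA with cp.async pipelining and PTX-level INT4-to-FP16 conversion, and its exact offset and memory layout may differ from this toy version).

```python
import torch

def quantize_int4_per_head(K: torch.Tensor):
    """Toy per-head asymmetric INT4 quantization with uint8 bit-packing.

    K: FP16 key cache of shape [num_tokens, head_dim] for one head.
    Returns (packed uint8 buffer, FP16 scale, FP16 zero-point); two 4-bit values per byte.
    """
    k_min, k_max = K.min(), K.max()
    scale = (k_max - k_min).clamp(min=1e-3) / 15.0        # map the value range onto 0..15
    zero_point = k_min
    q = torch.clamp(torch.round((K - zero_point) / scale), 0, 15).to(torch.uint8)
    flat = q.flatten()
    if flat.numel() % 2:                                   # pad to an even element count
        flat = torch.cat([flat, flat.new_zeros(1)])
    packed = flat[0::2] | (flat[1::2] << 4)                # low nibble, high nibble per byte
    return packed, scale, zero_point

def dequantize_int4_per_head(packed, scale, zero_point, shape):
    low = packed & 0x0F
    high = (packed >> 4) & 0x0F
    q = torch.stack([low, high], dim=1).flatten()[: shape[0] * shape[1]]
    return (q.to(torch.float16) * scale + zero_point).reshape(shape)

# Toy usage: quantize one head's keys and reconstruct an FP16 approximation.
K = torch.randn(8, 4, dtype=torch.float16)
packed, s, z = quantize_int4_per_head(K)
K_hat = dequantize_int4_per_head(packed, s, z, K.shape)
```

The point of the packed layout is that each key element costs only 4 bits of memory traffic during weight estimation, while the final attention over the pruned indices still uses the full-precision cache.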
4.2.5.2. Efficient Top-p via Binary Search
A brute-force approach to top-p sampling (sorting all elements and accumulating) is inefficient on GPUs. Twilight implements an efficient top-p kernel by modifying FlashInfer's top-p sampling kernel, utilizing a parallel-friendly binary search method.
The algorithm is presented as Algorithm 1:
Algorithm 1 Top-$p$ via Binary Search.
Input: normalized attention weights $W$, top-$p$ threshold $p$, and a hyper-parameter.
Output: the selected indices and the corresponding mask.

- Initialize the search range for the cutoff threshold $m$.
- repeat:
  - $W_0 = \mathrm{where}(W < m,\ 0.0,\ W)$, i.e., zero out all weights below the current cutoff $m$.
  - Tighten the search range according to whether the retained weights in $W_0$ sum to at least $p$.
- until the search range converges; the positions that survive the final cutoff form the output indices and mask.
\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad 
\quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \begin{quad} \end{array} \end{quote}
The notation in the binary-search top-p kernel is as follows:

- The raw attention weights are the estimated attention output before any selection; their shape is determined by the batch size (`BS`), the number of attention heads, and the sequence length (context length).
- The top-p threshold `p` sets how much attention mass must be retained; `epsilon` is a small hyper-parameter for numerical stability, defining the convergence criterion for the binary search.
- `left` and `right` define the search range for the minimum weight value that satisfies the top-p condition; `mid` is the midpoint of the search range.
- A temporary vector is formed in which all weights less than `mid` are set to 0.0 while the others retain their original values; two further temporary vectors track the lower and upper bounds of the selected weights.
- The binary search iteratively adjusts `left` or `right` depending on whether the sum of attention weights greater than or equal to `mid` meets the threshold `p`.
- The loop continues until the gap between the two bounds falls below `epsilon`, effectively converging to the correct minimum weight value.
- Finally, the indices and mask are selected where the weights are at least the converged threshold.
The paper notes that element-wise operations like `max`, `where`, and `sum` can be fused into a single loop and tensorized on the GPU, avoiding materializing intermediate variables.
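To make the procedure concrete, below is a minimal PyTorch sketch of the binary-search top-p thresholding described above. It is an illustration under our own naming (`topp_threshold`, `weights`, `p`, `eps`), not the paper's fused GPU kernel, which operates per attention head with the element-wise steps fused and tensorized.

```python
import torch

def topp_threshold(weights: torch.Tensor, p: float = 0.95, eps: float = 1e-4) -> torch.Tensor:
    """Per row, binary-search (approximately) the largest threshold such that the
    weights >= it still cover at least a fraction p of the total attention mass."""
    total = weights.sum(dim=-1, keepdim=True)
    left = torch.zeros_like(total)                      # lower bound of the search range
    right = weights.max(dim=-1, keepdim=True).values    # upper bound of the search range
    while torch.max(right - left) > eps:
        mid = (left + right) / 2
        # Zero out weights below mid; the others keep their original values.
        kept = torch.where(weights >= mid, weights, torch.zeros_like(weights))
        enough = kept.sum(dim=-1, keepdim=True) >= p * total
        # If the retained mass still meets p, the threshold can be raised; otherwise lower it.
        left = torch.where(enough, mid, left)
        right = torch.where(enough, right, mid)
    return left

# Usage: keep exactly the tokens whose estimated weight reaches the converged threshold.
# mask = weights >= topp_threshold(weights, p=0.95)
```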
4.2.5.3. Load Balancing with Awareness of Head Dynamism
The top-p Pruner leads to head-wise dynamic budgets, which can cause load imbalance in the attention kernel (different heads process different numbers of tokens). Twilight reuses the load balancing algorithm from FlashInfer [47] (originally for requests with dynamic lengths) to address this head-wise dynamism by flattening the head dimension.
In Appendix B.2, the paper details how Group Query Attention (GQA) (e.g., in LLaMA 3) is tackled. GQA maps a group of query heads to a single key-value head, creating an incompatibility with query-aware sparse attention that relies on individual query vectors. A brute-force solution (loading tokens independently for each query head) is inefficient due to repeated memory reads. Twilight addresses this by operating at the granularity of query groups: the selected tokens for a query group are the union of tokens identified by all query heads within that group. This shifts head-wise dynamism to group-wise dynamism for GQA, where all heads in a group share a common token budget. This is a deliberate trade-off for implementation efficiency and compatibility.
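As an illustration of the group-wise union just described, here is a small PyTorch sketch (our own naming, not the paper's kernel): each query head in a GQA group proposes token indices, and the whole group loads the union of those indices as its shared budget.

```python
import torch

def group_union_indices(per_head_indices: list[torch.Tensor]) -> torch.Tensor:
    """per_head_indices: one 1-D tensor of selected token indices per query head
    in the same GQA group. Returns the sorted union shared by the whole group."""
    return torch.unique(torch.cat(per_head_indices))

# Example: two heads of a group select overlapping tokens; the group loads their union.
# group_union_indices([torch.tensor([0, 5, 9]), torch.tensor([5, 9, 42])])
# -> tensor([ 0,  5,  9, 42])
```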
The following figure (Figure 13 from the original paper) shows the head-wise/group-wise varlen attention with flattened paged KV cache in Twilight and a comparison among different attention methods.
This schematic shows, on the left, multi-head attention with head-wise dynamism interacting with a flattened paged KV cache; on the right, it compares the time consumed by three attention methods at different batch sizes under a realistic budget distribution, where "Padded" pads every head to the maximum budget length, "Head Varlen" loads KV at head granularity, and "Group Varlen" strikes a balance between the two.
4.2.6. Overhead Analysis and Discussion
4.2.6.1. Execution Time
The total execution time for Twilight is the sum of three parts: the time spent in the base algorithm's Token Selector, the time spent in the Twilight Pruner, and the time spent in the final sparse attention kernel over the surviving tokens.
Compared to baseline sparse attention (which only incurs the selection and attention terms), Twilight adds an extra pruning latency term but shrinks the attention term thanks to more aggressive pruning. The hierarchical sparsity, where the number of tokens decreases as precision increases, is key.
If the base algorithm selects a conservative budget of tokens (from the Token Selector) and Twilight prunes this down to a much smaller adaptive budget (after the Twilight Pruner), the theoretical speedup is approximately the ratio of the two budgets, discounted by the cost of the pruning pass. The overheads of the top-p kernel are considered minor because SpGEMV (the sparse attention computation over the selected tokens) dominates the latency at the budget sizes used in practice.
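To make this cost model concrete, here is a hedged back-of-envelope helper (our own construction; the paper's exact formula and constants are not reproduced here) comparing baseline sparse attention against Twilight's Select-then-Prune pipeline under simple per-token cost assumptions.

```python
def twilight_speedup(b_select: int, b_prune: int,
                     t_select: float, t_prune_per_token: float,
                     t_attn_per_token: float) -> float:
    """Rough speedup of Twilight over the base sparse attention algorithm.

    b_select: conservative budget chosen by the Token Selector.
    b_prune: adaptive budget remaining after the Twilight Pruner.
    t_select: fixed cost of the Token Selector (shared by both pipelines).
    t_prune_per_token: per-token cost of the low-precision top-p pruning pass.
    t_attn_per_token: per-token cost of the final high-precision attention (SpGEMV).
    """
    baseline = t_select + b_select * t_attn_per_token
    twilight = t_select + b_select * t_prune_per_token + b_prune * t_attn_per_token
    return baseline / twilight

# Example: pruning an 8192-token selection down to 256 tokens, with pruning roughly
# 8x cheaper per token than full-precision attention, yields a several-fold speedup.
# twilight_speedup(8192, 256, t_select=1.0, t_prune_per_token=0.125, t_attn_per_token=1.0)
```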
4.2.6.2. Memory Overheads
Twilight introduces an extra INT4-quantized K cache. This adds roughly a 12.5% additional KV cache memory overhead (INT4 is a quarter the size of FP16, and only the K half of the cache is quantized; see the calculation after this list). However, this cost can be mitigated:
- Some base algorithms (e.g., DS [12]) already use an INT4 K cache.
- Efforts like SageAttention2 [41] explore INT4 full attention, allowing direct reuse of the estimated attention weights without needing the original FP16 K cache.
- Techniques like offloading or selective quantization (e.g., keeping the INT4 K cache only for hot tokens) could further reduce memory if it becomes a bottleneck.
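As a back-of-envelope check (our arithmetic, not a figure quoted from the paper), the extra memory from adding an INT4 K cache on top of an FP16 KV cache is:
$
\frac{4 \text{ bits} \times |K|}{16 \text{ bits} \times (|K| + |V|)} = \frac{4}{16} \times \frac{1}{2} = 12.5\%
$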
4.2.6.3. Integration with LLM Serving Systems
Twilight's design is compatible with PagedAttention [44], allowing seamless integration into LLM serving systems like vLLM [44] and SGLang [45]. It also works with techniques like prefix sharing and multi-phase attention [48, 45, 49, 50, 51] due to its page-level or token-level sparse operations.
5. Experimental Setup
5.1. Datasets
The experiments evaluate Twilight on a variety of benchmarks covering different context lengths and task types.
- Long-Context Benchmarks:
  - Longbench [1]: A bilingual, multitask benchmark for long context understanding. Twilight evaluates on 12 tasks covering all task types. For efficiency evaluation, specific tasks like Qasper [56] (for QA), GovReport [57] (for summarization), and LCC [58] (for coding) are used with prompts ranging from 10k to 30k tokens.
  - RULER [16]: A benchmark specifically designed to assess the true context size of long-context language models. It includes specialized tests like CWE/FWE for comprehensive non-retrieval accuracy evaluation.
- Medium-Context Benchmarks (500 to 2k tokens):
  - GSM8K [13]: A dataset of math word problems designed to test LLMs' reasoning abilities.
  - COQA [14]: A conversational question answering challenge dataset.
  - PG-19 [15]: A large dataset of books used for long-range sequence modeling (used here for perplexity evaluation, which measures how well a language model predicts a sample of text).

The datasets were chosen to rigorously test Twilight's performance across diverse tasks and context lengths, from typical medium-length interactions to very long documents, validating its effectiveness in various real-world scenarios.
5.2. Evaluation Metrics
The paper employs several evaluation metrics to assess both the accuracy and efficiency of Twilight.
- Accuracy Metrics:
  - Score (Longbench, RULER): For Longbench and RULER, the paper reports "scores." These are typically task-specific accuracy metrics (e.g., F1 score, Exact Match, Rouge score, or specialized metrics for coding/reasoning tasks) that quantify the performance of the LLM on each task. A higher score indicates better performance. The paper reports the average score across tasks.
  - GSM8K (flexible/strict): For GSM8K, flexible and strict accuracy metrics are used. These measure the correctness of the model's answers to math word problems, with strict typically requiring an exact match and flexible allowing for minor variations. Higher values are better.
  - COQA (em/f1): For COQA, Exact Match (EM) and F1 score are used (a small illustrative implementation of EM, F1, and perplexity is sketched at the end of this metrics section).
    - Conceptual Definition (Exact Match - EM): EM measures the percentage of predictions that match the ground truth answer exactly. It is a strict measure of accuracy.
    - Mathematical Formula (Exact Match - EM): While no single universal formula is typically provided, it is defined as:
      $
      \mathrm{EM} = \frac{\text{Number of exact matches}}{\text{Total number of questions}} \times 100\%
      $
      Where:
      - "Number of exact matches" refers to cases where the model's generated answer string is identical to the reference answer string.
      - "Total number of questions" is the total count of questions in the dataset.
    - Conceptual Definition (F1 Score): The F1 score is the harmonic mean of precision and recall. It is a more robust metric than EM for QA tasks, especially when answers can be phrased differently but convey the same meaning. It measures the overlap between the predicted answer and the ground truth answer at the word level.
    - Mathematical Formula (F1 Score):
      $
      \mathrm{F1} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
      $
      Where:
      $
      \mathrm{Precision} = \frac{\text{Number of overlapping words}}{\text{Number of words in the predicted answer}}, \quad \mathrm{Recall} = \frac{\text{Number of overlapping words}}{\text{Number of words in the ground truth answer}}
      $
    - For a single question, Precision and Recall are calculated based on the word overlap between the predicted answer and the (set of) ground truth answers. The F1 score for the dataset is typically the average F1 over all questions. For both EM and F1, higher values are better.
  - PG-19 Perplexity (PPL):
    - Conceptual Definition: Perplexity is a common intrinsic evaluation metric for language models. It quantifies how well a probability model predicts a sample. A lower perplexity score indicates that the model is better at predicting the given text, implying higher accuracy and a better fit to the data. It is essentially the exponentiated average negative log-likelihood of a sequence.
    - Mathematical Formula:
      $
      \mathrm{PPL}(W) = \exp \left( -\frac{1}{N} \sum_{i=1}^N \log P(w_i | w_1, \dots, w_{i-1}) \right)
      $
      Or, equivalently, using the inverse product of probabilities:
      $
      \mathrm{PPL}(W) = \sqrt[N]{\prod_{i=1}^N \frac{1}{P(w_i | w_1, \dots, w_{i-1})}}
      $
      Where:
      - $W = w_1, \dots, w_N$ is a sequence of words (or tokens).
      - $N$ is the total number of words/tokens in the sequence.
      - $P(w_i | w_1, \dots, w_{i-1})$ is the probability of the $i$-th word given the preceding words, as predicted by the language model.
      - $\log$ is the natural logarithm and $\exp$ is the exponential function.
    - A lower perplexity value is better.
- Efficiency Metrics:
  - Latency: Measures the time taken for a specific operation, such as the self-attention operator. Lower latency is better.
  - Speedup: A ratio indicating how many times faster Twilight is compared to a baseline. A higher speedup is better.
  - Time-Per-Output-Token (TPOT): Measures the average time (e.g., in milliseconds) required to generate a single output token during end-to-end decoding. This is a crucial metric for LLM serving systems. Lower TPOT is better.
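As referenced above, here is a minimal, self-contained sketch of the word-level EM/F1 metrics and perplexity as conventionally defined (our own illustrative code, not the paper's evaluation harness; names like `exact_match` and `f1_score` are ours).

```python
import math
from collections import Counter

def exact_match(prediction: str, reference: str) -> float:
    # 1.0 if the normalized strings are identical, else 0.0.
    return float(prediction.strip().lower() == reference.strip().lower())

def f1_score(prediction: str, reference: str) -> float:
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)   # word-level overlap counts
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def perplexity(token_log_probs: list[float]) -> float:
    # token_log_probs: natural-log probabilities log P(w_i | w_<i) from the model.
    return math.exp(-sum(token_log_probs) / len(token_log_probs))
```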
5.3. Baselines
Twilight is primarily presented as a framework to optimize existing sparse attention algorithms. Therefore, its performance is evaluated against various baselines, including full attention and state-of-the-art sparse attention methods, both with and without Twilight integration.
- Full Attention Baselines:
  - PyTorch's Scaled-Dot-Product-Attention (SDPA): The standard PyTorch implementation of the attention mechanism.
  - FlashAttention2 (FA2) [39]: A highly optimized, memory-efficient attention implementation serving as a backend for SDPA. It is a strong baseline for full attention performance.
  - Memory-Efficient Attention [59]: Another SDPA backend focused on memory optimization.
  - FlashInfer [47]: A high-performance kernel library specifically designed for LLM serving, providing highly optimized full attention kernels.
- State-of-the-Art Top-k Sparse Attention Methods:
  - Quest [9]: A query-aware sparsity method for efficient long-context LLM inference, cited as achieving state-of-the-art runtime performance among sparse attention methods.
  - Double Sparsity (DS) [12]: A post-training sparse attention method with double sparsity. The authors use optimized configurations tuned for each model, provided by its official repository.
- State-of-the-Art Non-Top-k Sparse Attention Method:
  - MagicPIG [30]: Uses locality-sensitive hash (LSH) sampling for attention weight estimation. It relies on configurable parameters K and L instead of a fixed budget. The paper uses two standard configurations from the original MagicPIG paper. (Note: MagicPIG was not evaluated on LLaMA-2 due to lack of official support.)
- Token Dropping Methods (for comparison of sparsity types):
  - StreamingLLM [17]: An efficient streaming language model with attention sinks, typically used for managing the KV cache in streaming scenarios.
  - SnapKV [18]: A KV cache compression method in which the LLM knows what to look for before generation.

These baselines are representative of current state-of-the-art approaches to LLM efficiency, covering both full attention and various sparse attention strategies (top-k, non-top-k, token dropping). This comprehensive comparison allows Twilight to demonstrate its advantages as an adaptive optimizer.
5.4. Models
The experiments use three widely adopted LLM models, covering different architectures and context length capabilities:
- Longchat-7B-v1.5-32k [52]: A 7-billion-parameter model with a 32k context length, providing a strong baseline for long-context evaluation.
- LLaMA2-7B-Chat [53]: A 7-billion-parameter LLaMA-2 model fine-tuned for chat applications.
- LLaMA-3.1-8B-Instruct [54]: An 8-billion-parameter LLaMA-3.1 instruction-tuned model with a 128k context length. This model uses Group Query Attention (GQA).

These models cover the two mainstream attention implementations: Multi-Head Attention (MHA) (Longchat, LLaMA2) and Group Query Attention (GQA) (LLaMA-3.1), allowing for a broad evaluation of Twilight's applicability. For fair comparison, sparse methods are not applied to the first two layers of the models, as is common practice in many baselines.
5.5. Hyperparameters
- Twilight's threshold: The hyperparameter p for Twilight is set to 0.95 for LLaMA-2/3 models and 0.85 for Longchat. This value was determined through an ablation study (see Section 6.2.1) to balance accuracy and efficiency.
- Base Algorithm Budget for the Token Selector: For Quest and DS, the budget in the Token Selector (the first step of Twilight) is varied from 256 to 8192. When integrated with Twilight, a conservative (large) budget is chosen for the Token Selector as an initial superset of tokens before the Twilight Pruner refines it.
- MagicPIG Parameters: MagicPIG uses parameters K and L. The paper adopts two standard configurations: (K=8, L=75) and (K=10, L=150).
- Medium-Context Baseline Budget: For medium-context tasks, baseline Quest and DS models use a fixed budget of 128, chosen to be comparable to the average budget after Twilight's pruning in those scenarios.
6. Results & Analysis
6.1. Core Results Analysis
6.1.1. Accuracy Evaluation
6.1.1.1. Results on Longbench
The Longbench evaluation (Table 2, with the detailed Table 5 in Appendix C) demonstrates Twilight's effectiveness in long-context scenarios. For Longchat-7B-v1.5-32k, Twilight improves the average score over the original base algorithm by up to 5.7% (when optimizing DS) while pruning about 98.5% of the tokens over-selected by the base algorithm, delivering significant efficiency gains alongside an accuracy improvement. For LLaMA-3.1-8B-Instruct, Twilight achieves nearly zero accuracy loss (at most -0.7%) with a slightly larger average budget, suggesting that knowledge in LLaMA-3.1 may be more densely packed. This indicates Twilight can adapt to different model characteristics.
The following are the results from Table 2 of the original paper:
| Method | Budget | Longchat-7B-v1.5-32k Score | Pruned % (Avg. Budget) | LLaMA-3.1-8B-Instruct Score | Pruned % (Avg. Budget) |
| Full | 32k | 36.78 | - | 52.01 | - |
| Twilight | - | 38.52 (+4.7%) | 99.6% (141.4) | 51.64 (-0.7%) | 99.6% (478) |
| MagicPIG | K=8, L=75 | - | - | 51.70 | - |
| MagicPIG | K=10, L=150 | - | - | 51.32 | - |
| Quest | 256 | 31.26 | - | 38.20 | - |
| Quest | 1024 | 36.85 | - | 47.79 | - |
| Quest | 4096 | 37.33 | - | 50.79 | - |
| Quest | 8192 | 37.10 | - | 51.44 | - |
| Twilight | - | 38.04 (+2.5%) | 98.4% (131) | 51.57 (+0.3%) | 99.5% (427) |
| DS | 256 | 35.32 | - | 45.74 | - |
| DS | 1024 | 35.96 | - | 49.43 | - |
| DS | 4096 | 36.31 | - | 50.98 | - |
| DS | 8192 | 36.62 | - | 51.14 | - |
| Twilight | - | 38.71 (+5.7%) | 98.5% (126) | 51.73 (+1.2%) | 99.5% (446) |
6.1.1.2. Results on RULER
On the RULER benchmark (Table 3), Twilight significantly boosts the performance of base sparse attention methods. Quest-Twi achieves scores comparable to the SOTA non-top-k method MagicPIG. More remarkably, DS-Twi sets new performance records, surpassing all existing methods. This indicates Twilight's ability to not only recover lost accuracy from fixed-budget methods but also push SOTA performance further.
The following are the results from Table 3 of the original paper:
| Method | Budget | 16k | 32k | 64k | 96k | Avg. |
| Full | 100% | 93.13 | 89.10 | 84.64 | 83.10 | 87.49 |
| Twilight | - | 92.88 | 89.42 | 85.17 | 85.23 | 88.18 |
| MagicPIG | K=8, L=75 | 92.22 | 89.37 | 84.07 | 82.58 | 87.06 |
| MagicPIG | K=10, L=150 | 91.38 | 88.20 | 83.34 | 82.02 | 86.23 |
| Quest | 4% | 79.35 | 79.8 | 78.64 | 73.22 | 77.75 |
| Quest | 8% | 87.31 | 83.06 | 80.82 | 75.28 | 81.62 |
| Twilight | - | 91.53 | 87.97 | 84.12 | 82.96 | 86.65 |
| DS | 4% | 92.04 | 88.11 | 84.43 | 82.56 | 86.79 |
| DS | 8% | 92.89 | 88.70 | 84.39 | 82.72 | 87.18 |
| Twilight | - | 93.54 | 89.24 | 85.91 | 82.81 | 87.88 |
6.1.1.3. Results on Medium-Context Tasks
For medium-context tasks (GSM8K, COQA, PG-19 Perplexity in Table 4), Twilight demonstrates that its Pruner does not negatively impact performance. Twilight significantly outperforms Quest and DS (when they use a fixed budget of 128) and shows nearly zero accuracy loss compared to full attention. The average budgets after Twilight's pruning are shown to be quite small (e.g., 90.82 for GSM8K with LLaMA-2-7B-Chat), confirming its aggressive but accurate pruning capabilities.
The following are the results from Table 4 of the original paper:
| Method | GSM8K (flexible/strict) ↑ | COQA (em/f1) ↑ | PG-19 Perplexity ↓ |
| LLaMA-2-7B-Chat | | | |
| Full | 0.2290/0.2282 | 0.5935/0.7511 | 7.503 |
| Quest | 0.0523/0.0508 | 0.5710/0.7425 | 14.15 |
| DS | 0.2191/0.2190 | 0.5855/0.7401 | 7.622 |
| Twilight | 0.2153/0.2115 | 0.6088/0.7642 | 7.600 |
| (Twilight Avg. Budget) | 90.82 | 91.86 | 102.58 |
| LLaMA-3.1-8B-Instruct | | | |
| Full | 0.7726/0.7475 | 0.6363/0.7882 | 7.490 |
| Quest | 0.3639/0.3533 | 0.6007/0.7554 | 19.00 |
| DS | 0.6194/0.6027 | 0.6455/0.7964 | 7.967 |
| Twilight | 0.7771/0.7604 | 0.6325/0.7869 | 7.529 |
| (Twilight Avg. Budget) | 112.40 | 86.85 | 110.98 |
6.1.2. Efficiency Evaluation
6.1.2.1. Self-Attention Speedup
Figure 7 shows the self-attention operator speedups. FlashInfer-Twi and Quest-Twi achieve impressive speedups of up to 15.4x over FlashAttention2, and additionally accelerate their respective base algorithms (FlashInfer and Quest). This highlights Twilight's ability to significantly improve the core attention operation itself by reducing the number of tokens processed.
The following figure (Figure 7 from the original paper) shows latencies and speedups of self-attention at different sequence lengths and batch sizes.
This chart shows self-attention latency (μs) at different batch sizes (8, 16, 32, 64) and sequence lengths, along with the speedup ratios of the various methods, including FA2, FlashInfer, Quest, FlashInfer-Twi, and Quest-Twi.
6.1.2.2. End-to-End Decoding Speedup
In end-to-end decoding scenarios (Figure 8), Quest-Twi achieves an end-to-end speedup of up to 3.9x over the FlashInfer full-attention baseline, along with a further speedup over Quest without Twilight. This demonstrates the practical benefits of Twilight in LLM serving systems by reducing the Time-Per-Output-Token (TPOT).
The following figure (Figure 8 from the original paper) shows Time-Per-Output-Token (TPOT) improvements in end-to-end serving scenarios.
This bar chart shows Time-Per-Output-Token (ms) at different batch sizes (32, 64, 128, 256) and sequence lengths (10k, 20k, 30k) for FlashInfer, Quest, and Quest-Twi, illustrating the latency differences across scenarios and the speedups achieved at certain sequence lengths.
6.1.2.3. Accuracy Comparison with Token Dropping Methods
Table 6 compares DS-Twilight against token-dropping methods like StreamingLLM and SnapKV on Longbench. DS-Twilight consistently achieves notably better performance across all tasks, with an average score of 38.71 compared to 32.69 for StreamingLLM and 36.22 for SnapKV. This reinforces the argument that token-selecting (which Twilight enhances) generally outperforms token-dropping because dropping tokens leads to irreversible information loss.
The following are the results from Table 6 of the original paper:
| Dataset | StreamingLLM (Budget=4096) | SnapKV (Budget=4096) | DS-Twilight |
| Qasper | 26.39 | 29.44 | 32.34 |
| MulQA-en | 33.2 | 40.03 | 43.89 |
| HotpotQA | 24.29 | 33.67 | 34.67 |
| 2WikiMQA | 20.1 | 24.13 | 25.43 |
| Musique | 10.87 | 13.45 | 13.84 |
| GovReport | 26.92 | 26.09 | 31.88 |
| QMSum | 20.8 | 22.53 | 23.01 |
| MultiNews | 26.46 | 25.61 | 26.32 |
| TriviaQA | 75.6 | 80.82 | 85.29 |
| PR-en | 24.17 | 30.25 | 35.50 |
| LCC | 52.47 | 52.62 | 55.03 |
| Repobench-P | 51.02 | 55.99 | 57.27 |
| Avg. | 32.69 | 36.22 | 38.71 |
6.1.2.4. Efficiency Evaluation in Offloading Scenarios
Table 7 demonstrates that Twilight yields even more significant gains in memory-offloading scenarios, where per-token loading cost dominates. Quest-Twi achieves roughly 7-16x speedups over Quest (e.g., 8490.95 vs. 527.77 at 30k context). This is because Twilight sharply reduces the number of tokens that need to be loaded, directly addressing the bottleneck in offloading.
The following are the results from Table 7 of the original paper:
| Method | 10k | 20k | 30k |
| Quest | 3038.98 | 5990.75 | 8490.95 |
| Quest-Twi | 415.89 | 480.61 | 527.77 |
6.2. Ablation Studies / Parameter Analysis
6.2.1. Sensitivity to the Threshold p
The paper investigates the sensitivity of Twilight to the hyperparameter p. While p is a new parameter, the authors argue it is more reasonable and tunable than a fixed top-k budget because it represents accumulated probability, making it less affected by varying attention weight distributions across heads, layers, or queries. Figure 9 illustrates the trade-off between PG-19 perplexity (accuracy) and sparse attention latency (efficiency) at different p values. The results show where the balance between accuracy and efficiency lies, which informed the choice of p = 0.85 for Longchat-7B-v1.5-32k and p = 0.95 for the LLaMA-2/3 models.
The following figure (Figure 9 from the original paper) shows PG-19 perplexity (accuracy) and sparse attention latency (efficiency) under different threshold values.
This chart shows the relationship between PG-19 perplexity (lower is better) and sparse attention latency (in microseconds) at different threshold values, presenting the trend of perplexity as p varies along with the corresponding latency.
6.2.2. Time Breakdown for Twilight
Figure 10 provides an insightful time breakdown of the self-attention operation for Quest and Quest-Twi at different batch sizes. For a 32k retrieval task, Quest uses a budget of 8192 (a quarter of the context), while Quest-Twi prunes this down to an average of 256 tokens. The breakdown confirms the theoretical cost model from Section 4.3 of the paper: Twilight significantly reduces the time spent in the sparse attention kernel (the final computation step) while introducing only minor overheads for the Token Selector and Twilight Pruner components. For instance, at batch size 64, Quest-Twi outperforms Quest by approximately 2x, demonstrating that the cost of pruning is well justified by the reduced computation in the final attention step.
The following figure (Figure 10 from the original paper) shows the time breakdown of self-attention. At batch size 64, Quest-Twi outperforms Quest by about 2x.
This chart shows the self-attention time breakdown of Quest-Twi versus Quest at different batch sizes (16, 32, 64). At batch size 64, Quest-Twi improves performance by about 2x.
6.3. Full Results on Longbench
The detailed results on Longbench are provided in Table 5 in Appendix C of the original paper. These results corroborate the summary provided in Section 6.1.1.1, showing Twilight's ability to improve or maintain accuracy while significantly reducing the active budget across a diverse set of long-context tasks for both Longchat and LLaMA-3.1 models.
The following are the results from Table 5 of the original paper (task groups: Single-Doc. QA — Qasper, MF-en; Multi-Doc. QA — HotpotQA, 2WikiMQA, Musique; Summarization — GovReport, QMSum, MultiNews; Few-shot — TriviaQA; Synthetic — PR-en; Code — LCC, Repobench-P):
| Method | Budget | Qasper | MF-en | HotpotQA | 2WikiMQA | Musique | GovReport | QMSum | MultiNews | TriviaQA | PR-en | LCC | Repobench-P | Avg. Score |
| Full | 32k | 29.48 | 42.11 | 30.97 | 23.74 | 13.11 | 31.03 | 22.77 | 26.09 | 83.25 | 30.50 | 52.70 | 55.62 | 36.78 |
| Twilight (Avg 141.4) | 31.95 | 43.91 | 33.59 | 25.65 | 13.93 | 32.19 | 23.15 | 26.30 | 85.14 | 34.50 | 54.98 | 57.12 | 38.52 (+4.7%) | |
| Quest | 256 | 26.00 | 32.83 | 23.23 | 22.14 | 7.45 | 22.64 | 20.98 | 25.05 | 67.40 | 33.60 | 48.70 | 45.07 | 31.26 |
| 1024 | 31.63 | 42.36 | 30.47 | 24.42 | 10.11 | 29.94 | 22.70 | 26.39 | 84.21 | 34.5 | 51.52 | 53.95 | 36.85 | |
| 4096 | 29.77 | 42.71 | 32.94 | 23.94 | 13.24 | 31.54 | 22.86 | 26.45 | 84.37 | 31.50 | 53.17 | 55.52 | 37.33 | |
| 8192 | 29.34 | 41.70 | 33.27 | 23.46 | 13.51 | 31.18 | 23.02 | 26.48 | 84.70 | 30.00 | 53.02 | 55.57 | 37.10 | |
| Twilight (Avg. 131) | 31.95 | 43.28 | 31.62 | 24.87 | 13.48 | 32.21 | 22.79 | 26.33 | 84.93 | 33.50 | 54.86 | 56.70 | 38.04 (+2.5%) | |
| DS | 256 | 28.28 | 39.78 | 27.10 | 20.75 | 9.34 | 29.68 | 21.79 | 25.69 | 83.97 | 32.00 | 52.01 | 53.44 | 35.32 |
| 1024 | 30.55 | 41.27 | 30.85 | 21.87 | 7.27 | 26.82 | 22.95 | 26.51 | 83.22 | 31.50 | 53.23 | 55.50 | 35.96 | |
| 4096 | 28.95 | 41.90 | 32.52 | 23.65 | 8.07 | 29.68 | 22.75 | 26.55 | 83.34 | 30.00 | 52.7 | 55.48 | 36.31 | |
| 8192 | 29.05 | 41.42 | 31.79 | 22.95 | 12.50 | 30.44 | 22.50 | 26.43 | 83.63 | 30.50 | 52.87 | 55.33 | 36.62 | |
| Twilight (Avg. 126) | 32.34 | 43.89 | 34.67 | 25.43 | 13.84 | 31.88 | 23.01 | 26.32 | 85.29 | 35.50 | 55.03 | 57.27 | 38.71 (+5.7%) | |
| Full | 128k | 46.17 | 53.33 | 55.36 | 43.95 | 27.08 | 35.01 | 25.24 | 27.37 | 91.18 | 99.50 | 62.17 | 57.76 | 52.01 |
| Twilight (Avg. 478) | 43.08 | 52.99 | 52.22 | 44.83 | 25.79 | 34.21 | 25.47 | 26.98 | 91.85 | 100.00 | 64.06 | 58.22 | 51.64 (-0.7%) | |
| MagicPIG | K=8,L=75 | 45.03 | 54.24 | 56.46 | 47.34 | 26.58 | 33.63 | 24.98 | 26.70 | 92.13 | 100.00 | 61.94 | 51.40 | 51.70 |
| K=10, L=150 | 44.68 | 53.63 | 56.19 | 47.18 | 26.79 | 33.31 | 25.13 | 26.22 | 91.89 | 99.50 | 60.07 | 51.15 | 51.32 | |
| Quest | 256 | 24.65 | 37.50 | 30.12 | 23.60 | 12.93 | 27.53 | 20.11 | 26.59 | 65.34 | 95.00 | 49.70 | 45.27 | 38.20 |
| 1024 | 38.47 | 49.32 | 47.43 | 38.48 | 20.59 | 33.71 | 23.67 | 26.60 | 81.94 | 99.50 | 60.78 | 52.96 | 47.79 | |
| 4096 | 43.97 | 53.64 | 51.94 | 42.54 | 24.00 | 34.34 | 24.36 | 26.75 | 90.6 | 99.50 | 62.03 | 55.49 | 50.79 | |
| 8192 | 44.34 | 53.25 | 54.72 | 44.84 | 25.98 | 34.62 | 24.98 | 26.70 | 91.61 | 100.00 | 62.02 | 54.20 | 51.44 | |
| Twilight (Avg. 427) | 43.44 | 53.2 | 53.77 | 43.56 | 25.42 | 34.39 | 25.23 | 26.99 | 91.25 | 100.0 | 63.55 | 58.06 | 51.57 (+0.3%) | |
| DS | 256 | 38.24 | 49.58 | 43.38 | 31.98 | 15.52 | 33.40 | 24.06 | 26.86 | 84.41 | 99.50 | 53.28 | 48.64 | 45.74 |
| 1024 | 42.97 | 54.65 | 51.5 | 33.92 | 20.39 | 34.50 | 24.92 | 26.71 | 92.81 | 99.50 | 62.66 | 48.37 | 49.43 | |
| 4096 | 43.50 | 53.17 | 54.21 | 44.70 | 23.14 | 34.73 | 25.40 | 26.71 | 92.78 | 99.50 | 62.59 | 51.31 | 50.98 | |
| 8192 | 43.82 | 53.71 | 54.19 | 45.13 | 23.72 | 34.27 | 24.98 | 26.69 | 91.61 | 100.00 | 62.40 | 52.87 | 51.14 | |
| Twilight (Avg. 446) | 43.08 | 52.89 | 54.68 | 44.86 | 24.88 | 34.09 | 25.20 | 27.00 | 91.20 | 100.00 | 62.40 | 52.87 | 51.73 (+1.2%) | |
7. Conclusion & Reflections
7.1. Conclusion Summary
This paper effectively highlights a fundamental flaw in existing top-k sparse attention methods: their inability to dynamically adapt KV cache budgets to the varying attention weight distributions encountered in LLM inference. To address this, the authors introduce Twilight, a novel framework that leverages top-p sampling from LLM generation to bring adaptive sparsity to any existing sparse attention algorithm.
Twilight employs a hierarchical Select-then-Prune architecture. It first utilizes a base sparse attention algorithm to conservatively select a large token subset, which is then refined by the Twilight Pruner using top-p thresholding. This adaptive approach allows Twilight to prune up to 98% of redundant tokens while maintaining or even improving accuracy across various long-context and medium-context benchmarks.
Empirically, Twilight demonstrates substantial efficiency gains: up to a 15.4x speedup for the self-attention operator and a 3.9x reduction in end-to-end per-token latency in long-context LLM decoding, along with an additional speedup over the base sparse attention algorithms it enhances. The framework also features efficient kernel implementations, including a 4-bit quantized K cache and a binary-search-based top-p kernel.
7.2. Limitations & Future Work
The authors acknowledge several limitations and suggest future research directions:
- Estimation Overheads: While Twilight significantly reduces the sparse attention kernel computation, the estimation overheads (for the Token Selector and Twilight Pruner) are still non-negligible. This makes Twilight particularly beneficial in scenarios where loading tokens from the KV cache is the dominant cost, such as LLM serving with large batch sizes or offloading to slower memory. Future work could focus on optimizing these estimation methods to further improve end-to-end latency and throughput.
- Group Query Attention (GQA) Compatibility: The paper notes that head-wise dynamism (a key feature enabled by top-p) is not fully compatible with GQA architectures. This leads to a compromise where dynamism transitions to group-wise granularity, meaning all heads within a GQA group share a common token budget. Future research should explore how to more effectively integrate Twilight with emerging model architectures like GQA and Multi-Head Latent Attention (MLA) [64] to leverage head-wise dynamism without compromising efficiency.
7.3. Personal Insights & Critique
This paper presents a highly intuitive and impactful idea: transplanting the well-understood concept of top-p sampling from LLM output generation to the often more opaque realm of attention sparsity. The analogy is elegant and powerful, addressing a critical bottleneck in LLM efficiency in a principled way.
One of the strongest aspects of Twilight is its design as a framework rather than a standalone algorithm. This Select-then-Prune architecture makes it broadly applicable, allowing existing and future sparse attention methods to gain adaptive capabilities without extensive re-engineering. This modularity is a significant advantage in the rapidly evolving LLM landscape. The empirical results are compelling, demonstrating substantial speedups with minimal to no accuracy loss, which is the holy grail for LLM optimization.
The explicit consideration of 4-bit quantization for the K cache and efficient GPU kernel implementations (such as the binary search for top-p) highlights a strong system-algorithm co-design approach. This pragmatic focus on deployment challenges ensures that the theoretical gains translate into real-world performance improvements. The detailed ablation study on the threshold p and the time breakdown further strengthen the paper's rigor, providing transparency into Twilight's costs and benefits.
A potential area for deeper exploration, as implicitly raised by the authors' GQA limitation, is the interaction between top-p pruning and the architectural nuances of different LLMs. While top-p is adaptive to attention weight distributions, the fundamental "importance" of tokens might not always perfectly align with their attention scores, especially in highly specialized attention heads (e.g., retrieval heads). Future work could investigate if learned mechanisms could dynamically adjust the threshold or if top-p could be combined with other importance estimation signals for even more nuanced adaptive sparsity.
Overall, Twilight offers a promising direction for adaptive attention sparsity, emphasizing the importance of dynamic resource allocation in LLM inference. Its methods and conclusions could potentially be transferred to other domains requiring adaptive pruning or resource management in deep learning, especially where input feature importance varies dynamically.