Sieve: Attention-based Sampling of End-to-End Trace Data in Distributed Microservice Systems
TL;DR Summary
Sieve uses attention and random cut forests to sample structurally and temporally uncommon traces, improving information retention and reducing storage costs in large-scale microservice systems.
Abstract
End-to-end tracing plays an important role in understanding and monitoring distributed microservice systems. The trace data are valuable for finding anomalous or erroneous behavior of the system. However, the volume of trace data is huge, placing a heavy burden on analyzing and storing it. To reduce the volume of trace data, sampling is widely adopted. However, existing uniform sampling approaches are unable to capture uncommon traces, which are more interesting and informative. To tackle this problem, the authors design and implement Sieve, an online sampler that biases sampling towards uncommon traces by taking advantage of the attention mechanism. Evaluation results on trace datasets collected from real-world and experimental microservice systems show that Sieve is effective at increasing the proportion of sampled uncommon traces while substantially reducing storage costs.
In-depth Reading
1. Bibliographic Information
1.1. Title
Sieve: Attention-based Sampling of End-to-End Trace Data in Distributed Microservice Systems
1.2. Authors
Zicheng Huang, Pengfei Chen*, Guangba Yu, Hongyang Chen, Zibin Zheng. Their affiliation is the School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China. Pengfei Chen is indicated as the corresponding author. The authors are primarily affiliated with a university, suggesting an academic research background focused on computer science, distributed systems, and potentially AI/machine learning given the use of attention mechanisms and anomaly detection.
1.3. Journal/Conference
The paper does not explicitly state the journal or conference where it was published within the provided text. However, the presence of an Index Terms section and a REFERENCES section formatted with IEEE style suggests it is intended for publication in a peer-reviewed conference or journal in the field of computer science, likely related to distributed systems, software engineering, or machine learning for systems.
1.4. Publication Year
The publication year is not explicitly stated in the provided text. However, the references section includes citations up to 2020, with several papers from 2018, 2019, and 2020 (e.g., [7], [16]). This suggests the paper was likely published in 2020 or later.
1.5. Abstract
End-to-end tracing is crucial for understanding and monitoring distributed microservice systems, but the vast volume of trace data poses significant challenges for analysis and storage. Current uniform sampling techniques often fail to capture rare but informative traces. To address this, the paper introduces Sieve, an online sampling mechanism that utilizes an attention mechanism to prioritize sampling of traces that are structurally and temporally uncommon. Sieve encodes traces as path vectors to capture these differences, employs Robust Random Cut Forest (RRCF) for uncommon pattern detection, and calculates attention scores to determine sampling probabilities. Experimental evaluations on four diverse microservice trace datasets demonstrate that Sieve effectively increases the selection of valuable uncommon traces while substantially reducing storage costs at low sampling rates. Furthermore, Sieve exhibits robustness across varying degrees of uncommonness and parameter configurations, making it suitable for large-scale microservice environments.
1.6. Original Source Link
/files/papers/6901a5c984ecf5fffe471754/paper.pdf

The publication status is unknown from the provided text, but the file path suggests it is likely an officially published paper or an accepted preprint from a reputable academic platform.
2. Executive Summary
2.1. Background & Motivation
The core problem addressed by the paper is the immense volume of end-to-end trace data generated in distributed microservice systems, which creates a heavy burden for storage and analysis. While traces are vital for understanding system behavior, diagnosing issues, and detecting anomalies, most of this data is redundant, representing common, normal system operations. Sampling is a widely adopted technique to reduce data volume.
Existing sampling methods face significant challenges:
- Head-based sampling (e.g., in Zipkin and Jaeger) makes the decision at the start of a request, with no awareness of future events, so it amounts to random sampling and often misses informative traces.
- Tail-based sampling addresses some head-based limitations by deciding at the end of a request, considering metrics like latency or HTTP status codes. However, it still focuses on individual traces and overlooks how the current trace differs from previous traces.
- Previous methods that do focus on differences (e.g., [3], [4]) primarily consider only structural differences, neglecting temporal differences, which are equally crucial for identifying unusual behavior. For example, a trace with a normal structure but abnormally high latency indicates an issue.

The paper's entry point and innovative idea revolve around the concept of "attention." It proposes that informative traces are those that are uncommon, differing structurally or temporally (or both) from the majority of other traces. By explicitly incorporating both structural and temporal attention, Sieve aims to bias sampling towards these rare, valuable traces, which are often indicative of anomalies or interesting corner cases.
2.2. Main Contributions / Findings
The paper makes four primary contributions:
1. Introduction of the Attention Mechanism for Uncommon Trace Capture: Sieve introduces an attention mechanism that focuses on both temporal and structural differences to identify and capture uncommon traces, a significant improvement over prior work that typically focused on only one aspect.
2. Path Vector Encoding with Temporal and Structural Information: The paper proposes a novel path vector encoding method to represent traces. It incorporates both temporal information (e.g., latency of tail spans) and structural information (e.g., sequence of services, number of spans) into a single vector, making traces processable by algorithms and enabling comprehensive comparison.
3. Biased Sampling Scheme based on Attention Score: Sieve develops a biased sampling scheme that leverages attention scores. It assigns high sampling probabilities to uncommon traces (those with high attention scores) and much lower probabilities to common traces, leading to substantial storage space reduction while preserving critical data.
4. Online Sampling Implementation and Evaluation: The paper details the design and implementation of Sieve as an online sampler with low computational cost. Its effectiveness is evaluated on several real-world and experimental microservice systems, demonstrating its suitability for large-scale production environments.

Key conclusions and findings reached by the paper include:

- Sieve effectively detects and raises sampling probabilities for both structurally and temporally uncommon traces.
- It substantially reduces storage costs at low sampling rates by discarding a large volume of redundant common traces.
- The method is robust across varying degrees of uncommonness and different parameter settings (e.g., RRCT size, number of RRCTs, threshold).
- Sieve's sampling latency is low (typically between 3 ms and 33 ms), making it efficient enough for real-time online sampling in large-scale microservice systems.
- Compared to baselines such as hierarchical clustering (HC) and random sampling, Sieve consistently achieves a higher proportion of uncommon traces sampled for a given budget.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To understand this paper, a beginner should be familiar with the following foundational concepts:
- Microservice Architecture: A software development approach where a large application is built as a suite of small, independently deployable services, each running in its own process and communicating with lightweight mechanisms (e.g., HTTP APIs). This contrasts with monolithic architectures where all components are tightly coupled.
- Distributed Systems: Systems where components located on different networked computers communicate and coordinate their actions by passing messages. Microservice systems are a type of distributed system.
- End-to-End Tracing (Distributed Tracing): A technique used to monitor and profile requests as they propagate through a distributed system. It records the path of a request across multiple services, capturing details like latency, service calls, and errors. A trace typically consists of multiple spans.
  - Trace: Represents a single end-to-end request or workflow. It is often visualized as a Directed Acyclic Graph (DAG) of spans.
  - Span: Represents a logical unit of work within a trace, such as a single service call or an operation within a service. Each span has a name, start time, duration (latency), and potentially tags or logs. Spans are nested to show parent-child relationships, forming the causal chain of a request.
  - Root Span: The first span in a trace, representing the entry point of the request into the system.
- Sampling: The process of selecting a subset of data from a larger dataset. In distributed tracing, sampling is used to reduce the volume of traces collected, stored, and analyzed, as collecting all traces is often prohibitively expensive.
  - Uniform Sampling: A method where each trace (or data point) has an equal probability of being selected.
  - Head-based Sampling: The sampling decision is made at the very beginning of a trace's execution (when the root span is created). This decision is then propagated to all downstream services, so the entire trace is either collected or entirely dropped. Its disadvantage is that the decision is made without knowing the trace's future characteristics (e.g., whether it will become anomalous).
  - Tail-based Sampling: The sampling decision is deferred until the entire trace has completed. This allows the sampler to inspect the full trace (e.g., its total latency, presence of errors, or specific service calls) before deciding whether to keep it. This method is generally more effective at capturing informative traces but requires buffering traces until completion.
- Attention Mechanism (General Concept): In machine learning, an attention mechanism allows a model to focus on the most relevant parts of its input. Instead of treating all parts of the input equally, it assigns "attention weights" or "scores" to different parts, highlighting their relative importance. In Sieve, this means giving higher scores (and thus higher sampling probability) to traces that are deemed uncommon or anomalous.
- Outlier Detection (Anomaly Detection): The process of identifying data points that deviate significantly from the majority of the data. These outliers are often anomalies that could indicate problems or interesting events.
- Isolation Forest (iForest): An anomaly detection algorithm that works by isolating outliers rather than profiling normal data points. It builds multiple isolation trees (similar to decision trees) by recursively partitioning data. Anomalies, being few and different, are typically isolated with fewer partitions (i.e., closer to the root of the tree), resulting in shorter average path lengths.
- Robust Random Cut Forest (RRCF): A variant of Isolation Forest designed for streaming data. It builds multiple random cut trees (RCTs) and maintains them over time, adapting to changes in data distribution. RRCF is robust to noisy data and works well for high-dimensional streams. Sieve leverages RRCF to detect uncommon traces by treating them as outliers in a stream of path vectors.
- Path Vector: A numerical representation of a trace. Sieve encodes traces into path vectors by capturing structural information (which paths exist) and temporal information (latency values along those paths). This vector serves as the input to the RRCF for anomaly detection. A concrete example of a trace and its paths is sketched below.
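To make these concepts concrete, the following is a minimal sketch (not from the paper; the field names and service names are illustrative) of a trace represented as spans with parent links, and of the root-to-span paths that a path vector is later built from:

```python
# A toy trace: each span records an id, its parent span, and its latency.
# Field names and service names are illustrative, not from the paper.
spans = [
    {"id": 1, "parent": None, "name": "frontend", "latency_ms": 120.0},  # root span
    {"id": 2, "parent": 1, "name": "checkout", "latency_ms": 80.0},
    {"id": 3, "parent": 2, "name": "payment", "latency_ms": 45.0},
    {"id": 4, "parent": 1, "name": "catalog", "latency_ms": 30.0},
]

def root_to_span_paths(spans):
    """Enumerate every path from the root span down to each span."""
    by_id = {s["id"]: s for s in spans}
    for span in spans:
        path, cur = [], span
        while cur is not None:
            path.append(cur["name"])
            cur = by_id.get(cur["parent"])
        yield tuple(reversed(path)), span["latency_ms"]  # tail-span latency

for path, latency in root_to_span_paths(spans):
    print(" -> ".join(path), latency)
# frontend 120.0
# frontend -> checkout 80.0
# frontend -> checkout -> payment 45.0
# frontend -> catalog 30.0
```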
3.2. Previous Works
The paper discusses several related works, mainly focusing on distributed tracing systems and sampling techniques:
- Distributed Tracing Systems (Dapper, Zipkin, Jaeger, X-Trace, Magpie):
  - Dapper [8]: Google's production distributed systems tracing platform. A foundational work that established many concepts for distributed tracing. It uses random sampling (e.g., a sampling rate below 0.1%) to reduce storage costs.
  - Zipkin and Jaeger: Open-source distributed tracing systems that follow Dapper's model. They primarily use head-based sampling, making sampling decisions at the start of a request.
  - Magpie [17] and X-Trace [19]: Earlier tracing frameworks that aim to capture control flow and resource consumption and to provide comprehensive views of system behavior.
  - Limitation: While these systems are crucial for observability, their default uniform or head-based sampling mechanisms are limited in their ability to capture uncommon or informative traces.
- Performance Diagnosis and Anomaly Detection (Draco, Seer, Sipta, MEPFL, AutoMap):
  - Works like Draco [9], Seer [10], Sage [11], Sipta [12], MEPFL [15], AutoMap [23], and others [13], [14], [20]-[22], [24] focus on using trace data to detect performance issues, anomalies, or pinpoint root causes.
  - MEPFL [15]: Trains prediction models using trace-level and microservice-level features to predict latent errors and locate faults. Sieve could potentially incorporate some of these features to improve its uncommonness detection.
  - Limitation: These works focus on analysis after traces are collected, rather than optimizing the collection process through intelligent sampling.
- Biased Sampling Techniques:
  - JCallGraph [25]: Samples only successful traces and preserves all failure invocation traces. While reducing network bandwidth, it loses traces with high latency that might not be failures but still indicate problems.
  - Martin Bauer et al. [26]: Introduce trace sampling into conformance checking, continuously sampling until no new information for conformance is found. This is a domain-specific approach.
  - Hierarchical Clustering (HC) Sampling [3]: Biases sampling to maximize the diversity of traces based on the number of spans (a structural characteristic).
    - Limitation: HC sampling primarily focuses on structural diversity and is not designed for online sampling. It also fails to account for temporal uncommonness (e.g., a trace with a normal structure but unusually high latency).
  - Sifter [4]: Builds a low-dimension model to approximate common-case behavior and biases sampling towards traces poorly captured by this model.
    - Limitation: Similar to HC, Sifter mainly focuses on the structure of traces and overlooks temporal characteristics, so traces with uncommon temporal behavior would be ignored.
  - Error-bounded stratified sampling [27]: From the database literature; this scheme reduces sample size given knowledge of the data distribution. It is a different domain but shows the general need for intelligent sampling.
3.3. Technological Evolution
The evolution of distributed tracing and sampling has progressed from basic random sampling (e.g., Dapper) to more sophisticated methods that attempt to capture informative traces. Initially, the focus was on simply reducing data volume. As systems grew more complex, the need to retain anomalous or uncommon traces became critical for debugging and monitoring. Early biased sampling methods started by considering explicit error codes or high latency (tail-based) or structural diversity (HC, Sifter). However, a holistic view combining both structural and temporal deviations, especially in an online and adaptive manner, remained a challenge.
This paper's work, Sieve, fits into this evolutionary timeline by addressing the limitations of previous biased sampling approaches. It advances the state-of-the-art by proposing a method that explicitly considers both temporal and structural attention for uncommon trace detection and applies it in an online sampling context using an adapted RRCF algorithm.
3.4. Differentiation Analysis
Compared to the main methods in related work, Sieve offers several core differences and innovations:
- Comprehensive Attention (Structural AND Temporal): Unlike HC [3] and Sifter [4], which primarily focus on structural differences (e.g., number of spans, call graphs), Sieve explicitly incorporates both structural and temporal information into its path vector encoding. This allows it to identify uncommon traces that have a normal structure but abnormal latencies, or vice versa.
- Online and Adaptive Sampling: Sieve is designed as an online sampler that processes trace data in real time. It adapts to changes in trace distribution by continuously updating its Robust Random Cut Forest (RRCF) model, removing older traces and incorporating newer ones. This is a significant advantage over offline or batch-processing methods.
- Adaptive RRCF for Variable-Length Path Vectors: Sieve extends the RRCF model to handle path vectors of varying lengths (as new execution paths appear). It introduces mechanisms like path vector expansion and dimension reduction to manage this complexity, which is crucial for the dynamic nature of microservice traces.
- Attention Score-based Biased Sampling: Sieve uses an attention score derived from RRCF to directly inform sampling probabilities. This score quantifies how "uncommon" a trace is, allowing for a more granular and intelligent biased sampling decision than simple heuristics or fixed rules.
- Robustness to Uncommonness Degree and Parameters: The evaluation demonstrates Sieve's robustness: its performance is stable across varying degrees of uncommonness and a range of parameter settings, which is important for practical deployment.

In essence, Sieve provides a more holistic, adaptive, and robust approach to biased sampling of distributed trace data by combining structural and temporal insights with an online anomaly detection algorithm.
4. Methodology
4.1. Principles
The core principle of Sieve is to intelligently reduce the volume of end-to-end trace data by prioritizing the sampling of uncommon traces over common or redundant ones. It operates on the intuition that most microservice systems run steadily, producing a large volume of similar traces. Uncommon executions, however, are rare but highly informative for debugging, diagnosis, and anomaly detection. Sieve aims to identify these rare patterns by focusing on temporal and structural differences between traces. It achieves this by using an isolation-based approach (specifically, an adapted Robust Random Cut Forest) to detect outliers in a stream of trace representations, assigning higher attention scores to these uncommon traces, and then using these scores to bias the sampling probability. The overall goal is to preserve useful information while drastically reducing storage overhead.
4.2. Core Methodology In-depth (Layer by Layer)
Sieve's workflow consists of several key components working in concert, as illustrated in Figure 2.
The following figure (Figure 2 from the original paper) shows the overview of Sieve's sampling workflow:
The Trace Data (raw traces) are first processed by the Path Vector Encoder to convert them into a numerical format called path vectors. These path vectors are then fed into the Adaptive Scorer, which uses an enhanced Robust Random Cut Forest (RRCF) model to calculate an attention score for each vector. Finally, the Biased Sampler uses this attention score (along with previous scores) to decide whether to sample (keep) or discard the trace. The sampled traces are stored persistently.
4.2.1. Path Vector Encoder
The Path Vector Encoder transforms a raw trace (a DAG of spans) into a numerical path vector. This encoding is crucial because the underlying algorithms cannot directly process human-readable trace text. The encoder incorporates both structural and temporal information into the vector representation.

- Trace Traversal and Path Set Generation:
  - A trace represents the execution path of a request. It starts with a root span.
  - From the root span, Sieve recursively traverses all spans to identify unique execution paths. A path is a sequence of spans from the root span to any subsequent span (e.g., root span → child span → grandchild span).
  - Each identified path is associated with the latency of its tail span (the last span on the path).
  - This process generates a path set and a corresponding latency set for a given trace.
- Path Vector Construction:
  - The path vector is built such that each index is associated with a unique path that Sieve has encountered across all traces seen so far.
  - If a trace contains the path associated with index i, the value at index i of the path vector is the latency of that path's tail span.
  - If a trace does not contain the path associated with index i, the value is set to the special value -1. This -1 marks an invalid index, indicating the absence of that path in the current trace.
  - Handling Multiple Identical Paths: If a trace contains multiple instances of the same path, Sieve encodes the maximum latency among them into the path vector. This ensures that potentially problematic high latencies are captured.

The following figure (Figure 3 from the original paper) shows an example of path vector encoding of two traces. Figure 3 illustrates:

- Trace 1 (left): contains four paths; the path vector encodes the tail-span latencies of these paths.
- Trace 2 (right): contains three paths. For any path in the global path set that Trace 2 lacks (say, the path at index 2), the corresponding entry of Trace 2's path vector is -1.
- Distinguishability: This encoding allows Sieve to distinguish traces based on:
  - Structural Differences: Traces with different structures have different sets of valid (non--1) indices, making them distinguishable.
  - Temporal Differences: Traces with the same structure but different latencies on common paths have different values at their valid indices, making them distinguishable.
- Extensibility: Sieve can extend the path vector with other meaningful features by concatenation, for example the number of spans (additional structural information) or the request status code (important for discovering anomalous traces, e.g., HTTP 404/500 errors). This creates a more informative trace encoding for the subsequent isolation process. A minimal encoding sketch follows.
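A minimal sketch of this encoder, assuming a global registry of paths that grows as new paths appear (the class and method names are mine, not the paper's):

```python
class PathVectorEncoder:
    """Encode (path -> tail-span latency) pairs into a path vector.

    Each globally-seen path owns one index; paths absent from the current
    trace are encoded as -1, and repeated paths keep their maximum latency,
    as described in the paper.
    """
    def __init__(self):
        self.index = {}  # path (tuple of span names) -> vector position

    def encode(self, paths_with_latency):
        best = {}
        for path, latency in paths_with_latency:
            best[path] = max(latency, best.get(path, float("-inf")))
            if path not in self.index:              # a never-seen path
                self.index[path] = len(self.index)  # expands the dimension
        vector = [-1.0] * len(self.index)           # -1 marks an absent path
        for path, latency in best.items():
            vector[self.index[path]] = latency
        return vector

encoder = PathVectorEncoder()
print(encoder.encode([(("A",), 120.0), (("A", "B"), 80.0)]))  # [120.0, 80.0]
print(encoder.encode([(("A",), 110.0), (("A", "C"), 55.0)]))  # [110.0, -1.0, 55.0]
# Older vectors already inside an RRCT are padded with -1 for the new
# "A -> C" dimension, as described in Section 4.2.2.2.
```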
4.2.2. Adaptive Scorer
The Adaptive Scorer takes the path vectors as input and determines an attention score for each, reflecting its uncommonness. It achieves this using an enhanced Robust Random Cut Forest (RRCF).
4.2.2.1. Isolation
- Core Idea: Since traces are encoded as path vectors, finding uncommon traces becomes equivalent to finding uncommon path vectors. Uncommonness implies deviation from the majority distribution.
- Partitioning Procedure: Sieve uses an isolation-based approach (like Isolation Forest) to repeatedly partition the set of path vectors until each vector is isolated. Uncommon path vectors are expected to be isolated earlier and more easily because they are few in number and differ significantly from the common majority.
- Tree Structure: A tree represents this partitioning process. The length of the path from the root to a leaf node (where a single path vector resides) indicates the number of partitions required to isolate that vector. Uncommon path vectors end up with shorter paths (shallower depths) than common path vectors.
- RRCF for Streaming Data: Sieve adopts Robust Random Cut Forest (RRCF) because it is well suited for outlier detection in streaming data and can adapt to changing data distributions. Sieve builds an enhanced RRCF model to calculate attention scores.
4.2.2.2. RRCF Adaptation for Variable Dimensions
The original RRCF model typically processes data with fixed dimensions. Sieve introduces innovations to handle path vectors whose dimensions can vary (e.g., when a trace contains paths never seen before).
- RRCF Structure: An RRCF is composed of many Robust Random Cut Trees (RRCTs). Each RRCT is built independently and has a fixed size (the number of path vectors it holds).
- Two Stages of an RRCT:
  - Construction Stage:
    - When initially building an RRCT, it takes a set of $m$ path vectors of dimension $d$, $X = \{x_1, \ldots, x_m\}$.
    - Each RRCT is built by recursively partitioning $X$. Starting from the root, Sieve selects a cutting dimension $j$ and a cutting value $q$.
    - Cutting Dimension Selection (Sieve's Innovation): Sieve prioritizes structural distinction.
      - First, it checks for invalid dimensions: a dimension is invalid if some path vector in the current subset has the value -1 there (meaning that path is absent). Sieve prefers to select one of these invalid dimensions as the cutting dimension, to quickly separate traces with different structures.
      - If no invalid dimensions exist, Sieve selects the cutting dimension based on its weight. The weight of dimension $j$ is $ w_j = \frac{\max_{1 \le i \le m} x_{ij} - \min_{1 \le i \le m} x_{ij}}{\sum_{j'=1}^{d} \left( \max_{1 \le i \le m} x_{ij'} - \min_{1 \le i \le m} x_{ij'} \right)} $ where $x_{ij}$ is the value of the $i$-th path vector in dimension $j$, the max and min run over the path vectors in the current subset, and the denominator sums the ranges of all $d$ dimensions. The weight is therefore the relative range of a dimension; a larger range indicates greater variability and a more effective partition. The probability of selecting dimension $j$ is proportional to $w_j$, prioritizing the dimensions with the largest differences (often indicative of temporal uncommonness).
    - Cutting Value Selection:
      - If the selected cutting dimension is an invalid index for some path vectors, the cutting value is fixed at -0.5. This separates the path vectors whose value is -1 (path absent) from those with actual latency values.
      - Otherwise, the cutting value $q$ is selected uniformly at random between the minimum and maximum values of dimension $j$ in the current subset.
    - Partitioning: Based on $q$, the subset is split into $X_L$ (vectors with $x_{ij} \le q$) and $X_R$ (vectors with $x_{ij} > q$), forming the left and right children. This process continues recursively until each leaf contains a single path vector.

The following figure (Figure 4 from the original paper) shows the workflow of the construction stage (left) and the maintenance stage (right); the following figure (Figure 5 from the original paper) shows the illustration of the building procedure of one RRCT.

  - Maintenance Stage: After construction, RRCTs are dynamically updated for streaming data.
    - Leaf Removal: When a new path vector arrives, the leaf containing the oldest path vector in the RRCT is removed to maintain a constant tree size. The RRCT is adjusted by replacing the parent of the removed leaf with its sibling and updating the bounding box (which records the upper and lower bounds of values on each dimension for a node's children) of the internal nodes above it. The following figure (Figure 6 from the original paper) shows the illustration of the removal of the oldest leaf.
    - Path Vector Insertion: The incoming path vector is then inserted into each RRCT.
      - Handling New Dimensions (Sieve's Innovation): If the incoming path vector introduces new dimensions (paths never seen before):
        - All existing path vectors in the RRCT are extended by appending -1 for the new dimensions.
        - The bounding boxes of internal nodes are extended similarly.
        - A new root is generated: its cutting dimension is set to one of the new dimensions and its cutting value to -0.5. The old root becomes its left child, the incoming path vector becomes its right child, and the new root's bounding box is updated. The following figure (Figure 7 from the original paper) shows the illustration of the insertion of a path vector with a new dimension.
      - Handling Existing Dimensions: If the incoming path vector has no new dimensions, Sieve uses the original RRCF insertion method [5].
- Adaptability: By maintaining a constant set of the latest path vectors and updating the trees dynamically, Sieve evaluates new traces against the recent data distribution, adapting to evolving system behavior. A sketch of the cutting rule follows.
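A sketch of the cutting rule described above, under the stated assumptions (function name and data layout are mine): a dimension that is invalid (-1) for only part of the subset is preferred with a fixed cut at -0.5; otherwise a dimension is drawn with probability proportional to its range.

```python
import random

def choose_cut(subset):
    """subset: list of equal-length path vectors; returns (dimension, cut value)."""
    dims = range(len(subset[0]))
    # Prefer a dimension that is -1 for some vectors but not all: cutting
    # it at -0.5 immediately separates the two structures.
    mixed = [j for j in dims
             if any(v[j] == -1 for v in subset)
             and not all(v[j] == -1 for v in subset)]
    if mixed:
        return random.choice(mixed), -0.5
    # Otherwise draw a dimension with probability proportional to its range
    # (the w_j weights above; assumes at least one non-zero range), then a
    # uniform cutting value inside that range.
    ranges = [max(v[j] for v in subset) - min(v[j] for v in subset) for j in dims]
    j = random.choices(list(dims), weights=ranges, k=1)[0]
    low = min(v[j] for v in subset)
    return j, random.uniform(low, low + ranges[j])

print(choose_cut([[120.0, 80.0, -1.0], [110.0, -1.0, 55.0]]))
# e.g. (1, -0.5): dimension 1 distinguishes the two trace structures
```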
4.2.2.3. Attention Score
After RRCTs are built and maintained, an attention score is assigned to each path vector.
- Intuition: Uncommon path vectors are isolated at shallower depths in a tree (closer to the root), because fewer splits are needed to separate them.
- Individual RRCT Score: For a path vector ending up in leaf $l'$ of RRCT $t_i$, its attention score from that specific tree is calculated as $ s_i = \frac{\max_{l \in t_i} \mathrm{depth}(l, t_i)}{\mathrm{depth}(l', t_i)} $
  - Symbol Explanation:
    - $s_i$: the attention score for the path vector from a single RRCT.
    - $\max_{l \in t_i} \mathrm{depth}(l, t_i)$: the maximum leaf depth in RRCT $t_i$, acting as a normalization factor.
    - $\mathrm{depth}(l', t_i)$: the actual depth of the leaf $l'$ containing the path vector in RRCT $t_i$.
  - Purpose: a smaller depth (shallower leaf) results in a higher score, indicating greater uncommonness.
- Final Attention Score: Since an RRCF consists of $k$ RRCTs, the final attention score for a path vector is the average of the scores from all RRCTs: $ s = \frac{s_1 + \ldots + s_k}{k} $
  - Symbol Explanation:
    - $s$: the final attention score for the path vector.
    - $s_1, \ldots, s_k$: the individual attention scores obtained from each of the $k$ RRCTs.
    - $k$: the total number of RRCTs in the RRCF.
  - Purpose: averaging scores from multiple trees reduces bias and provides a more robust measure of uncommonness. A high final score from most trees strongly suggests an uncommon trace. A small scoring sketch follows.
4.2.2.4. Dimension Reduction
The path vector expansion mechanism, while necessary for adaptability, can lead to a curse of dimensionality as the number of unique paths grows indefinitely. Sieve implements dimension reduction to mitigate this:
- Problem: Sieve records all execution paths it has ever seen, even though the RRCTs only contain the latest path vectors. This leads to many dimensions being invalid (value -1) for all path vectors currently in the RRCT.
- Safe Removal of Invalid Dimensions: Sieve safely removes dimensions that are invalid for all path vectors within the RRCT, because they do not influence the partitioning decision.
- Aggressive Removal of Weak Dimensions: Sieve takes a more aggressive approach by temporarily removing weak dimensions that meet two criteria:
  - The dimension is not a cutting dimension.
  - The variance of values in that dimension is lower than 0.1.
- Rationale: Weak dimensions have very little variability and contribute almost nothing to the partitioning process, hence they are unlikely to be chosen as cutting dimensions.
- Temporary Removal and Re-introduction: Sieve monitors the variance of these weak dimensions. If a weak dimension's variance later increases above 0.1, it is re-classified as a normal dimension, and all path vectors in the RRCT are extended to include this dimension again.

The following figure (Figure 8 from the original paper) shows the number of invalid dimensions and weak dimensions:
Figure 8 illustrates the growth of invalid and weak dimensions over time and how their numbers change with different RRCT sizes, highlighting the necessity of this reduction.
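A sketch of both reduction checks over the vectors currently held by one RRCT (the NumPy layout and function name are my assumptions; the 0.1 variance threshold is from the paper):

```python
import numpy as np

def reducible_dimensions(vectors, cutting_dims, var_threshold=0.1):
    """vectors: (n, d) array of the path vectors in one RRCT."""
    X = np.asarray(vectors, dtype=float)
    invalid = np.all(X == -1, axis=0)          # -1 everywhere: safe to drop
    weak = ~invalid & (X.var(axis=0) < var_threshold)
    weak[list(cutting_dims)] = False           # cutting dimensions are kept
    return np.flatnonzero(invalid), np.flatnonzero(weak)

X = [[120.0, -1.0, 30.0],
     [118.0, -1.0, 30.1],
     [121.0, -1.0, 29.9]]
print(reducible_dimensions(X, cutting_dims={0}))
# (array([1]), array([2])): dimension 1 is invalid, dimension 2 is weak
```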
4.2.3. Biased Sampler
The Biased Sampler makes the final decision to sample or drop a trace based on its attention score.
- Sliding Window: Sieve maintains a sliding window containing the $k$ most recent attention scores $s_1, \ldots, s_k$, along with the current score $s_{k+1}$.
- Variance Difference: It calculates two variances:
  - $var_k$: the variance of the $k$ past scores in the window.
  - $var_{k+1}$: the variance of all $k+1$ scores (past plus the current one).
- Deviation Detection: The difference $var_{k+1} - var_k$ indicates how much the current score deviates from the distribution of the previous scores. A significant increase suggests the current trace is uncommon.
- Sampling Probability Calculation: The sampling probability for the current trace is determined as $ f(s_{k+1}) = \begin{cases} \frac{1}{1 + e^{2\overline{W} - s_{k+1}}} & \text{if } var_{k+1} - var_k > h, \\ \frac{s_{k+1}}{\sum_{i=1}^{k+1} s_i} & \text{if } var_{k+1} - var_k \le h. \end{cases} $
  - Symbol Explanation:
    - $f(s_{k+1})$: the sampling probability for the current trace with attention score $s_{k+1}$.
    - $\overline{W}$: the mean of the scores in the sliding window.
    - $var_{k+1}$ / $var_k$: the window variance with / without the current score.
    - $h$: a predefined threshold.
    - $e$: Euler's number (the base of the natural logarithm).
    - $\sum_{i=1}^{k+1} s_i$: the sum of all scores in the sliding window.
  - Interpretation of the formula:
    - If $var_{k+1} - var_k > h$: the current score significantly deviates from the distribution of previous scores, indicating strong uncommonness. Sieve then uses the sigmoid branch $\frac{1}{1 + e^{2\overline{W} - s_{k+1}}}$, which rapidly pushes the sampling probability towards 1 for high attention scores, ensuring uncommon traces are almost certainly sampled.
    - If $var_{k+1} - var_k \le h$: the sampling probability is proportional to the current attention score relative to the sum of scores in the sliding window, so common traces with low attention scores get very low sampling probabilities.
- Sampling Decision: After calculating $f(s_{k+1})$, Sieve generates a random number in $[0, 1]$. If $f(s_{k+1})$ is greater than or equal to this random number, the trace is sampled; otherwise, it is dropped. A Python transcription follows.
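The probability rule transcribes directly into Python. The sketch below uses population variance, which is an assumption since the paper does not specify the variance estimator:

```python
import math
import statistics

def sampling_probability(window, current, h):
    """window: the k previous attention scores; current: the score s_{k+1}."""
    var_k = statistics.pvariance(window)
    var_k1 = statistics.pvariance(window + [current])
    if var_k1 - var_k > h:                       # strong deviation: sigmoid
        w_bar = statistics.fmean(window)
        return 1.0 / (1.0 + math.exp(2.0 * w_bar - current))
    return current / (sum(window) + current)     # proportional branch

scores = [1.0, 1.1, 0.9, 1.0, 1.05]              # typical common-trace scores
print(sampling_probability(scores, 1.02, h=0.05))  # ~0.17: likely dropped
print(sampling_probability(scores, 4.00, h=0.05))  # ~0.88: almost surely kept
```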
4.2.4. Online Sampling
Sieve is designed for real-time online sampling.
Algorithm 1: Online Sampling Algorithm

Input:

- The trace trace
- The RRCF model rrcf
- The sliding window sw of the previous scores
- The threshold h

Output:

- The sampling decision decision (True if sampled, False if dropped)
- The updated sliding window sw
- The updated RRCF model rrcf

Steps:

1. Encode Trace: x = encode(trace). The incoming trace is encoded into a path vector x.
2. Initialize Scores List: scores = [], an empty list for the individual RRCT scores.
3. Iterate through RRCTs: for each rrct in rrcf:
   a. rrct.remove_oldest(): the oldest path vector (leaf) is removed from the rrct to maintain its fixed size.
   b. rrct.insert(x): the new path vector is inserted into the rrct, handling new dimensions as described in Section 4.2.2.2.
   c. s = rrct.score(x): the attention score for x is calculated from this specific rrct.
   d. scores.append(s): the individual rrct score is added to the scores list.
4. Calculate Average Score: avg = mean(scores), the final attention score for the path vector.
5. Update Sliding Window: sw.append(avg).
6. Calculate Sampling Probability:
   a. If the variance of sw including avg, minus the variance excluding it, exceeds h: the current score deviates significantly from previous scores, and the probability p is computed with the sigmoid branch (close to 1).
   b. Otherwise: p is proportional to the current average score within the sliding window, p = avg / sum(sw).
7. Remove Oldest Score: the oldest score is removed from sw to maintain its fixed length.
8. Make Sampling Decision:
   a. If a randomly generated number in [0, 1] is less than p: decision = True (the trace is sampled).
   b. Otherwise: decision = False (the trace is dropped).
9. Return: decision, sw, rrcf. The sampling decision, updated sliding window, and updated RRCF model are returned. A compact end-to-end sketch follows.
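Tying the steps together, a hedged end-to-end sketch of Algorithm 1. The forest is abstracted behind a single score_and_update call (in the paper each RRCT removes its oldest leaf, inserts the vector, and reports a score); sampling_probability is the function sketched in Section 4.2.3:

```python
import random
from collections import deque

def online_sample(trace, encoder, forest, window, h):
    """One step of Algorithm 1; returns True if the trace is kept.

    Assumed interfaces (not the paper's exact API):
      encoder.encode(trace)      -> path vector (Section 4.2.1)
      forest.score_and_update(x) -> averaged attention score after each
                                    tree forgets its oldest vector and
                                    inserts x (Section 4.2.2)
    """
    x = encoder.encode(trace)
    avg = forest.score_and_update(x)
    p = sampling_probability(list(window), avg, h)  # Section 4.2.3 sketch
    window.append(avg)                              # slide the score window
    window.popleft()                                # drop the oldest score
    return random.random() < p

# window = deque(initial_scores); decisions = [
#     online_sample(t, encoder, forest, window, h=0.05) for t in trace_stream]
```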
4.2.5. Implementation Details
The authors implemented Sieve in Python based on an open-source RRCF implementation. Key modifications include:
- Cutting Dimension Selection: Prioritizing invalid dimensions and new dimensions to quickly identify structural uncommonness.
- Scoring Scheme: Replacing the original RRCF scoring with the attention score calculation (Section 4.2.2.3).
- Path Vector Expansion: Enabling RRCF to handle variable-length path vectors (Section 4.2.2.2).
- Dimension Reduction: Removing invalid and weak dimensions to combat the curse of dimensionality (Section 4.2.2.4).
5. Experimental Setup
5.1. Datasets
Sieve was evaluated on four different microservice trace datasets:
- VWR (Virtual War Room):
  - Source: Traces generated by Virtual War Room (VWR) [16], a simulation framework.
  - Characteristics: A simulated microservice system with 6 microservices. Two types of faults were injected: network delay and early stop. Traces were collected using OpenTracing.
  - Scale: 34,167 total traces; 32,592 normal, 66 with high network delay, 1,509 with early stop.
  - Purpose: To test Sieve's ability to detect both temporal (network delay) and structural (early stop) uncommonness in a controlled environment.
- AIOps (real-world microservice system):
  - Source: Generated by a real microservice application deployed in the private cloud of an ISP (Internet Service Provider) and used for an international AIOps challenge.
  - Characteristics: 13 microservices, some with multiple instances. Various faults were injected into different instances to create diverse anomalies.
  - Scale: 164,340 normal traces and 4,092 abnormal traces.
  - Purpose: To evaluate Sieve's performance on a larger, more complex real-world dataset with varied anomaly types.
- Boutique (Online Boutique microservice benchmark):
  - Source: An open-source microservice demo application (Online Boutique) consisting of a 10-tier microservice, instrumented with OpenCensus and deployed on a Kubernetes cluster with 8 nodes.
  - Characteristics: Workloads were run to emit traces. Two types of faults were injected: a delay in the email microservice's response and a breakdown of the product catalog microservice.
  - Scale: 2,000 traces collected; 99 anomalous traces.
  - Purpose: To test Sieve on a standard benchmark application with known fault injection scenarios.
- TC (real production traces):
  - Source: 6,561 traces from a large telecommunication enterprise.
  - Characteristics: Traces grouped into 10 different API types. The longest trace contained 451 spans.
  - Purpose: To assess Sieve's performance on highly complex production traces, particularly its representative sampling capability across different API types and its ability to handle long traces.
Latency Settings for Temporal Attention Experiments (Table I):
The following are the results from Table I of the original paper:
| Dataset | Spans | Common Latency | Uncommon Latency |
| VWR | 6 | 200 ∼ 400ms | > 100000ms |
| AIOps | 58 | 100 ∼ 200ms | 500 ∼ 1000ms |
| Boutique | 28 | < 200ms | > 500ms |
| TC | 15 | 22 ∼ 30ms | 50 ∼ 66ms |
This table provides concrete examples of the latency characteristics of common and uncommon traces used in the temporal attention experiments for each dataset. For example, in VWR, common traces have 6 spans and latency between 200-400ms, while uncommon ones also have 6 spans but latency over 100,000ms.
Trace Structure Settings for Structural Attention Experiments (Table II):
The following are the results from Table II of the original paper:
| Dataset | Latency | Common Spans | Uncommon Spans |
| VWR | < 200ms | 6 | {4, 5} |
| AIOps | 400 ∼ 450ms | 58 | 59 |
| Boutique | < 200ms | 28 | 18 |
| TC | 25 ∼ 35ms | 15 | 14 |
This table specifies the structural characteristics (number of spans) for common and uncommon traces used in the structural attention experiments, alongside their latency ranges. For instance, in VWR, common traces have 6 spans with latency below 200ms, while uncommon ones have 4 or 5 spans at similar latency.
5.2. Evaluation Metrics
The paper evaluates Sieve using several metrics, implicitly derived from its goals:
- Proportion of Uncommon Traces Sampled:
  - Conceptual Definition: Measures how effectively a sampling method retains informative (uncommon) traces in the sampled set, relative to the total number of uncommon traces in the original dataset. A higher proportion indicates better sampling quality.
  - Mathematical Formula (inferred; not given explicitly in the paper): $ \text{Proportion} = \frac{\text{Number of Uncommon Traces Sampled}}{\text{Total Number of Uncommon Traces in Dataset}} $
  - Symbol Explanation:
    - Number of Uncommon Traces Sampled: traces identified as uncommon by the ground truth and successfully selected by the sampling algorithm.
    - Total Number of Uncommon Traces in Dataset: all ground-truth uncommon traces in the input dataset.
- Sampling Rate / Storage Cost Reduction:
  - Conceptual Definition: Quantifies the reduction in data volume achieved by the sampling method. A lower sampling rate (percentage of traces kept) implies a higher storage cost reduction.
  - Mathematical Formula: $ \text{Sampling Rate} = \frac{\text{Number of Sampled Traces}}{\text{Total Number of Traces in Dataset}} \times 100\% $ and $ \text{Storage Cost Reduction} = 1 - \text{Sampling Rate} $
  - Symbol Explanation:
    - Number of Sampled Traces: the count of traces selected by the sampling algorithm.
    - Total Number of Traces in Dataset: the count of all traces in the input dataset.
- True Positive (TP):
  - Conceptual Definition: In Sieve's sensitivity evaluation, a True Positive occurs when an uncommon trace (per the ground truth or predefined criteria) receives an attention score that makes its variance difference exceed the threshold, so it is flagged as uncommon and sampled with high likelihood.
  - Mathematical Formula: $ \text{TP} = \text{Count of (Ground-Truth Uncommon Trace AND Sampler Flags as Uncommon)} $
  - Symbol Explanation:
    - Ground-Truth Uncommon Trace: a trace that is genuinely uncommon under the experimental setup (e.g., a specific latency range or structural deviation).
    - Sampler Flags as Uncommon: a trace whose attention score makes the variance difference ($var_{k+1} - var_k$) exceed the threshold $h$.
- False Positive (FP):
  - Conceptual Definition: A False Positive occurs when a common trace (per the ground truth) incorrectly receives an attention score that makes its variance difference exceed the threshold, so it appears uncommon to the sampler.
  - Mathematical Formula: $ \text{FP} = \text{Count of (Ground-Truth Common Trace AND Sampler Flags as Uncommon)} $
- Sampling Latency:
  - Conceptual Definition: The time Sieve takes to process a single incoming trace and make a sampling decision. It is crucial for evaluating the online sampler's efficiency and suitability for real-time deployment.
  - Measurement: the duration in milliseconds (ms) from receiving a trace to outputting a sampling decision.

A counting sketch for the first two metrics follows.
5.3. Baselines
The paper compares Sieve against two main baseline models:
- Hierarchical Clustering (HC) Method [3]:
  - Description: Biases sampling to maximize trace diversity based on structural characteristics, specifically the number of spans. It groups traces into clusters and aims to sample representatives from each cluster.
  - Why Representative: A state-of-the-art biased sampling method that attempts to capture a wider variety of trace types than uniform sampling, particularly focusing on structural differences.
  - Limitations (addressed by Sieve): It is generally an offline method (not designed for online sampling) and primarily targets structural diversity, failing to account for temporal uncommonness.
- Random Sampling:
  - Description: A uniform sampling method where each trace has an equal probability of being selected; the simplest and most common baseline.
  - Why Representative: It represents the default or naive approach of many existing distributed tracing systems (like Dapper's low-rate random sampling).
  - Limitations (addressed by Sieve): It is highly likely to miss uncommon but informative traces, because these represent a tiny fraction of the total data.
6. Results & Analysis
6.1. Core Results Analysis
The experimental evaluation demonstrates Sieve's effectiveness in biased sampling, its robustness, and its efficiency compared to baselines.
6.1.1. Biased Sampling
Sieve's primary goal is to bias sampling towards uncommon traces by utilizing both temporal and structural attention.
6.1.1.1. Temporal Attention
Experiments replayed 1000 traces (990 common, 10 uncommon) with identical structures but differing latencies. The following figure (Figure 9 from the original paper) shows the relationship between the score and rarity of traces in four microservice systems (VWR, AIOps, Boutique, TC):
- VWR (Figure 9a): Uncommon traces had significantly higher attention scores (they are easier to isolate). Their sampling probability averaged 0.990, while common traces averaged 0.018. The sigmoid function rapidly pushes probabilities towards 1 for high-scoring uncommon traces.
- AIOps (Figure 9b): A similar trend, with uncommon traces having an average sampling probability of 0.999 versus 0.019 for common ones. One false positive (a common trace with sampling probability close to 1) was observed, attributed to a sub-optimal threshold, but its impact on overall sampling was deemed minimal.
- Boutique (Figure 9c) and TC (Figure 9d): In both datasets Sieve detected all uncommon traces and raised their sampling probabilities enormously, even for subtle temporal differences (e.g., TC's small latency gap between common and uncommon traces).
6.1.1.2. Structural Attention
Experiments replayed 1000 traces (990 structurally common, 10 structurally uncommon) with latencies restricted to avoid temporal deviation. The following figure (Figure 10 from the original paper) shows the relationship between scores and sampling probabilities in trace data from four microservice systems (VWR, AIOps, Boutique, TC):
- VWR (Figure 10a): Common traces averaged a sampling probability of 0.020, while uncommon traces averaged 0.996. Two false positives were found; upon inspection, they had extremely low latencies (<10 ms), making them temporally uncommon and thus scored high by Sieve.
- AIOps (Figure 10b): No false positives were observed. Average sampling probabilities were 0.018 for common and 0.997 for uncommon traces.
- Boutique (Figure 10c): All uncommon traces were detected and sampled with high probability. Averages were 0.018 for common and 0.997 for uncommon traces.
- TC (Figure 10d): Even for subtle structural differences (15 spans vs. 14 spans), Sieve successfully detected all uncommon traces.

Conclusion: Sieve is highly effective at distinguishing and biasing sampling towards both temporal and structural uncommonness.
6.1.2. Sensitivity
Sieve's sensitivity was evaluated against the degree of uncommonness and its internal parameters.
6.1.2.1. Sensitivity to Degree of Uncommonness
Experiments used 1000 traces from AIOps, varying the latency range of 10 uncommon traces while keeping 990 common traces below 200ms latency.
The following figure (Figure 11 from the original paper) shows Sieve's sensitivity to the degree of uncommonness:
- Sieve detected all 10 uncommon traces (high True Positives) in most scenarios where the uncommon latency range was clearly distinct.
- Performance was slightly worse when the uncommon latency was close to the common latency (e.g., a 200-250 ms range for uncommon traces vs. <200 ms for common ones), as the boundary was fuzzy.
- As the distinction grew, detection rapidly improved, and the False Positive rate remained low. This suggests Sieve is robust to varying degrees of uncommonness, especially when the difference is clear.
6.1.2.2. Sensitivity to Parameters
Experiments used 1000 traces from AIOps with a mixed composition of common and uncommon traces (Table III).
The following are the results from Table III of the original paper:
| Type | Spans | Latency | Amount |
| common | 58 | 0 ∼ 300ms | 990 |
| uncommon | 58 | > 400ms | 5 |
| uncommon | 59 | 0 ∼ 300ms | 2 |
| uncommon | 7 | 0 ∼ 300ms | 3 |
The following figure (Figure 12 from the original paper) shows Sieve's sensitivity to its parameters:
- RRCT Size (Figure 12a) and Number of RRCTs (Figure 12b): Varying the RRCT size (32 to 512) and the number of RRCTs (10 to 90) had little effect on Sieve's accuracy. Performance remained stable, indicating robustness to these configuration changes.
- Threshold (Figure 12c): The threshold had a direct influence.
  - Below the lower end of the tested range, False Positives (FP) decreased significantly while True Positives (TP) remained at their maximum.
  - In the middle of the range, TP remained stable (with a slight decrease) and FP dropped to 0.
  - Above the upper end, TP decreased to 7 and stabilized.

Conclusion: Sieve is not overly sensitive to the threshold within a certain range. A low threshold is recommended to achieve high TP coverage at the cost of a small increase in FP.
6.1.3. Performance Comparison
Sieve was compared against random sampling and hierarchical clustering (HC) based on the proportion of uncommon traces sampled for a given sampling budget. 1000 traces with specific compositions for each dataset were used (Table IV).
The following are the results from Table IV of the original paper:
| Type | VWR Spans | VWR Latency | AIOps Spans | AIOps Latency | Boutique Spans | Boutique Latency | TC Spans | TC Latency | Amount |
| common | 6 | 0 ∼ 200ms | 58 | 0 ∼ 300ms | 28 | 0 ∼ 200ms | 29 | 0 ∼ 40ms | 990 |
| uncommon | 6 | > 100000ms | 58 | > 400ms | 28 | > 500ms | 29 | 60 ∼ 90ms | 5 |
| uncommon | 5 | 0 ∼ 200ms | 59 | 0 ∼ 300ms | 18 | 0 ∼ 200ms | 30 | 0 ∼ 40ms | 5 |
The following figure (Figure 13 from the original paper) shows the proportion of uncommon traces sampled by three different methods in different trace datasets:
- Superior Performance: Sieve consistently achieved the best performance across all four datasets in terms of the proportion of uncommon traces sampled.
- Random Sampling: As expected, random sampling's proportion was roughly equal to the sampling rate (very low for uncommon traces).
- Hierarchical Clustering (HC): HC's proportion grew slowly and generally did not exceed 60%. This is because HC primarily detects structural uncommonness, missing the 5 temporally uncommon traces in the mixed composition.
- Sieve's Behavior: At very small budgets, Sieve's proportion can start below HC's due to its online nature (common traces occurring early may consume some budget). However, as the budget increased, Sieve's proportion converged rapidly towards 100%, indicating it successfully captures almost all uncommon traces.
- TC Dataset: Sieve still showed ideal performance on the TC dataset, where HC's effectiveness was significantly reduced because neither temporal nor structural uncommonness was explicitly considered by HC for this configuration.

Conclusion: Sieve is significantly more effective at capturing uncommon traces than random sampling and HC, especially when both temporal and structural differences matter.
6.1.4. Representative Sampling
Sieve's ability to obtain a representative sample (capturing traces of different patterns) was tested on the TC dataset, which has 10 API types.
The following are the results from Table V of the original paper:
| | API-1 | API-2 | API-3 | API-4 | API-5 | API-6 | API-7 | API-8 | API-9 | API-10 |
| Population | 11 | 16 | 48 | 119 | 135 | 210 | 325 | 571 | 607 | 4519 |
| Sieve | 11 | 16 | 41.2 | 64 | 51 | 31.2 | 9.4 | 15.2 | 10.8 | 65.4 |
| HC | 5.9 | 16 | 5.44 | 17.22 | 19.44 | 32.64 | 42.24 | 45.92 | 42.84 | 87.56 |
| Random | 0.42 | 0.68 | 2.3 | 5.38 | 6.34 | 10.78 | 16.12 | 27.28 | 29.7 | 217 |
- Comparison on TC (Table V): Sieve preserved all traces from API-1 (11/11) and API-2 (16/16), which are very rare in the population, and sampled a reasonable number from the other API types.
- HC sampled more uniformly across groups, but Sieve's strength lies in preserving uncommon patterns, which likely include the minority API types. Random sampling captures very few traces from minority groups.

Conclusion: Sieve is more effective than HC at preserving uncommon traces, especially those of minority groups, making it suitable for representative sampling focused on rare patterns.
6.1.5. Storage Saving
The sampling rates of Sieve were recorded for different datasets.
The following are the results from Table VI of the original paper:
| Dataset | Population | Sample | Sampling Rate |
| VWR | 34167 | 85 | 2.5% |
| AIOps | 168432 | 7602 | 4.5% |
| Boutique | 2000 | 114 | 5.9% |
- The sampling rates observed for Sieve were low: 2.5% for VWR, 4.5% for AIOps, and 5.9% for Boutique.
- This translates to a reduction in stored traces of up to 97.5% (for VWR).

Conclusion: Sieve enables considerable storage savings by effectively discarding the large majority of common traces.
6.1.6. Overhead
The sampling latency of Sieve was measured. The number of RRCTs was set to 20 for this experiment, showing comparable performance to 50 RRCTs.
The following figure (Figure 14 from the original paper) shows the cumulative distribution of sampling latency:
- Latency Breakdown: The sampling latency comprises the time for path vector encoding, attention score calculation, and sampling probability calculation.
- Efficiency: The latency varied between 3 ms and 33 ms.
- Even for production traces with hundreds of spans (like TC), Sieve proved efficient.

Conclusion: Sieve exhibits low overhead, making it practical for real-time online sampling in demanding production environments.
6.2. Data Presentation (Tables)
All relevant tables (Tables I-VI) have been transcribed completely within the Experimental Setup and Results & Analysis sections under their respective subsections.
6.3. Ablation Studies / Parameter Analysis
The paper conducted a sensitivity analysis (Section V-B) which serves as a form of parameter analysis, though not a strict ablation study (which typically removes components).
- RRCT Size (Figure 12a) and Number of RRCTs (Figure 12b): These parameters showed that Sieve's accuracy (TP and FP counts) was largely stable across a wide range of values. This suggests Sieve is robust to the exact RRCF configuration, making it easier to deploy without extensive tuning. For example, using 20 RRCTs instead of 50 yielded approximately the same performance, indicating flexibility.
- Threshold (Figure 12c): This parameter directly controls the aggressiveness of biased sampling. A lower threshold tends to increase True Positives (capturing more uncommon traces) but may also slightly increase False Positives (sampling some common traces incorrectly). Conversely, a higher threshold reduces False Positives but may miss some uncommon traces. The findings suggest a sweet spot where TP is maximized and FP is minimized or acceptably low, encouraging operators to set a low threshold for higher coverage of uncommon traces.
7. Conclusion & Reflections
7.1. Conclusion Summary
This paper effectively addresses the critical challenge of managing vast volumes of trace data in distributed microservice systems by introducing Sieve, an innovative attention-based online sampler. Sieve's core contribution lies in its ability to explicitly leverage both temporal and structural differences to identify and prioritize uncommon traces for sampling. By encoding traces into path vectors, adapting Robust Random Cut Forest (RRCF) for variable dimensions and streaming data, and using attention scores to inform a biased sampling scheme, Sieve significantly enhances the selection of informative traces while drastically reducing storage costs. Experimental evaluations across diverse datasets confirm Sieve's high effectiveness in capturing uncommon traces, its robustness to varying conditions and parameter settings, and its low computational overhead, making it a practical solution for real-time monitoring in large-scale microservice environments.
7.2. Limitations & Future Work
The paper doesn't explicitly dedicate a section to limitations and future work, but some can be inferred:
- False Positives in Temporal Attention: While generally robust, the paper noted false positives in the temporal attention experiments (e.g., on the VWR and AIOps datasets) due to sub-optimal threshold settings or common traces exhibiting extreme (but not anomalous) latency values. This suggests Sieve might occasionally misclassify common traces with unusual but non-anomalous characteristics.
- Curse of Dimensionality: The paper explicitly mentions the curse of dimensionality caused by the ever-growing number of execution paths and the need for dimension reduction techniques. While Sieve implements solutions such as removing invalid and weak dimensions, the long-term scalability and the potential for information loss under aggressive dimension reduction could be a concern for extremely dynamic systems with constantly evolving paths.
- Feature Enrichment: The paper hints at extending path vectors with more meaningful features, such as the number of spans or the request status code. This implies that Sieve's current feature set, while effective, could be enhanced to capture more nuanced uncommonness.
- Threshold Tuning: Although Sieve is robust to threshold changes within a range, finding the optimal threshold still involves some tuning, especially given its impact on the trade-off between True Positives and False Positives. Dynamic, adaptive threshold adjustment could be a future improvement.

Potential future research directions implied or directly stated:

- Further exploration of feature engineering for path vectors to include more context-rich information from traces (e.g., specific tags, resource utilization, error messages) to improve uncommonness detection.
- Developing adaptive threshold adjustment mechanisms to automatically balance the sampling rate against uncommon trace coverage without manual parameter tuning.
- Investigating more advanced dimension reduction techniques that are lossless or minimize information loss while maintaining efficiency, especially for very high-dimensional trace spaces.
- Exploring the integration of Sieve with other anomaly detection or root cause analysis systems, leveraging its biased sampling to provide higher-quality inputs for downstream analysis.
7.3. Personal Insights & Critique
Sieve presents a compelling and practically relevant solution for a pervasive problem in microservice observability. The explicit combination of temporal and structural attention is a key strength, moving beyond the limitations of previous biased sampling methods. The adaptation of Robust Random Cut Forest for streaming, variable-dimensional trace data is a clever technical innovation that makes the approach feasible for dynamic microservice environments.
The emphasis on online sampling with low overhead is crucial. In real-world microservice systems, traces arrive continuously and in high volume, so any effective sampling solution must be able to process them in real-time without becoming a bottleneck. Sieve demonstrates this capability, which is a significant advantage for practical deployment.
One area for critical reflection could be the choice of -1 as an indicator for an invalid index (missing path). While effective for distinguishing presence/absence, it introduces a distinct value that RRCF would naturally separate. This is leveraged for structural distinction, but it also creates a fixed boundary. Exploring alternative ways to handle missing values that might be more robust to varying semantic interpretations of "missing" could be interesting. For instance, sometimes a missing path might be normal for a specific request type, while other times it's anomalous. The current mechanism relies on the RRCF to learn this distinction from data distribution.
The sigmoid function for sampling probability when variance exceeds the threshold is a good way to aggressively sample perceived anomalies. However, the linear proportionality for scores below the threshold might still result in sampling more common traces than strictly necessary if there's a continuous distribution of "mildly uncommon" traces that don't cross the threshold. This is a common trade-off in biased sampling – balancing the desire for comprehensive anomaly capture with aggressive data reduction.
Overall, Sieve is a well-designed and evaluated system that offers a robust and practical approach to intelligent trace sampling, directly addressing a bottleneck in distributed system monitoring. Its methodologies could potentially be transferred to other domains dealing with streaming structured data where outlier detection for resource optimization is critical.