
Mint: Cost-Efficient Tracing with All Requests Collection via Commonality and Variability Analysis

Published: 02/06/2025
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

Mint introduces a commonality-variability approach for cost-efficient tracing of all requests, significantly reducing storage and network overhead while preserving rich trace information.



1. Bibliographic Information

1.1. Title

The central topic of the paper is "Mint: Cost-Efficient Tracing with All Requests Collection via Commonality and Variability Analysis." This title highlights a novel approach to distributed tracing that aims to balance the need for comprehensive data collection with the practical constraints of cost and volume, by leveraging the inherent patterns and variations in trace data.

1.2. Authors

The authors and their affiliations are:

  • Haiyu Huang: Sun Yat-sen University, Guangzhou, China

  • Cheng Chen: Alibaba Group, Hangzhou, China

  • Kunyi Chen: Alibaba Group, Hangzhou, China

  • Pengfei Chen* (Corresponding Author): Sun Yat-sen University, Guangzhou, China

  • Guangba Yu: Sun Yat-sen University, Guangzhou, China

  • Zilong He: Sun Yat-sen University, Guangzhou, China

  • Yilun Wang: Sun Yat-sen University, Guangzhou, China

  • Huxing Zhang: Alibaba Group, Hangzhou, China

  • Qi Zhou: Alibaba Group, Hangzhou, China

    The authors are affiliated with both academic institutions (Sun Yat-sen University) and a major industry player (Alibaba Group), indicating a blend of theoretical research and practical application experience in distributed systems and software engineering.

1.3. Journal/Conference

The paper was published in the Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1 (ASPLOS '25).

ASPLOS is a highly reputable and influential conference in the fields of computer architecture, programming languages, and operating systems. It is known for publishing cutting-edge research that spans hardware and software, often with significant impact on system design and performance. Publication at ASPLOS signifies high quality and relevance in these areas.

1.4. Publication Year

2025

1.5. Abstract

Distributed traces contain valuable information but are often massive in volume, posing a core challenge in tracing framework design: balancing the tradeoff between preserving essential trace information and reducing trace volume. To address this tradeoff, previous approaches typically used a ‘1 or 0’ sampling strategy: retaining sampled traces while completely discarding unsampled ones. However, based on an empirical study on real-world production traces, we discover that the ‘1 or 0’ strategy actually fails to effectively balance this tradeoff. To achieve a more balanced outcome, we shift the strategy from the ‘1 or 0’ paradigm to the ‘commonality + variability’ paradigm. The core of ‘commonality + variability’ paradigm is to first parse traces into common patterns and variable parameters, then aggregate the patterns and filter the parameters. We propose a cost-efficient tracing framework, Mint, which implements the ‘commonality + variability’ paradigm on the agent side to enable all requests capturing. Our experiments show that Mint can capture all traces and retain more trace information while optimizing trace storage (reduced to an average of 2.7%) and network overhead (reduced to an average of 4.2%). Moreover, experiments also demonstrate that Mint is lightweight enough for production use.


2. Executive Summary

2.1. Background & Motivation

Core Problem and Importance

The core problem addressed by this paper is the massive volume of distributed trace data generated by modern, complex microservice systems. While distributed tracing is crucial for providing visibility into system behavior, diagnosing failures, and profiling performance, the sheer volume of data makes its collection, storage, and processing extremely expensive, especially in production environments. For instance, the paper notes that a large-scale e-commerce system at Alibaba generates approximately 18.6-20.5 petabytes (PB) of traces per day, leading to substantial storage and network overheads. This creates a critical challenge in designing tracing frameworks: how to effectively balance the tradeoff between preserving essential trace information and reducing trace volume to an acceptable cost.

Challenges and Gaps in Prior Research

Previous approaches to trace reduction primarily relied on a '1 or 0' sampling strategy. This strategy involves retaining a portion of traces (sampled traces) with their full information, while completely discarding the rest (unsampled traces). The paper identifies two significant limitations of this prevailing strategy based on an empirical study on real-world production traces:

  1. Loss of Essential Information from Discarded Traces: The '1 or 0' strategy inevitably leads to the complete loss of unsampled traces. However, the characteristics of traces needed for analysis are often unpredictable and cannot be determined at the time of sampling. The empirical study found a significant query miss rate of approximately 27.17% for SREs (Site Reliability Engineers) seeking information from discarded traces, highlighting that valuable information is being lost. This impedes diagnosis and troubleshooting.
  2. Lack of Effective Compression for Individual Trace Volumes: Existing sampling methods only reduce the number of traces, but they preserve the full original data for sampled traces. Each trace can still be very voluminous (e.g., several MBs), containing detailed information. General-purpose compression tools or log compression techniques are ineffective for trace data due to its unique topological structure, failing to leverage trace-specific characteristics for efficient compression.

Paper's Entry Point / Innovative Idea

The paper's innovative idea is to shift the trace overhead reduction strategy from the '1 or 0' paradigm to a 'commonality + variability' paradigm. This paradigm recognizes that trace data, despite its volume, contains widely existing common patterns (commonality) and specific differentiating details (variability). By parsing traces into these two components, the system can:

  • Cost-efficiently store the basic information (patterns) for all traces.

  • Selectively filter and store the detailed information (parameters) for valuable traces.

    This approach aims to provide a more balanced solution to the trace volume-information preservation tradeoff, allowing for the capture of all requests while significantly reducing overhead.

2.2. Main Contributions / Findings

Primary Contributions

The paper makes the following primary contributions:

  1. Empirical Study on Real-world Traces: Conducts an in-depth empirical study on production traces from Alibaba, revealing critical observations that highlight the limitations of existing sampling methods and the widespread existence of commonality and variability in trace data.
  2. Introduction of 'Commonality + Variability' Paradigm: Proposes a novel paradigm for trace reduction that moves beyond the '1 or 0' sampling. This paradigm focuses on separating trace data into common patterns and variable parameters, enabling more nuanced and cost-effective information retention.
  3. Proposal of Mint Framework: Designs and implements Mint, a cost-efficient distributed tracing framework. Mint applies the 'commonality + variability' paradigm on the agent side, allowing for the capture of all requests. It achieves this by aggregating common patterns and selectively filtering variable parameters.
  4. Extensive Evaluation and Practical Deployment: Conducts extensive experiments to validate Mint's effectiveness and performance, demonstrating its ability to reduce trace volume while capturing all requests. Mint has also been successfully deployed in a production environment at Alibaba, proving its practicality and lightweight nature.

Key Conclusions and Findings

The key conclusions and findings reached by the paper are:

  • The traditional '1 or 0' sampling strategy in distributed tracing leads to significant loss of potentially critical information (an average query miss rate of 27.17%) and fails to effectively compress individual trace volumes.

  • Commonality and variability are widely present in trace data at both inter-trace (34-56% commonality) and inter-span (25-45% commonality) levels, offering a strong basis for efficient data reduction.

  • Mint, by implementing the 'commonality + variability' paradigm, can capture information for all requests. This means that for any queried trace, at least approximate information (commonality) can be retrieved, addressing the problem of entirely discarded traces.

  • Mint significantly optimizes trace storage, reducing it to an average of 2.7% of the original volume, and network overhead, reducing it to an average of 4.2%.

  • Mint retains more valuable trace information, improving the accuracy of downstream root cause analysis (average A@1 increased from 25% to 50%).

  • Mint is lightweight enough for production use, introducing an average CPU usage increase of only 0.86% and an average end-to-end request latency increase of 0.21%.

  • Approximate traces, provided by Mint for unsampled requests, are highly beneficial for real-world use cases like trace exploration and batch trace analysis.

    These findings collectively demonstrate that Mint effectively resolves the long-standing tradeoff in distributed tracing by enabling comprehensive data capture at a significantly reduced cost, while preserving essential information for analysis.

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To understand the Mint framework, a reader needs to be familiar with several fundamental concepts in distributed systems, observability, and data structures.

Distributed Tracing

Distributed tracing is an observability technique used to monitor and troubleshoot complex transactions as they flow through multiple services in a distributed system, such as a microservices architecture. It provides an end-to-end view of a request's journey, making it possible to identify performance bottlenecks, diagnose errors, and understand system behavior.

  • Trace: A trace represents the complete end-to-end execution path of a single request or transaction as it propagates through various services in a distributed system. It is composed of multiple spans.
  • Span: A span represents a single logical unit of work within a trace. It typically encapsulates an operation within a service, such as an RPC call, a database query, or a function execution. Each span has a name, a start time, an end time, and attributes (key-value pairs) that provide contextual information.
  • Trace ID: A trace ID is a unique identifier assigned to an entire trace. All spans belonging to the same trace share this trace ID, allowing them to be correlated and reconstructed into a complete transaction flow.
  • Parent ID / Span ID: Each span has a unique span ID. To establish the hierarchical relationship within a trace (i.e., which operation invoked which subsequent operation), a child span typically includes the span ID of its direct parent span as its parent ID. This forms the tree-like structure of a trace.
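
To make these concepts concrete, the tree structure of a trace can be recovered from parent IDs alone. This is a minimal sketch; the field names loosely follow OpenTelemetry conventions and the example spans are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    span_id: str
    trace_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    name: str
    start_ms: int
    end_ms: int
    attributes: dict = field(default_factory=dict)

def build_trace_tree(spans):
    """Group spans by parent_id to recover the tree structure of one trace."""
    children = {}
    root = None
    for s in spans:
        if s.parent_id is None:
            root = s
        else:
            children.setdefault(s.parent_id, []).append(s)
    return root, children

# A hypothetical three-span trace: an HTTP entry point fanning out
# to an RPC call and a database query.
spans = [
    Span("a", "t1", None, "HTTP GET /checkout", 0, 120),
    Span("b", "t1", "a", "rpc inventory.check", 10, 40),
    Span("c", "t1", "a", "sql SELECT orders", 50, 110),
]
root, children = build_trace_tree(spans)
```
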

Microservices Architecture

A microservices architecture is an architectural style that structures an application as a collection of loosely coupled, independently deployable services. While offering benefits like scalability and flexibility, it also introduces significant complexity in terms of inter-service communication and debugging, making distributed tracing indispensable.

Sampling Strategies in Tracing

Sampling is a common technique in distributed tracing to reduce the volume of data by only retaining a subset of traces. This helps manage costs and overhead. The paper discusses a prevalent "1 or 0 sampling strategy," which means a trace is either fully retained (1) or completely discarded (0).

  • Head Sampling: The sampling decision is made at the very beginning of a trace's lifecycle (at the "head" of the request). If a trace is selected for sampling, all its subsequent spans are collected. If it's not selected, no spans are collected for that trace. This is cost-effective as it prevents data generation for unsampled traces, but it can miss valuable traces that become anomalous later.
  • Tail Sampling: The sampling decision is made after a trace has completed, typically at a centralized collector or backend. The entire trace is initially collected, and then a sampling policy (e.g., based on errors, latency, or specific attributes) decides whether to retain or discard it. This allows for more intelligent sampling based on the trace's full characteristics but incurs high network and processing overhead before the decision is made.
  • Retroactive Sampling: A more advanced form of sampling where a small amount of "breadcrumb" data (minimal information about a trace) is collected for all traces. If an anomaly is detected or a trace becomes interesting later, these breadcrumbs can be used to "retroactively" retrieve or reconstruct more detailed information for that trace, possibly from agents that temporarily buffered the full trace. This aims to combine the benefits of head and tail sampling.
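
The head- and tail-sampling decision points can be sketched as follows. The hashing scheme and thresholds here are illustrative assumptions, not taken from the paper or any specific framework:

```python
import hashlib

def head_sample(trace_id: str, rate: float) -> bool:
    """Head-sampling decision made when the request enters the system.
    Hashing the trace ID keeps the keep/drop verdict consistent across
    every service the request touches."""
    h = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16)
    return (h % 10_000) < rate * 10_000

def tail_sample(trace: dict) -> bool:
    """Tail-sampling decision made after the full trace is collected:
    keep errors and slow traces (thresholds are illustrative)."""
    return trace["status"] >= 500 or trace["latency_ms"] > 1000
```

The contrast is visible in the signatures: head sampling sees only the trace ID, so it cannot react to anomalies that appear later; tail sampling sees the whole trace, but only after paying the network cost of collecting it.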

Bloom Filter

A Bloom Filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set.

  • How it works: It consists of a bit array and a set of hash functions. To add an element, it's hashed by multiple hash functions, and the bits at the resulting positions in the array are set to 1. To check if an element is present, it's hashed again, and if all corresponding bits are 1, the element might be in the set. If any bit is 0, the element is definitely not in the set.
  • Properties:
    • Space-efficient: Requires significantly less memory than storing the actual elements.
    • Probabilistic: Can produce false positives (it might say an element is in the set when it's not), but never false negatives (it will never say an element is not in the set when it actually is).
    • No deletion: Elements cannot be reliably removed from a standard Bloom Filter without potentially introducing false negatives for other elements.
  • Use in Mint: Mint uses Bloom Filters to efficiently store trace metadata (like trace IDs) associated with sub-trace patterns. This allows quick checks during queries to see if a particular trace ID belongs to a given pattern without storing all trace IDs explicitly, thus saving storage. The "never miss" property is crucial here for trace coherence.
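
The mechanics described above fit in a few lines. This is a minimal sketch (the bit-array size and hash count are arbitrary choices, not Mint's actual configuration):

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        # Derive k independent bit positions by salting one hash function.
        for i in range(self.k):
            h = int(hashlib.sha256(f"{i}:{item}".encode()).hexdigest(), 16)
            yield h % self.size

    def add(self, item: str):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # True may be a false positive; False is always correct,
        # which is the "never miss" property the text relies on.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))
```

In Mint's setting, the items added would be trace IDs, one filter per sub-trace pattern, so a query for a trace ID can cheaply narrow down which pattern it belongs to.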

Commonality and Variability

These are fundamental concepts in data analysis that Mint leverages:

  • Commonality: Refers to the shared, repetitive, or patterned aspects across multiple data instances. In traces, this could be the typical sequence of service calls for a specific type of request (topology) or the fixed structure of certain log messages within a span.
  • Variability: Refers to the unique, dynamic, or differentiating parameters and values that change between data instances, even if they share a common pattern. In traces, this could be specific HTTP parameters, user IDs, error codes, SQL query parameters, or latency values. Mint's core idea is to separate and treat these two aspects differently for efficient storage and analysis.

Regular Expressions (Regex)

Regular expressions are sequences of characters that define a search pattern. They are widely used for pattern matching within strings, for example, to find specific substrings, validate input formats, or extract parts of a text. In Mint, regex is used to define patterns for string attributes within spans and to extract the variable parameters that deviate from these patterns.
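
As a small illustration of this pattern/parameter split, a regex with a capture group separates the common structure of a string attribute from its variable part. The URL shape and placeholder name below are hypothetical:

```python
import re

# Hypothetical span-attribute pattern: a URL whose numeric segment varies.
pattern = re.compile(r"/api/user/(\d+)/orders")

def split_common_variable(value: str):
    """Return (pattern string, extracted parameters), or None if the
    value does not match this pattern at all."""
    m = pattern.fullmatch(value)
    if m is None:
        return None
    return ("/api/user/<id>/orders", m.groups())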

Longest Common Subsequence (LCS)

The Longest Common Subsequence (LCS) problem involves finding the longest sequence that is a subsequence of two or more sequences. A subsequence is a sequence that can be derived from another sequence by deleting some or no elements without changing the order of the remaining elements.

  • Use in Mint: Mint uses LCS to calculate the similarity between string values of span attributes during the offline pattern extraction stage. By finding the LCS between two tokenized string values, it can quantify how similar their underlying structure or common content is.
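
The token-level LCS similarity described here can be sketched with the standard dynamic-programming recurrence (whitespace tokenization is an assumption for illustration):

```python
def lcs_length(a: list, b: list) -> int:
    """Length of the Longest Common Subsequence between two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def similarity(s1: str, s2: str) -> float:
    """Token-level LCS similarity: |LCS(s1, s2)| / max(|s1|, |s2|)."""
    t1, t2 = s1.split(), s2.split()
    return lcs_length(t1, t2) / max(len(t1), len(t2))
```

Two strings sharing most of their tokens (e.g., the same request shape with different endpoints) score close to 1 and land in the same cluster; unrelated strings score near 0.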

3.2. Previous Works

The paper contextualizes Mint by discussing existing distributed tracing frameworks and trace reduction techniques, as well as log compression methods.

Distributed Tracing Frameworks

  • Classic Frameworks: Magpie [6], X-trace [13], Dapper [52], Pinpoint [41], and Pivot [36] are highlighted as foundational works that introduced end-to-end observability. Dapper from Google, in particular, is often cited as the pioneering work that popularized the distributed tracing concept.
  • Popular Open-Source Frameworks: Jaeger [24], OpenTelemetry [43], and Zipkin [1] are mentioned as widely adopted frameworks in industry. OpenTelemetry is noted for its unified API, SDK, and the OTLP standard for trace data format and transmission.

Trace Sampling Methods

These methods primarily focus on the "1 or 0 sampling strategy" for trace reduction:

  • Head Sampling: [25, 52] Decisions made at the start, economical but can miss later anomalies.

  • Tail Sampling: [17, 23, 28, 29] Decisions made at the end, intelligent but higher overhead.

  • Retroactive Sampling: [60] (e.g., Hindsight) Collects minimal "breadcrumbs" for all traces, allowing for later retrieval of detailed information if a trace becomes interesting. This aims to address some limitations of head/tail sampling but still involves a form of sampling for full detail.

    The paper argues that despite these advancements, all these methods, being based on the "1 or 0" paradigm, face the fundamental limitation of either completely discarding valuable unsampled traces or failing to compress the volume of individual traces effectively.

Log-specific Compressors

The paper acknowledges the existence of log-specific compressors as log data is a "cousin" of distributed traces:

  • LogArchive [8], Cowic [31], MLC [12]: Compress logs by extracting features.

  • LogZip [34], RoughLogs [37], CLP [50], LogGrep [54]: Use log parsing to separate headers/schemas and variables for independent processing and compression. CLP specifically parses logs into schemas and stores variables as dictionary/non-dictionary.

    However, the paper explicitly states that these log compression methods are ineffective for trace compression. This is because distributed traces have a distinct topological data structure (the tree-like structure of spans and their relationships) that log data lacks. Direct application of log compression methods would not fully utilize trace characteristics, leading to poor compression performance. This highlights the need for trace-specific compression as proposed by Mint.

3.3. Technological Evolution

The evolution of tracing technologies can be seen as a progression from basic end-to-end visibility to more sophisticated cost-management and information retention strategies:

  1. Early Tracing Frameworks (e.g., Dapper, X-Trace): Focused on establishing the fundamental concepts of distributed tracing (traces, spans, IDs) to provide visibility into complex systems. The initial goal was simply to make tracing possible.

  2. Popularization and Standardization (e.g., Jaeger, Zipkin, OpenTelemetry): Focused on making tracing more accessible, interoperable, and performant, with efforts towards unified APIs and data formats. As adoption grew, the data volume problem became more apparent.

  3. Trace Sampling (e.g., Head, Tail, Retroactive): Introduced to address the cost and volume challenges. These methods aimed to reduce the number of traces collected while trying to preserve "interesting" ones. However, they operated under the "all or nothing" principle for individual traces.

  4. Mint's Approach ('Commonality + Variability'): Represents a further evolution by recognizing the limitations of simple sampling. Mint moves beyond merely reducing the count of traces to intelligently compressing and differentiating within traces. It aims to capture all requests (at least approximately) and to optimize the size of individual traces by separating patterns from parameters. This allows for a more granular control over the tradeoff between cost and information completeness.

    Mint fits within this timeline as a next-generation approach that aims to overcome the shortcomings of previous sampling-only strategies by providing a more nuanced and cost-effective way to achieve near-full observability in large-scale systems.

3.4. Differentiation Analysis

Compared to the main methods in related work, Mint introduces several core differences and innovations:

  1. Paradigm Shift from '1 or 0' Sampling:

    • Traditional Sampling (e.g., OT-Head, OT-Tail, Sieve): These methods make a binary decision for each trace: either collect its full, raw data or discard it entirely. This leads to information loss for unsampled traces and no compression for individual sampled traces.
    • Mint's 'Commonality + Variability' Paradigm: Mint does not discard any trace entirely. Instead, it extracts common patterns for all traces and variable parameters separately. This means that at least approximate information (the commonality part) for every request is retained, addressing the critical problem of query miss rates. For sampled traces, Mint still collects the variable parameters but compresses the overall structure.
  2. Cost-Effective Retention of All Requests:

    • Traditional Sampling: Can only guarantee full information for a small, sampled subset of requests.
    • Mint: Provides basic (commonality) information for all requests at a very low cost (patterns are aggregated, and metadata stored efficiently via Bloom Filters). This enables users to query and retrieve information for every trace, even if only in an approximate form for unsampled ones.
  3. Lightweighting of Individual Trace Volumes:

    • Traditional Sampling: Even for sampled traces, the full raw data is preserved, meaning no reduction in the volume of each trace.
    • Mint: Applies trace-specific compression by parsing spans and sub-traces into patterns and parameters. This significantly reduces the size of individual traces, even those that are fully sampled, by storing patterns once and only the variable parts for each instance. This is a key differentiator from log compression techniques as well, which do not account for the topological structure of traces.
  4. Agent-Side Reduction:

    • Tail Sampling (e.g., OT-Tail, Sieve): Requires collecting full traces at the agent and sending them to a backend collector/sampler, incurring high network overhead before reduction.

    • Mint: Performs parsing and initial reduction (pattern extraction, parameter buffering) directly on the agent side. This immediately saves both network bandwidth (by sending only compressed patterns/metadata or sampled parameters) and storage space. Hindsight also does agent-side biased sampling but still relies on breadcrumbs and full trace retrieval for details, whereas Mint's approach fundamentally changes how trace data is represented.

      In essence, Mint innovates by changing how trace data is represented and handled, rather than just which traces are kept. It offers a more granular, dual-layered approach to trace reduction that directly tackles the limitations of previous methods by preserving broader observability while maintaining cost efficiency.

4. Methodology

4.1. Principles

The core idea behind Mint's methodology is the 'commonality + variability' paradigm. This paradigm is grounded in the empirical observation that distributed trace data, despite its massive volume, exhibits significant redundancy and regularity. Many traces share common execution paths, service invocation sequences, and structural elements within their spans (commonality), while only specific values or parameters differ (variability).

The theoretical basis and intuition are as follows:

  1. Commonality allows for aggregation: If many traces or spans follow the same pattern, this pattern needs to be stored only once. Subsequent instances can then simply reference this common pattern, dramatically reducing storage. This forms the "basic information" for all traces.

  2. Variability allows for selective retention: The unique, differentiating parts (parameters) are often the most valuable for debugging specific issues. By separating these parameters, Mint can apply intelligent filtering and only store the variable parameters for traces deemed "important" (e.g., sampled due to anomalies or rare paths), while still linking them back to their common patterns.

    This dual approach enables Mint to:

  • Retain a cost-efficient, approximate representation (common patterns) for all requests, ensuring no request is completely discarded.
  • Provide full, detailed information (common patterns + variable parameters) for a selectively sampled subset of "important" requests, but in a highly compressed format.
  • Reduce both network and storage overhead by performing this parsing and aggregation at the agent side.

4.2. Core Methodology In-depth (Layer by Layer)

Mint's workflow is a multi-stage process, primarily executed on the agent side, to implement the commonality + variability paradigm. Let's break down the tracing walkthrough as depicted in Figure 5.

4.2.1. Overview of Mint's Tracing Walkthrough

The following figure (Figure 5 from the original paper) shows an overview of Mint's tracing walkthrough:

Figure 5. An overview of Mint's tracing walkthrough.


The process unfolds as follows:

  1. Trace Data Generating: When a request arrives at an application node, Mint's client API (compatible with existing standards like OpenTelemetry) generates span data. Unlike traditional frameworks that might immediately record or report these spans, Mint redirects them to its Span Parser.
  2. Inter-Span Level Parsing: The Span Parser analyzes individual spans to identify commonality and variability. It decomposes each incoming span into a pattern (the common part) and parameters (the variable part). The pattern is then encoded into a compact pattern ID. This pattern (or its ID) updates the Span Pattern Library, while the parameters are temporarily stored in a Params Buffer on the agent.
  3. Inter-Trace Level Parsing: On an application node, a single request can generate multiple spans which, linked by their parent IDs, form a sub-trace (a segment of the full trace on that specific node). The Trace Parser then analyzes the topology of this sub-trace to extract a sub-trace pattern. It searches for the most similar pattern in the Topology Pattern Library. Once a matched template (pattern) is found or a new one is created, essential metadata of the incoming sub-trace (e.g., trace ID) is associated with this matched template using a Bloom Filter. This Bloom Filter and the Topology Pattern Library are also stored on the agent.
  4. Basic Information Uploading: Periodically, the Mint agent uploads the Pattern Library (containing both span patterns and sub-trace patterns) and the Bloom Filters to the Mint backend. This ensures that the basic, common information (the commonality part) of all traces is preserved in the backend at a low cost.
  5. Key Traces Sampling: Mint employs two specialized samplers:
    • Symptom Sampler: Monitors the Params Buffer for spans with abnormal parameters (e.g., HTTP status code 502, unusually high latency values) and marks their corresponding traces as "sampled."
    • Edge-Case Sampler: Monitors the Pattern Library for sub-trace patterns that represent rare or infrequent execution paths, marking these traces as "sampled."
  6. Parameters Uploading: If a trace is marked as "sampled" by either sampler (or external sampling rules), all its variable parameters (the variability part) that are distributed across different Mint agents are then emitted to the Mint backend. This ensures that the full, detailed information for these important traces can be reconstructed.
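
Agent-side steps 1, 2, and 5 above can be sketched end to end. Everything here is a simplified illustration: the real Span Parser applies offline-trained per-attribute parsers, whereas this sketch treats the sorted attribute keys as the "pattern", and names like `params_buffer` are hypothetical:

```python
import uuid

pattern_library = {}   # span pattern -> compact pattern ID
params_buffer = []     # (trace_id, pattern_id, params) awaiting a sampling verdict
sampled_traces = set() # trace IDs marked by the symptom sampler

def parse_span(trace_id: str, attrs: dict):
    """Decompose a span into a common pattern and variable parameters.
    Commonality: which attributes appear; variability: their raw values."""
    keys = tuple(sorted(attrs))
    params = tuple(attrs[k] for k in sorted(attrs))
    pid = pattern_library.setdefault(keys, str(uuid.uuid4()))
    params_buffer.append((trace_id, pid, params))
    # Symptom sampler: an abnormal parameter marks the whole trace as sampled,
    # so its buffered parameters will later be uploaded to the backend.
    if attrs.get("http.status_code", 200) >= 500:
        sampled_traces.add(trace_id)
```

Note how two spans with the same shape but different values share one library entry: only the parameters, not the structure, are buffered per request.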

4.2.2. Inter-Span Level Parsing

This stage focuses on breaking down individual spans into reusable patterns and extractable parameters.

3.2.1 Offline Stage: Warming up Span Parser

The Span Parser is warmed up offline by analyzing a random sample of raw spans. This pre-processing helps to build an initial set of patterns for efficient online parsing. The core idea is to train a dedicated parser for each attribute within a span and then combine these attribute patterns to form a complete span pattern.

The following figure (Figure 6 from the original paper) shows the offline construction process of the span parser:

Figure 6. The offline stage of span parser.

  1. Clustering and Pattern Extracting:

    • For attributes with string values: The LCS (Longest Common Subsequence) method is used to determine the similarity between string values. This helps group similar strings into clusters, from which a regular expression (regex) pattern can be extracted. The similarity $\delta$ between two tokenized string values $s_1$ and $s_2$ is calculated as: $\delta(s_1, s_2) = \frac{|LCS(s_1, s_2)|}{\max(|s_1|, |s_2|)}$ Where:
      • $s_1$ and $s_2$ are the two tokenized strings (e.g., words are tokens).
      • $|LCS(s_1, s_2)|$ represents the length (number of tokens) of the Longest Common Subsequence between $s_1$ and $s_2$.
      • $\max(|s_1|, |s_2|)$ is the length (number of tokens) of the longer of the two strings.
      • A similarity threshold (e.g., 0.8) is used to group strings into clusters $C = \{C_0, ..., C_n\}$. For each cluster $C_i$, the shortest regular expression that can represent all strings in that cluster is extracted as the pattern $P_i$.
    • For attributes with numeric values: A bucketing approach based on exponential intervals is used. This groups numeric values into predefined ranges. For a numeric value $d$, its bucket index $i$ is determined by: $i = \lceil \log_\gamma(d) \rceil$ Where:
      • $d$ is the numeric value.
      • $\alpha$ is a precision parameter (e.g., 0.5).
      • $\gamma = \frac{1 + \alpha}{1 - \alpha}$.
      • Values in bucket $B_i$ fall within the interval $(\gamma^{i-1}, \gamma^i]$. Specifically, values in bucket $B_0$ fall within (0, 1]. Each bucket $B_i$ is represented by an interval pattern $(\mathrm{lower}_i, \mathrm{upper}_i]$.
  2. Parsers Building:

    • For numeric attributes, the parser $\mathcal{P}_i$ is the fixed mapping formula $i = \lceil \log_\gamma(x) \rceil$.
    • For string attributes, a prefix tree (or Trie) is used to store all extracted regular expression patterns. This allows for efficient storage and matching, as common prefixes among patterns are shared.
  3. Patterns Combination: Mint combines these individual attribute patterns that frequently appear together to form a complete span pattern. For example, if attribute $A_1$ consistently shows pattern $P_{11}$ when attribute $A_2$ shows pattern $P_{23}$, then $SP = [P_{11}, P_{23}]$ becomes a span pattern. A unique pattern ID (e.g., a UUID) is assigned to each span pattern, and these are stored in the Span Pattern Library.
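The two offline building blocks above, LCS-based string similarity and exponential numeric bucketing, can be sketched in a few lines of Python. This is an illustrative reconstruction from the formulas in the paper, not Mint's actual implementation; the function names `similarity` and `bucket_index` are chosen here for clarity.

```python
import math

def lcs_len(a, b):
    # classic dynamic-programming Longest Common Subsequence over token lists
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def similarity(s1, s2):
    # delta(s1, s2) = |LCS(s1, s2)| / max(|s1|, |s2|), over whitespace tokens
    t1, t2 = s1.split(), s2.split()
    return lcs_len(t1, t2) / max(len(t1), len(t2))

def bucket_index(d, alpha=0.5):
    # i = ceil(log_gamma(d)) with gamma = (1 + alpha) / (1 - alpha);
    # values in (0, 1] map to bucket 0 by definition
    if d <= 1:
        return 0
    gamma = (1 + alpha) / (1 - alpha)
    return math.ceil(math.log(d, gamma))
```

With `alpha = 0.5` (so gamma = 3), a latency of 4 ms lands in bucket 2, i.e., the interval (3, 9], so only a bucket index rather than the exact value needs to be stored as commonality.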

4.2.2. Online Stage: Matching and Parsing

Once Mint is deployed, it performs online parsing on newly generated raw spans.

The following figure (Figure 7 from the original paper) shows the online stage of span parser:

Figure 7. The online stage of span parser.

  1. Hierarchical Attribute Parsing (HAP): This is the core of the online process.

    • For each attribute of an incoming span, Mint processes it in parallel:
      • It uses the corresponding attribute parser (built in the offline stage) to find the matching pattern (e.g., traversing the prefix tree for strings or applying the mapping formula for numerics).
      • The matched pattern is identified as the common part.
      • The variable part is then extracted: for string attributes, it's done using regular expressions; for numeric attributes, it's the difference from the interval's lower bound.
    • If a completely new span pattern is encountered (e.g., due to system changes), the relevant attribute parser is updated to include this new pattern. HAP is designed to be highly parallel to meet low-latency requirements.
  2. Span Pattern Mapping: After parsing all attributes, the resulting attribute patterns are combined. Mint then attempts to match this combined span pattern against those stored in the Span Pattern Library.

    • If an exact match is found, the existing pattern ID is returned.

    • If no match is found, the new span pattern is added to the Pattern Library, and a new pattern ID is assigned. This allows the Pattern Library to adapt to evolving system behavior.

      At the end of this stage, raw spans are effectively transformed into pattern IDs (representing the commonality) and variable parameters (representing the variability), with patterns being aggregated in the Pattern Library and parameters buffered for later processing.
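The online flow just described (match each attribute against known patterns, extract the variable part, and fall back to registering a new pattern) can be sketched roughly as follows. The pattern-library layout and helper names here are assumptions of this summary, not Mint's API; regex capture groups stand in for the variable parts.

```python
import re
import uuid

span_pattern_lib = {}  # combined attribute patterns -> span pattern ID

def parse_span(attributes, attr_patterns):
    """attr_patterns: attribute name -> list of regexes whose groups mark variables."""
    combined, params = [], []
    for name in sorted(attributes):
        value = str(attributes[name])
        for rx in attr_patterns.setdefault(name, []):
            m = re.fullmatch(rx, value)
            if m:
                combined.append((name, rx))   # matched pattern = common part
                params.extend(m.groups())     # captured groups = variable part
                break
        else:
            # unseen value: update the attribute parser with a new (degenerate) pattern
            rx = re.escape(value)
            attr_patterns[name].append(rx)
            combined.append((name, rx))
    # span pattern mapping: reuse the ID if the combined pattern is known
    key = tuple(combined)
    pattern_id = span_pattern_lib.setdefault(key, str(uuid.uuid4()))
    return pattern_id, params
```

Two spans that differ only in, say, an SQL id then share a single pattern ID and differ only in their parameter lists, which is exactly the commonality/variability split.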

4.2.3. Inter-Trace Level Parsing

This stage organizes the parsed spans into sub-traces and extracts their topology patterns.

  1. Sub-Trace Construction: Since a Mint agent operates on a single application node, it constructs sub-traces by linking spans that share the same trace ID and are generated on that node. The linking is done based on their parent IDs (as illustrated in Figure 4 in the paper, which shows a trace's tree-like structure). This local view prevents high-latency cross-node interactions.

  2. Pattern Extracting: For each incoming sub-trace, Mint encodes its topology information (the sequence and hierarchy of spans) into a vector. Each element in this vector represents a parent-child relationship, and crucially, each element is a span pattern ID (from the Inter-Span Level Parsing). This means the sub-trace pattern captures both the topology and the aggregated content (span patterns) of the spans within it.

    The following figure (Figure 8 from the original paper) illustrates how Mint uses a sub-trace pattern:

    Figure 8. Mint uses a sub-trace pattern to store the topology information of a sub-trace. It also uses a Bloom Filter to efficiently store the trace metadata for each sub-trace pattern.

  3. Matching or Updating: The sub-trace pattern is then checked against the Topo Pattern Library (a library of topology patterns).

    • If an exact match is found, that existing pattern is used as the matched pattern.
    • If no match, the new sub-trace pattern is added to the Topo Pattern Library. This aggregation ensures that the topology information for similar sub-traces is stored only once.
  4. Metadata Mounting: To allow users to query traces efficiently using trace metadata (e.g., trace ID), Mint attaches a Bloom Filter to each sub-trace pattern.

    • The Bloom Filter stores the metadata (specifically, trace IDs) of all sub-traces that conform to that particular sub-trace pattern.
    • As a probabilistic data structure, the Bloom Filter is highly space-efficient for set membership testing. It can report false positives (indicating a trace ID is present when it's not) but never false negatives (it will never fail to report a present trace ID). This "no-miss" property is critical for ensuring trace coherence (all segments of a trace can be found). False positives can be mitigated by cross-agent verification during query time.
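The no-false-negative property described above is easy to see in a toy Bloom filter. The sketch below is illustrative only (the sizes `m` and `k` and the hashing scheme are arbitrary choices of this summary, not Mint's implementation):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash positions over an m-bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, item):
        # derive k positions by salting a SHA-256 hash with the index
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # may report false positives, but never false negatives:
        # every added item sets all k of its bits, so the check always passes for it
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```

Mounting one such filter per sub-trace pattern lets an agent answer "does this pattern cover trace ID X?" in constant space per pattern; a positive answer can then be verified across agents at query time.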

4.2.4. Data Reporting

After the two levels of parsing, the Mint agent has three types of data:

  1. Patterns: Span patterns and sub-trace patterns stored in the Pattern Library.

  2. Metadata: Trace metadata (via Bloom Filters) mounted on sub-trace patterns.

  3. Parameters: Variable parameters temporarily stored in the Param Buffer.

    Mint's reporting strategy differentiates between the commonality and variability parts:

  1. Uploading Basic Information: The Mint agent periodically uploads the entire Pattern Library and Bloom Filters to the backend. This guarantees that the aggregated commonality information for all traces is preserved in the backend, supporting the "capture all requests" goal. The cost is low because millions of traces typically map to only hundreds of patterns.

  2. Uploading Parameters: For the variable parameters in the Param Buffer, Mint decides whether to send them to the backend based on whether the trace they belong to has been marked as sampled. If a trace is sampled, all its variable parameters (potentially across multiple agents) are sent to the backend, enabling full reconstruction of the sampled trace.

Sampling Rules

Mint is flexible and can work with existing sampling rules (e.g., head sampling or tail sampling). Additionally, it provides two specialized samplers tailored for the commonality + variability paradigm:

  1. Symptom Sampler:

    • Purpose: To identify and sample symptomatic traces – those exhibiting anomalous behavior.
    • Mechanism: It continuously monitors the variable parameters stored in the Param Buffer.
    • Sampling Criteria:
      • For numerical parameters: Samples outliers (e.g., values exceeding the 95th percentile, P95).
      • For string parameters: Samples values containing user-defined abnormal words.
    • This ensures that traces linked to potential issues are fully captured.
  2. Edge-Case Sampler:

    • Purpose: To identify and sample traces that follow rare execution paths.

    • Mechanism: It monitors the topology patterns within the Pattern Library.

    • Sampling Criteria: It tracks the frequency of traces matching each topology pattern and prioritizes sampling traces associated with less common patterns. For example, if pattern A is seen 99% of the time and pattern B only 1%, pattern B traces will have a higher sampling probability.

    • This helps ensure diversity in the collected full traces, capturing unusual but potentially important system behaviors.

      When a trace is marked as sampled by any rule, Mint coordinates across agents (via the backend) to ensure all parameters for that specific trace ID are collected, maintaining trace coherence.
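Both specialized samplers reduce to simple checks over the buffered parameters and pattern counts. The sketch below is this summary's reading of the description above; the names (`EdgeCaseSampler`, `is_symptomatic_numeric`) and the exact inverse-frequency rule are assumptions, not the paper's code.

```python
from collections import Counter

def is_symptomatic_numeric(history, value, quantile=0.95):
    # Symptom Sampler, numeric case: flag values above the P95 of recent history.
    cutoff = sorted(history)[int(quantile * (len(history) - 1))]
    return value > cutoff

class EdgeCaseSampler:
    """Edge-Case Sampler: prefer traces whose topology pattern is rare."""
    def __init__(self, base_rate=0.05):
        self.counts = Counter()
        self.base_rate = base_rate

    def observe(self, pattern_id):
        self.counts[pattern_id] += 1

    def probability(self, pattern_id):
        # rarer patterns get a sampling probability closer to 1
        freq = self.counts[pattern_id] / sum(self.counts.values())
        return min(1.0, self.base_rate / freq)
```

For example, after observing pattern A 99 times and pattern B once, B's sampling probability saturates at 1.0 while A's stays near the base rate, matching the 99%/1% scenario above.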

5. Experimental Setup

5.1. Datasets

The experiments used a combination of open-source microservice benchmarks and real-world production systems from Alibaba.

  • OnlineBoutique (OB) [18]: A web-based e-commerce application. It consists of 10 microservices, implemented in various programming languages, communicating via gRPC.
  • TrainTicket (TT) [14]: A railway ticketing service. It involves 45 services that communicate through synchronous REST invocations and asynchronous messaging.
  • Deployment: Both OnlineBoutique and TrainTicket were deployed on a Kubernetes platform using 12 virtual machines (VMs). Each VM had an 8-core 2.10GHz CPU, 16GB of memory, and ran on Ubuntu 18.04.
  • Alibaba Production Microservice System: A real-world production environment at Alibaba. This system includes typical components like web services, MongoDB [39] access, and MySQL [40] access. This dataset allows for evaluating Mint's performance and practicality in a high-scale, real-world scenario.
  • Alibaba Cloud Subsystems: For specific evaluations like compression ratio and pattern extraction performance, Mint used raw trace data generated by:
    • 6 subsystems from Alibaba (with varying API counts and call depths) to evaluate compression ratio.
    • 5 sub-services in Alibaba Cloud (over an hour) to test pattern extraction capabilities.

Data Sample Example

The paper provides illustrations of trace and span structures. The following figure (Figure 4 from the original paper) shows an example of trace and span structure. This helps to visualize the hierarchical nature of trace data.

Figure 4. An example of trace and span structure.

As shown in Figure 4, a trace (e.g., trace ID ab8d...) is composed of multiple spans (e.g., span ID b1e6). Each span represents a unit of work (e.g., product service/get_product) and has:

  • Topology part: Span ID, Parent ID, Trace ID (defines position in the trace tree).

  • Metadata part: Service name, Operation name, Start time, End time.

  • Attributes part: Key-value pairs providing detailed context (e.g., attributes.sql, attributes.http.method, event: sql query).

    Another example of a data sample is an approximate trace output. The following figure (Figure 10 from the original paper) shows an example of querying an unsampled trace to get an approximate trace:

Figure 10. An example of querying an unsampled trace to get an approximate trace; variables are masked by "<*>" and numbers are bucket-mapped.

This approximate trace illustrates how Mint presents unsampled data: variables (e.g., specific city_id, rb_id) are masked or generalized, and numbers (e.g., latency values) are represented by their bucket ranges rather than exact figures.
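A toy illustration of this rendering: if a span's pattern is stored as a template with `<*>` placeholders, a sampled trace fills them in from its stored parameters, while an unsampled trace simply returns the masked template. The template format here is invented for illustration and is not Mint's storage format.

```python
def render(template, params=None):
    # Fill "<*>" placeholders with sampled parameters, left to right;
    # for unsampled traces (params is None) the masks remain in place.
    if params is None:
        return template
    out = template
    for p in params:
        out = out.replace("<*>", str(p), 1)
    return out
```

So the same stored pattern serves both an exact hit (`render(t, params)`) and a partial hit (`render(t)`).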

Dataset Selection Rationale

The chosen datasets are effective for validating Mint's performance because:

  • Microservice Benchmarks (OB, TT): Provide controlled, reproducible environments for comparing Mint against baselines under varying loads, and for injecting faults to test RCA effectiveness. They are widely used in trace analysis studies, allowing for comparison with prior work.
  • Alibaba Production Systems: Offer real-world scale, complexity, and traffic patterns, ensuring that Mint's claims of cost-efficiency and lightweight operation are validated in a demanding industrial setting, not just idealized lab conditions. The diversity of subsystems also helps test the robustness of Mint's pattern extraction across different application types.

5.2. Evaluation Metrics

For each evaluation metric, the conceptual definition, mathematical formula (if applicable), and symbol explanation are provided.

  • Storage Overhead:

    • Conceptual Definition: The total disk space consumed by the trace data after processing by a tracing framework. It quantifies the cost associated with persisting trace information. A lower storage overhead is desirable.
    • Mathematical Formula: Not explicitly provided by the paper, but typically calculated as the total size in bytes (or GB/PB) of the stored data.
    • Symbol Explanation: N/A.
  • Network Overhead:

    • Conceptual Definition: The amount of data transmitted over the network for tracing purposes, specifically between application nodes and the tracing backend. It quantifies the bandwidth consumed and its potential impact on network latency for business traffic. A lower network overhead is desirable.
    • Mathematical Formula: Not explicitly provided by the paper, but typically measured as data volume per unit time (e.g., MB/min).
    • Symbol Explanation: N/A.
  • Miss Rate (for Trace Queries):

    • Conceptual Definition: The proportion of user queries for specific trace IDs that yield no results because the traces were discarded by the sampling strategy. It directly measures the effectiveness of information retention from a user's perspective. A lower miss rate is desirable.
    • Mathematical Formula: Not explicitly provided by the paper, but can be generally defined as: $ \text{Miss Rate} = \frac{\text{Number of queries with no results}}{\text{Total number of user queries}} $
    • Symbol Explanation: Number of queries with no results refers to the count of instances where a user tried to retrieve a trace but the tracing system had no record of it. Total number of user queries is the total number of attempts by users to retrieve trace information.
  • Query Response Ability (Exact Hit, Partial Hit, Miss):

    • Conceptual Definition: Categorizes the outcome of a user's trace query.
      • Exact Hit: The tracing framework returns the complete, detailed information of the queried trace.
      • Partial Hit: The framework returns approximate information (e.g., common patterns, but masked variables) for the queried trace.
      • Miss: No information for the queried trace is returned.
    • Mathematical Formula: Not a single formula, but counts for each category.
    • Symbol Explanation: N/A.
  • Top-1 Accuracy (A@1) for Root Cause Analysis (RCA):

    • Conceptual Definition: A metric used to evaluate the effectiveness of root cause analysis methods. It measures the percentage of times the actual fault (root cause) is correctly identified as the single most likely cause (top-1 prediction) by the RCA method, given the trace data. A higher A@1 indicates better diagnostic performance.
    • Mathematical Formula: $ A@1 = \frac{\text{Number of correct top-1 root cause predictions}}{\text{Total number of fault injections}} $
    • Symbol Explanation:
      • Number of correct top-1 root cause predictions: The count of fault injection scenarios where the RCA method's highest-ranked (top-1) suggestion matches the actual injected fault.
      • Total number of fault injections: The total number of distinct faults intentionally introduced into the system during experiments.
  • Compression Ratio:

    • Conceptual Definition: A measure of how much the size of data is reduced after compression. It is the ratio of the original data size to the compressed data size. A higher compression ratio means more effective compression.
    • Mathematical Formula: $ \text{Compression Ratio} = \frac{\text{Original Data Size}}{\text{Compressed Data Size}} $
    • Symbol Explanation:
      • Original Data Size: The size of the trace data before any compression or processing by Mint (raw format).
      • Compressed Data Size: The size of the trace data after being processed and reduced by Mint (or other compression tools).
  • CPU Usage Increment:

    • Conceptual Definition: The additional percentage increase in CPU utilization of application nodes caused by the tracing framework. It quantifies the computational overhead imposed on the monitored applications. A lower increment implies a more lightweight tracing solution.
    • Mathematical Formula: Not explicitly provided, typically calculated as (CPU usage with tracing - CPU usage without tracing) / CPU usage without tracing * 100%.
    • Symbol Explanation: N/A.
  • End-to-End Request Latency:

    • Conceptual Definition: The total time taken for a user request to complete, from the moment it is initiated until the response is received. The tracing framework might add overhead that increases this latency. A smaller increase is better.
    • Mathematical Formula: Not explicitly provided. Measured in milliseconds or seconds.
    • Symbol Explanation: N/A.
  • Query Latency:

    • Conceptual Definition: The time taken for the tracing system's backend to retrieve and return requested trace information to a user. This measures the responsiveness of the querying interface. Lower latency is better.
    • Mathematical Formula: Not explicitly provided. Measured in milliseconds or seconds.
    • Symbol Explanation: N/A.

5.3. Baselines

Mint was compared against a selection of widely used tracing frameworks and novel methods from recent years:

  • OpenTelemetry under head-sampling (OT-Head) [48]: Represents a common, basic sampling strategy where a decision is made at the start of the trace. The sampling rate was set to 5% in the experiments. OpenTelemetry agents were instrumented, and data was collected by OpenTelemetry Collector, then stored in Grafana Tempo and Elasticsearch [11].

  • OpenTelemetry under tail-sampling (OT-Tail) [44]: Represents an intelligent sampling strategy where decisions are made at the end of the trace, allowing for policy-driven retention (e.g., keeping anomalous traces). To ensure effectiveness, traces with an 'is_abnormal' tag (injected for 5% of traffic) were targeted for sampling.

  • Hindsight [60]: A tracing framework implementing retroactive sampling. It collects minimal "breadcrumbs" for all traces and allows for later biased sampling. It was configured as specified in its original paper, compatible with OpenTelemetry.

  • Sieve [23]: An online tail sampling approach that uses robust random cut forest (RRCF) to sample uncommon traces. It was implemented using OpenTelemetry agents/collectors, with data redirected to Sieve for filtering.

  • OT-Full (OpenTelemetry with 100% sampling rate): Used as a reference point to represent the no trace reduction scenario, providing the baseline for original volume.

    For evaluating Mint's lossless compression ability (against log-specific compressors):

  • LogZip [33]: A log compressor that extracts hidden structures via iterative clustering.

  • LogReducer [55]: Identifies and reduces log hotspots.

  • CLP [50]: Efficient and scalable search on compressed text logs, which parses logs into schemas.

5.4. Additional Experimental Details

  • Fairness in Sampling: For comparisons involving sampling, 5% of the injected traffic in benchmarks was tagged with an 'is_abnormal' label. This ensured that all biased sampling methods (OT-Tail, Sieve, Hindsight) had a consistent target for "valuable" traces. OT-Head and Mint were also set to a comparable 10% sampling rate in some end-to-end overhead tests.
  • Chaos Engineering: To simulate real-world microservices problem analysis and evaluate RCA effectiveness, chaos engineering was performed on OnlineBoutique and TrainTicket using Chaosblade [7]. A total of 56 faults were injected, including:
    • CPU exhaustion
    • Memory exhaustion
    • Network delays
    • Code exceptions
    • Error returns
  • Downstream RCA Methods: The trace data captured by Mint and baselines was then fed into three classic trace-based RCA methods:
    • MicroRank [57]

    • TraceRCA [30]

    • TraceAnomaly [35]

      The results from these RCA methods were used to calculate the top-1 accuracy (A@1), assessing the analytical value of the retained trace data.

6. Results & Analysis

6.1. Core Results Analysis

The experiments extensively demonstrate Mint's effectiveness in trace data reduction, information retention, and practical performance.

Effectiveness in Reducing Trace Data

The following figure (Figure 11 from the original paper) shows tracing network and storage overhead on OnlineBoutique and TrainTicket Benchmarks.

Figure 11. Tracing network and storage overhead on OnlineBoutique and TrainTicket Benchmarks.

Figure 11 clearly shows Mint's superior performance in reducing both network and storage overhead compared to all baseline methods.

  • OT-Full: Represents the scenario with no trace reduction (100% sampling). It has the highest network and storage overhead, serving as the reference maximum.

  • OT-Head: Reduces network and storage overheads proportionally to its sampling rate (e.g., 5% sampling would reduce overhead to roughly 5% of OT-Full). This is because it discards unsampled traces at the source.

  • OT-Tail & Sieve: While these methods effectively reduce storage overhead (to around the anomaly rate) because they filter traces at the backend, they fail to reduce network overhead. This is because all trace data must first be transmitted to the backend before the sampling decision is made, making their network overhead similar to OT-Full.

  • Hindsight: Performs biased sampling at the agent side, achieving reduction in both network and storage. However, it still requires transmitting "breadcrumbs" for all traces, leading to slightly higher network overhead than pure head sampling for the same effective data.

  • Mint: Achieves the most significant reductions. By processing and compressing traces on the agent side using its commonality + variability paradigm, Mint drastically lowers both network and storage overhead. On average, Mint reduces total trace storage overhead to 2.7% and network overhead to 4.2% of the original volume.

    Analysis: Mint's advantage stems from its fundamental design:

  1. Agent-side Reduction: Like OT-Head and Hindsight, Mint reduces data volume before transmission, saving network bandwidth.
  2. Compression of Individual Traces: Unlike all other methods, Mint doesn't just reduce the number of traces; it significantly reduces the size of each trace (even sampled ones) by abstracting common patterns and only storing variable parameters. This is where it achieves further reductions beyond what simple sampling can offer.

Effectiveness in Retaining More Trace Information

The paper evaluates information retention through two aspects: specific query response ability and analytical value for downstream RCA.

Query Response Ability

The following figure (Figure 12 from the original paper) shows hit number for user queries in Alibaba during 14 days, demonstrating Mint can respond to all requests.

Figure 12. Hit number for user queries in Alibaba during 14 days, demonstrating Mint can respond to all requests.

Figure 12 demonstrates Mint's ability to respond to user queries for traces, especially those that would be completely missed by traditional sampling.

  • Total (Red Dashed Line): Represents the total number of user queries per day.
  • Baseline Methods (OT-Head, OT-Tail, Sieve, Hindsight): Show a significant number of "misses" (the gap between their hits and the 'Total' line). This confirms the empirical finding that traditional sampling leads to a substantial loss of queryable information.
  • Mint:
    • When considering partial hits, Mint responds to all queries. This is a critical outcome of the commonality + variability paradigm, as Mint retains at least approximate information (patterns) for every single trace.

    • Even when considering only exact hits (full trace information), Mint still outperforms baseline methods, responding to more queries than any other approach. This implies that its biased sampling and compression strategy effectively preserves more detailed information for critical traces.

      Analysis: This result directly addresses the 27.17% query miss rate problem identified in the empirical study. By guaranteeing at least a partial hit for every query, Mint significantly improves observability and reliability engineers' ability to investigate issues, even for unsampled traces.

Effectiveness for Downstream Analysis

The following table (Table 3 from the original paper) shows a comparison of the effects of different tracing frameworks in downstream root cause analysis's accuracy. The following are the results from Table 3 of the original paper:

| Benchmark | RCA Method | OT-Head | OT-Tail | Sieve | Hindsight | Mint |
|---|---|---|---|---|---|---|
| OB | MicroRank | 0.1563 | 0.2188 | 0.2813 | 0.2188 | 0.6563 |
| OB | TraceAnomaly | 0.2813 | 0.2500 | 0.3750 | 0.3438 | 0.7037 |
| OB | TraceRCA | 0.2500 | 0.2500 | 0.3438 | 0.2188 | 0.6563 |
| TT | MicroRank | 0.0714 | 0.1429 | 0.1786 | 0.1786 | 0.5357 |
| TT | TraceAnomaly | 0.1786 | 0.1786 | 0.2857 | 0.3214 | 0.5714 |
| TT | TraceRCA | 0.1429 | 0.1786 | 0.2500 | 0.1429 | 0.5000 |

Table 3 shows the Top-1 Accuracy (A@1) for different combinations of RCA methods and tracing frameworks. Mint consistently and significantly outperforms all baseline methods across both benchmarks (OB and TT) and all three RCA methods.

  • Baselines: OT-Head, OT-Tail, Sieve, and Hindsight all yield relatively low A@1 values, often below 38%. This is because these RCA methods (like MicroRank, TraceRCA for spectrum analysis, and TraceAnomaly for template comparison) heavily rely on having a sufficient number of common-case traces and normal templates for effective analysis. The "1 or 0" sampling strategy of baselines discards many common traces, thereby weakening the input data for these RCA tools.

  • Mint: Achieves dramatically higher A@1 values, improving over the baselines by roughly 25 to 50 percentage points on average. For example, MicroRank on OB jumps from 0.1563 (OT-Head) to 0.6563 (Mint).

    Analysis: This is a strong validation of Mint's ability to retain analytically valuable information. By preserving essential information for all traces (via commonality) and detailed information for edge cases (via variability and targeted sampling), Mint provides a much richer and more complete dataset for RCA algorithms. This enables these algorithms to build more accurate models of normal behavior and better identify deviations, leading to improved diagnosis capabilities.

6.2. Contribution of Commonality and Variability Analysis

This section focuses on Mint's lossless compression capabilities, comparing it with log-specific compressors and evaluating the contribution of its inter-span and inter-trace parsing.

The following figure (Figure 13 from the original paper) shows basic information about six datasets from Alibaba and the distribution of APIs for different datasets.

Figure 13. Description of 6 datasets in Alibaba.

Figure 13 provides context for the datasets used to evaluate compression. These are diverse real-world subsystems from Alibaba, with varying numbers of traces, API counts, and average call depths. This diversity ensures that Mint's compression effectiveness is tested across different application complexities.

The following table (Table 4 from the original paper) shows a comparison in terms of Compression Ratio. The following are the results from Table 4 of the original paper:

| Dataset | LogZip | LogReducer | CLP | w/o Sp | w/o Tp | Mint |
|---|---|---|---|---|---|---|
| A | 16.7989 | 19.9594 | 22.7130 | 21.2503 | 23.1391 | 45.1874 |
| B | 13.0634 | 10.2291 | 14.0553 | 14.3892 | 15.9906 | 41.0603 |
| C | 5.2411 | 7.8613 | 11.5995 | 14.3229 | 13.7895 | 22.7690 |
| D | 11.0920 | 11.4943 | 14.4578 | 10.2255 | 18.1101 | 36.6724 |
| E | 8.7774 | 9.0126 | 12.1723 | 10.1943 | 17.1917 | 32.0245 |
| F | 9.2336 | 10.6611 | 15.3990 | 8.9231 | 19.7713 | 29.7024 |

Table 4 compares the compression ratio of Mint against log-specific compressors and its own ablation variants.

  • Log-specific Compressors (LogZip, LogReducer, CLP): These methods achieve moderate compression ratios (ranging from ~5x to ~22x). As expected, they are less effective for traces due to their lack of topological awareness.
  • Mint: Consistently achieves significantly higher compression ratios across all datasets, ranging from 22.7690x to 45.1874x. On average, Mint's compression ratio exceeds the baselines' by 14.90 to 28.38. This demonstrates that Mint's approach of leveraging trace characteristics (topology and span structure) is far more effective.
  • Ablation Study (w/o Sp, w/o Tp):
    • w/o Sp (Mint without inter-span level parsing): This variant performs worse than the full Mint, indicating that breaking down individual spans into patterns and parameters is crucial for compression.

    • w/o Tp (Mint without inter-trace level parsing): This variant also performs worse than the full Mint, demonstrating the importance of aggregating sub-trace topologies for overall reduction.

      Analysis: The results in Table 4 strongly validate that both inter-span level parsing and inter-trace level parsing are critical components of Mint's compression strategy. By jointly analyzing commonality at these two levels, Mint can achieve superior lossless compression ratios compared to methods that do not fully understand trace structures.

6.3. Mint Overhead and Scalability

This section evaluates Mint's practical overhead and scalability using a real-world Alibaba production microservice system.

End-to-End Tracing Overhead

The following figure (Figure 14 from the original paper) shows tracing overhead during 14 load tests on Alibaba's production microservices system.

Figure 14. Tracing overhead during 14 load tests on Alibaba's production microservices system.

Figure 14 presents Mint's operational overhead compared to no tracing and OT-Head (with 10% sampling).

  • Egress Network Bandwidth (Figure 14b):
    • No-Tracing: Baseline.
    • OT-Head: Increases bandwidth by 19.35%.
    • Mint: Increases bandwidth by only 2.88%. This indicates Mint's significant effectiveness in reducing network traffic due to its agent-side parsing and selective parameter uploading.
  • CPU Usage (Figure 14c):
    • No-Tracing: Baseline.
    • OT-Head: Increases CPU usage by 1.25%.
    • Mint: Increases CPU usage by 0.86%. Mint's computational overhead is acceptable and even lower than OT-Head, suggesting its parsing mechanisms are efficient.
  • Storage Overhead (Figure 14d):
    • No-Tracing: Baseline.
    • OT-Head: Increases storage by 1.7%.
    • Mint: Increases storage by 1.8%. Mint's storage overhead is comparable to OT-Head (which samples at 10%), demonstrating its efficiency while capturing all requests (albeit approximated) and retaining more information than OT-Head.

Latency

The following figure (Figure 15 from the original paper) shows the comparison of end-to-end request latency and query latency in Alibaba's production microservices system.

Figure 15. End-to-End request latency and query latency on Alibaba's production microservices system.

Figure 15 analyzes the latency impact of Mint.

  • End-to-End Request Latency (Figure 15a):
    • Using Mint increased average request latency by a mere 0.21%. This is a very low overhead, making Mint suitable for latency-sensitive production environments.
  • Query Latency (Figure 15b):
    • Querying with Mint took an average of 4.2% longer than with OpenTelemetry.
    • The P95 query latency was below 1 second. This level of query latency is well within acceptable limits for most production environments.

Pattern Extraction Performance

The following table (Table 5 from the original paper) shows pattern extraction results of Span Parser and Trace Parser on 5 sub-services in Alibaba Cloud. The following are the results from Table 5 of the original paper:

| Sub-Service | Raw Trace Number | Span-Level Pattern Number | Trace-Level Pattern Number |
| :--- | ---: | ---: | ---: |
| S1 | 146,985 | 11 | 8 |
| S2 | 126,245 | 10 | 8 |
| S3 | 93,546 | 14 | 5 |
| S4 | 92,527 | 7 | 3 |
| S5 | 79,179 | 9 | 3 |

Table 5 illustrates the effectiveness of Mint's Span Parser and Trace Parser in aggregating patterns from large volumes of raw traces.

  • For hundreds of thousands of raw traces (e.g., 146,985 for S1), the Span Parser extracts a very small number of span-level patterns (e.g., 11 for S1).

  • Similarly, the Trace Parser extracts an even smaller number of trace-level patterns (e.g., 8 for S1).

  • The compression ratio from raw trace count to span-level patterns ranges from about 6,681 to 13,362, and to trace-level patterns from about 15,780 to 30,842.

    Analysis: This demonstrates the profound commonality present in real-world trace data and Mint's ability to efficiently identify and aggregate it. The extremely low number of resulting patterns confirms why Mint can store the commonality of all traces at such a minimal cost. This pattern aggregation is a cornerstone of its cost-efficiency.
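The compression ratios above follow directly from Table 5; a quick sketch reproduces them (figures hand-copied from the table, ratio = raw traces per extracted pattern):

```python
# Table 5: sub-service -> (raw traces, span-level patterns, trace-level patterns)
table5 = {
    "S1": (146_985, 11, 8),
    "S2": (126_245, 10, 8),
    "S3": (93_546, 14, 5),
    "S4": (92_527, 7, 3),
    "S5": (79_179, 9, 3),
}

# Compression ratio: how many raw traces collapse into one pattern.
span_ratios = {s: raw / span for s, (raw, span, _) in table5.items()}
trace_ratios = {s: raw / trace for s, (raw, _, trace) in table5.items()}

for s in table5:
    print(f"{s}: {span_ratios[s]:,.0f}x span-level, {trace_ratios[s]:,.0f}x trace-level")

print(f"span-level range:  {min(span_ratios.values()):,.0f} - {max(span_ratios.values()):,.0f}")
print(f"trace-level range: {min(trace_ratios.values()):,.0f} - {max(trace_ratios.values()):,.0f}")
```

The spread across sub-services (S3's 14 span-level patterns vs. S4's 7) shows the ratio depends on workload diversity, yet even the worst case compresses by more than three orders of magnitude.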

6.4. Ablation Studies / Parameter Analysis

Parameter Sensitivity

The following figure (Figure 16 from the original paper) shows the total storage size of patterns and parameters with the similarity threshold at 0.2, 0.4, 0.6, and 0.8.

Figure 16. The total storage size of patterns and parameters with the similarity threshold at 0.2, 0.4, 0.6, and 0.8.

Figure 16 illustrates the impact of the similarity threshold (used in the Span Parser for clustering string values) on total storage size.

  • The graph shows that as the similarity threshold increases (from 0.2 to 0.8), the total storage size of patterns and parameters decreases.

  • A higher similarity threshold means that only very similar strings are grouped into the same pattern. This leads to more distinct patterns but fewer parameters being extracted (as more content becomes part of a pattern).

  • Conversely, a lower similarity threshold means more diverse strings are grouped into the same pattern, leading to fewer patterns but more content being treated as variable parameters.

  • The paper states that the default similarity threshold is set to 0.8 after considering both total storage size and the effectiveness of parameter extraction (a very high threshold could reduce the meaningful variability that can be extracted).

    Analysis: This sensitivity analysis highlights that parameter tuning, specifically for the similarity threshold, is important for optimizing Mint's performance. The chosen default (0.8) represents a balance, aiming to maximize storage reduction while ensuring that meaningful variability (parameters) can still be identified for detailed analysis.
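The section above describes the threshold's effect without giving the Span Parser's algorithm; as a rough illustration only, a greedy single-pass clustering over string values (using Python's difflib similarity as a stand-in for whatever metric Mint actually uses, with made-up sample values) exhibits the same behavior:

```python
from difflib import SequenceMatcher

def cluster_by_similarity(strings, threshold):
    """Greedily group strings whose similarity to a cluster's first
    member meets the threshold. Each cluster would then collapse into
    one pattern, with the differing parts extracted as parameters."""
    clusters = []  # each cluster is a list of strings; [0] is its representative
    for s in strings:
        for cluster in clusters:
            if SequenceMatcher(None, cluster[0], s).ratio() >= threshold:
                cluster.append(s)
                break
        else:
            clusters.append([s])  # no cluster was similar enough: new pattern
    return clusters

values = [
    "GET /order/1001", "GET /order/1002", "GET /order/1003",
    "DELETE /cart/item/7", "DELETE /cart/item/9",
]

# A low threshold merges diverse strings into fewer, coarser patterns
# (more content left as variable parameters); a high threshold keeps
# them apart, yielding more, finer patterns (fewer parameters).
print(len(cluster_by_similarity(values, 0.2)))  # fewer clusters
print(len(cluster_by_similarity(values, 0.8)))  # more clusters
```

With the toy values above, 0.2 merges GETs and DELETEs into one coarse cluster while 0.8 keeps the two endpoint shapes separate, mirroring the pattern-count trend Figure 16 reports.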

7. Conclusion & Reflections

7.1. Conclusion Summary

This paper introduces Mint, a cost-efficient distributed tracing framework that fundamentally shifts the paradigm of trace reduction from traditional "1 or 0" sampling to a novel "commonality + variability" approach. By performing two-level parsing (inter-span and inter-trace) on the agent side, Mint effectively separates trace data into common patterns and variable parameters. It aggregates these patterns and uses Bloom Filters to efficiently store metadata, ensuring that basic information for all requests is captured and retrievable. For critical (sampled) traces, Mint filters and retains detailed variable parameters.

The key contributions of Mint are:

  • Comprehensive Coverage: It captures information for all requests, providing at least an approximate trace even for those not fully sampled, addressing the significant issue of discarded valuable traces in prior methods.

  • Exceptional Cost Efficiency: It drastically reduces trace storage overhead to an average of 2.7%, and network overhead to an average of 4.2%, of the original trace volume.

  • Enhanced Analytical Value: By retaining more analytically relevant information, Mint significantly improves the accuracy of downstream root cause analysis methods, with an average A@1 increase from 25% to 50%.

  • Production Readiness: Experiments on a large-scale Alibaba production system demonstrate Mint's lightweight nature, with minimal impact on CPU usage (0.86% increase) and request latency (0.21% increase), making it practical for real-world deployment.

    The successful deployment in Alibaba and positive user feedback underscore its practical utility and significant improvement in observability and user experience.

7.2. Limitations & Future Work

While Mint presents a significant advancement, the paper implicitly and explicitly touches upon certain limitations:

  • Parameter Sensitivity: The performance of Mint can be influenced by internal parameters, such as the similarity threshold used in the Span Parser. While a default of 0.8 is provided, optimal values might vary across different application contexts, requiring careful tuning. An excessively high threshold might weaken parameter extraction effectiveness by making too much content part of the pattern, potentially losing valuable variability.

  • Bloom Filter False Positives: While Bloom Filters ensure no false negatives (never missing a trace), they can produce false positives. Although the paper states this can be alleviated through upstream-downstream verification across multiple agents, this mitigation adds complexity to the query process and implies a potential for slightly longer query times or additional computation.

  • Approximate Nature of Unsampled Traces: While Mint captures all requests, unsampled traces are stored in an approximate form (patterns with masked variables or bucket-mapped numbers). While this is shown to be highly useful for trace exploration and batch analysis, it may not provide the granular, exact detail sometimes needed for very specific, deep-dive debugging scenarios where every single parameter value is crucial. The trade-off is a pragmatic one, but it is still a trade-off.

  • Cold Start Issues: The offline stage for warming up the Span Parser is necessary to achieve acceptable performance in the early stages of online parsing. This implies a potential cold start period for Mint in new or rapidly evolving systems before sufficient patterns are learned.

  • Complexity of System Changes: When the system changes, previous patterns may become outdated, requiring developers to trigger Mint's reconstruct interface to rebuild the patterns. This introduces a manual or semi-manual management overhead for adapting to significant system evolution.
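Regarding the Bloom Filter limitation above, a minimal sketch makes the asymmetry concrete (this is illustrative, not Mint's implementation; the size and hash scheme are arbitrary):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom Filter: membership tests have no false negatives,
    but a tunable false-positive probability."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes independent bit positions from salted SHA-256.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # False means "definitely never added"; True means "probably added".
        return all(self.bits >> pos & 1 for pos in self._positions(item))

bf = BloomFilter()
for trace_id in ("trace-001", "trace-002", "trace-003"):
    bf.add(trace_id)

assert "trace-002" in bf  # an added trace ID is never missed (no false negatives)
# An unseen ID is usually reported absent, but hash collisions can report
# True; that false positive is what the upstream-downstream verification
# across agents would then filter out, at extra query-time cost.
print("trace-999" in bf)
```

This is why "never miss a trace" holds by construction, while the false-positive rate (a function of bit-array size, hash count, and inserted items) is a capacity-planning knob rather than a correctness guarantee.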

    The paper doesn't explicitly outline future work, but based on these observations, potential future directions could include:

  • Developing adaptive or self-tuning mechanisms for parameters like the similarity threshold.

  • Exploring more robust and efficient ways to handle Bloom Filter false positives or alternative data structures with similar space efficiency but without this drawback.

  • Investigating methods to quantify and improve the "quality" or "fidelity" of approximate traces for specific debugging tasks.

  • Automating the pattern reconstruction process in response to system changes to reduce manual intervention.

  • Extending the commonality + variability paradigm to other observability signals (e.g., logs, metrics) for integrated cost efficiency.

7.3. Personal Insights & Critique

Mint represents a genuinely insightful and practical evolution in distributed tracing. The shift from a binary "1 or 0" sampling decision to the nuanced "commonality + variability" paradigm is its most significant innovation. It acknowledges the real-world dilemma of wanting full observability without incurring prohibitive costs.

Inspirations:

  • Pragmatic Trade-off Management: The core idea of providing "approximate truth for all" rather than "exact truth for some and nothing for others" is highly inspiring. This approach resonates deeply with the needs of large-scale production systems where SREs often need some information about every request, even if not all details are present.
  • Leveraging Data Structure: Explicitly leveraging the topological data structure of traces, rather than treating them as flat logs, is crucial. This is where Mint gains a significant advantage over log-centric compression techniques and provides a blueprint for other observability data types that might have similar inherent structures.
  • Agent-Side Intelligence: Performing complex parsing and reduction on the agent side is a smart design choice. It directly impacts network overhead, which is often a major cost component in distributed systems.

Potential Issues, Unverified Assumptions, or Areas for Improvement:

  • Cognitive Load of Approximate Traces: While beneficial, relying on approximate traces might introduce a new cognitive load for developers and SREs. They need to understand what information is approximated, what is filtered, and how to interpret masked values. The UI/UX for presenting these approximate traces (e.g., how Figure 10 is visualized) will be critical for user adoption and effectiveness.

  • Parameter Tuning Complexity: As noted in the parameter sensitivity analysis, selecting the right similarity threshold is important. For a system with thousands of microservices and diverse workloads, finding a universally optimal threshold or dynamically adapting it could be challenging. Poorly tuned parameters might either lose too much variability or fail to compress effectively.

  • Bloom Filter Collisions and Impact: While Bloom Filters provide excellent space efficiency, the occurrence of false positives (even if mitigated by verification) could potentially lead to slightly longer query times or unnecessary data retrieval attempts in the backend. The paper's claim of "never miss a trace" for Bloom Filters refers to false negatives, not false positives. While false positives don't mean a trace is missed, they can mean it's incorrectly identified as belonging to a pattern it doesn't, necessitating extra checks.

  • Reconstruction of Outdated Patterns: The need to reconstruct patterns when the system changes implies that Mint's effectiveness relies on a relatively stable system structure. In highly dynamic environments with frequent code deployments or A/B testing, the Pattern Library might require frequent updates, potentially incurring overhead or temporary sub-optimal compression.

  • Generalizability to Other Systems: While validated on Alibaba's scale, the specific types of commonality and variability might differ in other domains or organizational structures. The effectiveness of Mint's specific pattern extraction algorithms might need re-validation or adaptation for drastically different systems (e.g., scientific computing workflows vs. e-commerce).

    Overall, Mint offers a compelling vision for future distributed tracing, moving beyond simplistic sampling to intelligent, structure-aware data reduction that enhances observability without breaking the bank. It provides a strong foundation for future work in balancing the intricate tradeoffs of monitoring complex, large-scale systems.
