Mint: Cost-Efficient Tracing with All Requests Collection via Commonality and Variability Analysis
TL;DR Summary
Mint introduces a commonality-variability approach for cost-efficient tracing of all requests, significantly reducing storage and network overhead while preserving rich trace information.
In-depth Reading
1. Bibliographic Information
1.1. Title
The central topic of the paper is "Mint: Cost-Efficient Tracing with All Requests Collection via Commonality and Variability Analysis." This title highlights a novel approach to distributed tracing that aims to balance the need for comprehensive data collection with the practical constraints of cost and volume, by leveraging the inherent patterns and variations in trace data.
1.2. Authors
The authors and their affiliations are:
- Haiyu Huang: Sun Yat-sen University, Guangzhou, China
- Cheng Chen: Alibaba Group, Hangzhou, China
- Kunyi Chen: Alibaba Group, Hangzhou, China
- Pengfei Chen* (Corresponding Author): Sun Yat-sen University, Guangzhou, China
- Guangba Yu: Sun Yat-sen University, Guangzhou, China
- Zilong He: Sun Yat-sen University, Guangzhou, China
- Yilun Wang: Sun Yat-sen University, Guangzhou, China
- Huxing Zhang: Alibaba Group, Hangzhou, China
- Qi Zhou: Alibaba Group, Hangzhou, China
The authors are affiliated with both academic institutions (Sun Yat-sen University) and a major industry player (Alibaba Group), indicating a blend of theoretical research and practical application experience in distributed systems and software engineering.
1.3. Journal/Conference
The paper was published in the Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1 (ASPLOS '25).
ASPLOS is a highly reputable and influential conference in the fields of computer architecture, programming languages, and operating systems. It is known for publishing cutting-edge research that spans hardware and software, often with significant impact on system design and performance. Publication at ASPLOS signifies high quality and relevance in these areas.
1.4. Publication Year
2025
1.5. Abstract
Distributed traces contain valuable information but are often massive in volume, posing a core challenge in tracing framework design: balancing the tradeoff between preserving essential trace information and reducing trace volume. To address this tradeoff, previous approaches typically used a ‘1 or 0’ sampling strategy: retaining sampled traces while completely discarding unsampled ones. However, based on an empirical study on real-world production traces, we discover that the ‘1 or 0’ strategy actually fails to effectively balance this tradeoff. To achieve a more balanced outcome, we shift the strategy from the ‘1 or 0’ paradigm to the ‘commonality + variability’ paradigm. The core of ‘commonality + variability’ paradigm is to first parse traces into common patterns and variable parameters, then aggregate the patterns and filter the parameters. We propose a cost-efficient tracing framework, Mint, which implements the ‘commonality + variability’ paradigm on the agent side to enable all requests capturing. Our experiments show that Mint can capture all traces and retain more trace information while optimizing trace storage (reduced to an average of 2.7%) and network overhead (reduced to an average of 4.2%). Moreover, experiments also demonstrate that Mint is lightweight enough for production use.
1.6. Original Source Link
/files/papers/6901a5ae84ecf5fffe471752/paper.pdf
This is a direct link to the paper's PDF.
2. Executive Summary
2.1. Background & Motivation
Core Problem and Importance
The core problem addressed by this paper is the massive volume of distributed trace data generated by modern, complex microservice systems. While distributed tracing is crucial for providing visibility into system behavior, diagnosing failures, and profiling performance, the sheer volume of data makes its collection, storage, and processing extremely expensive, especially in production environments. For instance, the paper notes that a large-scale e-commerce system at Alibaba generates approximately 18.6-20.5 petabytes (PB) of traces per day, leading to substantial storage and network overheads. This creates a critical challenge in designing tracing frameworks: how to effectively balance the tradeoff between preserving essential trace information and reducing trace volume to an acceptable cost.
Challenges and Gaps in Prior Research
Previous approaches to trace reduction primarily relied on a '1 or 0' sampling strategy. This strategy involves retaining a portion of traces (sampled traces) with their full information, while completely discarding the rest (unsampled traces). The paper identifies two significant limitations of this prevailing strategy based on an empirical study on real-world production traces:
- Loss of Essential Information from Discarded Traces: The '1 or 0' strategy inevitably leads to the complete loss of unsampled traces. However, the characteristics of traces needed for analysis are often unpredictable and cannot be determined at the time of sampling. The empirical study found a significant query miss rate of approximately 27.17% for SREs (Site Reliability Engineers) seeking information from discarded traces, highlighting that valuable information is being lost. This impedes diagnosis and troubleshooting.
- Lack of Effective Compression for Individual Trace Volumes: Existing sampling methods only reduce the number of traces, but they preserve the full original data for sampled traces. Each trace can still be very voluminous (e.g., several MBs), containing detailed information. General-purpose compression tools or log compression techniques are ineffective for trace data due to its unique topological structure, failing to leverage trace-specific characteristics for efficient compression.
Paper's Entry Point / Innovative Idea
The paper's innovative idea is to shift the trace overhead reduction strategy from the '1 or 0' paradigm to a 'commonality + variability' paradigm. This paradigm recognizes that trace data, despite its volume, contains widely existing common patterns (commonality) and specific differentiating details (variability). By parsing traces into these two components, the system can:
- Cost-efficiently store the basic information (patterns) for all traces.
- Selectively filter and store the detailed information (parameters) for valuable traces.
This approach aims to provide a more balanced solution to the trace volume-information preservation tradeoff, allowing for the capture of all requests while significantly reducing overhead.
2.2. Main Contributions / Findings
Primary Contributions
The paper makes the following primary contributions:
- Empirical Study on Real-world Traces: Conducts an in-depth empirical study on production traces from Alibaba, revealing critical observations that highlight the limitations of existing sampling methods and the widespread existence of commonality and variability in trace data.
- Introduction of 'Commonality + Variability' Paradigm: Proposes a novel paradigm for trace reduction that moves beyond the '1 or 0' sampling. This paradigm focuses on separating trace data into common patterns and variable parameters, enabling more nuanced and cost-effective information retention.
- Proposal of Mint Framework: Designs and implements Mint, a cost-efficient distributed tracing framework. Mint applies the 'commonality + variability' paradigm on the agent side, allowing for the capture of all requests. It achieves this by aggregating common patterns and selectively filtering variable parameters.
- Extensive Evaluation and Practical Deployment: Conducts extensive experiments to validate Mint's effectiveness and performance, demonstrating its ability to reduce trace volume while capturing all requests. Mint has also been successfully deployed in a production environment at Alibaba, proving its practicality and lightweight nature.
Key Conclusions and Findings
The key conclusions and findings reached by the paper are:
- The traditional '1 or 0' sampling strategy in distributed tracing leads to significant loss of potentially critical information (an average query miss rate of 27.17%) and fails to effectively compress individual trace volumes.
- Commonality and variability are widely present in trace data at both inter-trace (34-56% commonality) and inter-span (25-45% commonality) levels, offering a strong basis for efficient data reduction.
- Mint, by implementing the 'commonality + variability' paradigm, can capture information for all requests. This means that for any queried trace, at least approximate information (commonality) can be retrieved, addressing the problem of entirely discarded traces.
- Mint significantly optimizes trace storage, reducing it to an average of 2.7% of the original volume, and network overhead, reducing it to an average of 4.2%.
- Mint retains more valuable trace information, improving the accuracy of downstream root cause analysis (average A@1 increased from 25% to 50%).
- Mint is lightweight enough for production use, introducing an average CPU usage increase of only 0.86% and an average end-to-end request latency increase of 0.21%.
- Approximate traces, provided by Mint for unsampled requests, are highly beneficial for real-world use cases like trace exploration and batch trace analysis.
These findings collectively demonstrate that Mint effectively resolves the long-standing tradeoff in distributed tracing by enabling comprehensive data capture at a significantly reduced cost, while preserving essential information for analysis.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To understand the Mint framework, a reader needs to be familiar with several fundamental concepts in distributed systems, observability, and data structures.
Distributed Tracing
Distributed tracing is an observability technique used to monitor and troubleshoot complex transactions as they flow through multiple services in a distributed system, such as a microservices architecture. It provides an end-to-end view of a request's journey, making it possible to identify performance bottlenecks, diagnose errors, and understand system behavior.
- Trace: A trace represents the complete end-to-end execution path of a single request or transaction as it propagates through various services in a distributed system. It is composed of multiple spans.
- Span: A span represents a single logical unit of work within a trace. It typically encapsulates an operation within a service, such as an RPC call, a database query, or a function execution. Each span has a name, a start time, an end time, and attributes (key-value pairs) that provide contextual information.
- Trace ID: A trace ID is a unique identifier assigned to an entire trace. All spans belonging to the same trace share this trace ID, allowing them to be correlated and reconstructed into a complete transaction flow.
- Parent ID / Span ID: Each span has a unique span ID. To establish the hierarchical relationship within a trace (i.e., which operation invoked which subsequent operation), a child span typically includes the span ID of its direct parent span as its parent ID. This forms the tree-like structure of a trace.
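These concepts can be made concrete with a minimal Python sketch (the field names are illustrative and simplified relative to the OpenTelemetry data model):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    span_id: str
    trace_id: str             # shared by all spans of one trace
    parent_id: Optional[str]  # None for the root span
    name: str
    start_time_us: int
    end_time_us: int
    attributes: dict = field(default_factory=dict)

def build_children_index(spans):
    """Reconstruct the trace tree: map each parent span ID to its child spans."""
    children = {}
    for s in spans:
        if s.parent_id is not None:
            children.setdefault(s.parent_id, []).append(s)
    return children

# A two-span trace: an HTTP handler that issues a database query.
root = Span("s1", "t1", None, "GET /checkout", 0, 500, {"http.status_code": 200})
child = Span("s2", "t1", "s1", "SELECT orders", 100, 300, {"db.system": "mysql"})
index = build_children_index([root, child])
assert index["s1"][0].name == "SELECT orders"
```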
Microservices Architecture
A microservices architecture is an architectural style that structures an application as a collection of loosely coupled, independently deployable services. While offering benefits like scalability and flexibility, it also introduces significant complexity in terms of inter-service communication and debugging, making distributed tracing indispensable.
Sampling Strategies in Tracing
Sampling is a common technique in distributed tracing to reduce the volume of data by only retaining a subset of traces. This helps manage costs and overhead. The paper discusses a prevalent "1 or 0 sampling strategy," which means a trace is either fully retained (1) or completely discarded (0).
- Head Sampling: The sampling decision is made at the very beginning of a trace's lifecycle (at the "head" of the request). If a trace is selected for sampling, all its subsequent spans are collected. If it's not selected, no spans are collected for that trace. This is cost-effective as it prevents data generation for unsampled traces, but it can miss valuable traces that become anomalous later.
- Tail Sampling: The sampling decision is made after a trace has completed, typically at a centralized collector or backend. The entire trace is initially collected, and then a sampling policy (e.g., based on errors, latency, or specific attributes) decides whether to retain or discard it. This allows for more intelligent sampling based on the trace's full characteristics but incurs high network and processing overhead before the decision is made.
- Retroactive Sampling: A more advanced form of sampling where a small amount of "breadcrumb" data (minimal information about a trace) is collected for all traces. If an anomaly is detected or a trace becomes interesting later, these breadcrumbs can be used to "retroactively" retrieve or reconstruct more detailed information for that trace, possibly from agents that temporarily buffered the full trace. This aims to combine the benefits of head and tail sampling.
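The head/tail distinction can be sketched in a few lines of Python (the field names and thresholds here are illustrative assumptions, not any framework's API):

```python
import random

def head_sample(rate: float) -> bool:
    """Head sampling: decide at request start, before any span exists."""
    return random.random() < rate

def tail_sample(trace_spans, latency_threshold_us=1_000_000) -> bool:
    """Tail sampling: decide after the whole trace is collected, e.g.,
    keep traces that contain an error span or exceed a latency budget."""
    has_error = any(s.get("status") == "ERROR" for s in trace_spans)
    duration = max(s["end_us"] for s in trace_spans) - min(s["start_us"] for s in trace_spans)
    return has_error or duration > latency_threshold_us

trace = [
    {"status": "OK", "start_us": 0, "end_us": 400},
    {"status": "ERROR", "start_us": 100, "end_us": 350},
]
assert tail_sample(trace)  # retained: it contains an error span
```

Note the cost asymmetry the text describes: `head_sample` needs no trace data at all, while `tail_sample` needs every span of the trace to already be collected.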
Bloom Filter
A Bloom Filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set.
- How it works: It consists of a bit array and a set of hash functions. To add an element, it's hashed by multiple hash functions, and the bits at the resulting positions in the array are set to 1. To check if an element is present, it's hashed again, and if all corresponding bits are 1, the element might be in the set. If any bit is 0, the element is definitely not in the set.
- Properties:
- Space-efficient: Requires significantly less memory than storing the actual elements.
- Probabilistic: Can produce false positives (it might say an element is in the set when it's not), but never false negatives (it will never say an element is not in the set when it actually is).
- No deletion: Elements cannot be reliably removed from a standard Bloom Filter without potentially introducing false negatives for other elements.
- Use in Mint: Mint uses Bloom Filters to efficiently store trace metadata (like trace IDs) associated with sub-trace patterns. This allows quick checks during queries to see if a particular trace ID belongs to a given pattern without storing all trace IDs explicitly, thus saving storage. The "never miss" property is crucial here for trace coherence.
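A minimal Bloom filter sketch in Python (illustrative only, not Mint's implementation; the hash construction and sizes are assumptions):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash probes into an m-bit array.
    May report false positives, but never false negatives."""
    def __init__(self, m_bits=1024, k_hashes=3):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8)

    def _positions(self, item: str):
        # Derive k positions by salting the item with the probe index.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: str):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

# Associate trace IDs with one sub-trace pattern, in the spirit of Mint's usage.
bf = BloomFilter()
bf.add("trace-0001")
assert bf.might_contain("trace-0001")  # added items are always reported present
```

The "never false negatives" property is exactly why a queried trace ID can always be mapped back to at least one stored pattern.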
Commonality and Variability
These are fundamental concepts in data analysis that Mint leverages:
- Commonality: Refers to the shared, repetitive, or patterned aspects across multiple data instances. In traces, this could be the typical sequence of service calls for a specific type of request (topology) or the fixed structure of certain log messages within a span.
- Variability: Refers to the unique, dynamic, or differentiating parameters and values that change between data instances, even if they share a common pattern. In traces, this could be specific HTTP parameters, user IDs, error codes, SQL query parameters, or latency values. Mint's core idea is to separate and treat these two aspects differently for efficient storage and analysis.
Regular Expressions (Regex)
Regular expressions are sequences of characters that define a search pattern. They are widely used for pattern matching within strings, for example, to find specific substrings, validate input formats, or extract parts of a text. In Mint, regex is used to define patterns for string attributes within spans and to extract the variable parameters that deviate from these patterns.
Longest Common Subsequence (LCS)
The Longest Common Subsequence (LCS) problem involves finding the longest sequence that is a subsequence of two or more sequences. A subsequence is a sequence that can be derived from another sequence by deleting some or no elements without changing the order of the remaining elements.
- Use in Mint: Mint uses LCS to calculate the similarity between string values of span attributes during the offline pattern extraction stage. By finding the LCS between two tokenized string values, it can quantify how similar their underlying structure or common content is.
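As a sketch, the token-level LCS similarity described above can be computed with a standard dynamic-programming implementation (this is generic illustration code, not Mint's):

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists (classic DP)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def similarity(s1: str, s2: str) -> float:
    """delta(s1, s2) = |LCS(s1, s2)| / max(|s1|, |s2|), over whitespace tokens."""
    t1, t2 = s1.split(), s2.split()
    return lcs_len(t1, t2) / max(len(t1), len(t2))

a = "GET /api/user 1001 HTTP/1.1"
b = "GET /api/user 2042 HTTP/1.1"
# 3 of the 4 tokens match, so similarity is 0.75 -- above a 0.8 threshold these
# two strings would land in different clusters; above 0.7 they would be merged.
assert similarity(a, b) == 0.75
```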
3.2. Previous Works
The paper contextualizes Mint by discussing existing distributed tracing frameworks and trace reduction techniques, as well as log compression methods.
Distributed Tracing Frameworks
- Classic Frameworks: Magpie [6], X-Trace [13], Dapper [52], Pinpoint [41], and Pivot [36] are highlighted as foundational works that introduced end-to-end observability. Dapper from Google, in particular, is often cited as the pioneering work that popularized the distributed tracing concept.
- Popular Open-Source Frameworks: Jaeger [24], OpenTelemetry [43], and Zipkin [1] are mentioned as widely adopted frameworks in industry. OpenTelemetry is noted for its unified API, SDK, and the OTLP standard for trace data format and transmission.
Trace Sampling Methods
These methods primarily focus on the "1 or 0 sampling strategy" for trace reduction:
- Head Sampling [25, 52]: Decisions made at the start, economical but can miss later anomalies.
- Tail Sampling [17, 23, 28, 29]: Decisions made at the end, intelligent but higher overhead.
- Retroactive Sampling [60] (e.g., Hindsight): Collects minimal "breadcrumbs" for all traces, allowing for later retrieval of detailed information if a trace becomes interesting. This aims to address some limitations of head/tail sampling but still involves a form of sampling for full detail.
The paper argues that despite these advancements, all these methods, being based on the '1 or 0' paradigm, face the fundamental limitation of either completely discarding valuable unsampled traces or failing to compress the volume of individual traces effectively.
Log-specific Compressors
The paper acknowledges the existence of log-specific compressors as log data is a "cousin" of distributed traces:
- LogArchive [8], Cowic [31], MLC [12]: Compress logs by extracting features.
- LogZip [34], RoughLogs [37], CLP [50], LogGrep [54]: Use log parsing to separate headers/schemas and variables for independent processing and compression. CLP specifically parses logs into schemas and stores variables as dictionary/non-dictionary values.
However, the paper explicitly states that these log compression methods are ineffective for trace compression. This is because distributed traces have a distinct topological data structure (the tree-like structure of spans and their relationships) that log data lacks. Direct application of log compression methods would not fully utilize trace characteristics, leading to poor compression performance. This highlights the need for trace-specific compression as proposed by Mint.
3.3. Technological Evolution
The evolution of tracing technologies can be seen as a progression from basic end-to-end visibility to more sophisticated cost-management and information retention strategies:
- Early Tracing Frameworks (e.g., Dapper, X-Trace): Focused on establishing the fundamental concepts of distributed tracing (traces, spans, IDs) to provide visibility into complex systems. The initial goal was simply to make tracing possible.
- Popularization and Standardization (e.g., Jaeger, Zipkin, OpenTelemetry): Focused on making tracing more accessible, interoperable, and performant, with efforts towards unified APIs and data formats. As adoption grew, the data volume problem became more apparent.
- Trace Sampling (e.g., Head, Tail, Retroactive): Introduced to address the cost and volume challenges. These methods aimed to reduce the number of traces collected while trying to preserve "interesting" ones. However, they operated under the "all or nothing" principle for individual traces.
- Mint's Approach ('Commonality + Variability'): Represents a further evolution by recognizing the limitations of simple sampling. Mint moves beyond merely reducing the count of traces to intelligently compressing and differentiating within traces. It aims to capture all requests (at least approximately) and to optimize the size of individual traces by separating patterns from parameters. This allows more granular control over the tradeoff between cost and information completeness. Mint fits within this timeline as a next-generation approach that overcomes the shortcomings of previous sampling-only strategies by providing a more nuanced and cost-effective way to achieve near-full observability in large-scale systems.
3.4. Differentiation Analysis
Compared to the main methods in related work, Mint introduces several core differences and innovations:
- Paradigm Shift from '1 or 0' Sampling:
  - Traditional Sampling (e.g., OT-Head, OT-Tail, Sieve): These methods make a binary decision for each trace: either collect its full, raw data or discard it entirely. This leads to information loss for unsampled traces and no compression for individual sampled traces.
  - Mint's 'Commonality + Variability' Paradigm: Mint does not discard any trace entirely. Instead, it extracts common patterns for all traces and variable parameters separately. This means that at least approximate information (the commonality part) for every request is retained, addressing the critical problem of query miss rates. For sampled traces, Mint still collects the variable parameters but compresses the overall structure.
- Cost-Effective Retention of All Requests:
  - Traditional Sampling: Can only guarantee full information for a small, sampled subset of requests.
  - Mint: Provides basic (commonality) information for all requests at a very low cost (patterns are aggregated, and metadata is stored efficiently via Bloom Filters). This enables users to query and retrieve information for every trace, even if only in an approximate form for unsampled ones.
- Lightweighting of Individual Trace Volumes:
  - Traditional Sampling: Even for sampled traces, the full raw data is preserved, meaning no reduction in the volume of each trace.
  - Mint: Applies trace-specific compression by parsing spans and sub-traces into patterns and parameters. This significantly reduces the size of individual traces, even those that are fully sampled, by storing patterns once and only the variable parts for each instance. This is a key differentiator from log compression techniques as well, which do not account for the topological structure of traces.
- Agent-Side Reduction:
  - Tail Sampling (e.g., OT-Tail, Sieve): Requires collecting full traces at the agent and sending them to a backend collector/sampler, incurring high network overhead before reduction.
  - Mint: Performs parsing and initial reduction (pattern extraction, parameter buffering) directly on the agent side. This immediately saves both network bandwidth (by sending only compressed patterns/metadata or sampled parameters) and storage space. Hindsight also does agent-side biased sampling but still relies on breadcrumbs and full trace retrieval for details, whereas Mint's approach fundamentally changes how trace data is represented.
In essence, Mint innovates by changing how trace data is represented and handled, rather than just which traces are kept. It offers a more granular, dual-layered approach to trace reduction that directly tackles the limitations of previous methods by preserving broader observability while maintaining cost efficiency.
4. Methodology
4.1. Principles
The core idea behind Mint's methodology is the 'commonality + variability' paradigm. This paradigm is grounded in the empirical observation that distributed trace data, despite its massive volume, exhibits significant redundancy and regularity. Many traces share common execution paths, service invocation sequences, and structural elements within their spans (commonality), while only specific values or parameters differ (variability).
The theoretical basis and intuition are as follows:
- Commonality allows for aggregation: If many traces or spans follow the same pattern, this pattern needs to be stored only once. Subsequent instances can then simply reference this common pattern, dramatically reducing storage. This forms the "basic information" for all traces.
- Variability allows for selective retention: The unique, differentiating parts (parameters) are often the most valuable for debugging specific issues. By separating these parameters, Mint can apply intelligent filtering and only store the variable parameters for traces deemed "important" (e.g., sampled due to anomalies or rare paths), while still linking them back to their common patterns.
This dual approach enables Mint to:
- Retain a cost-efficient, approximate representation (common patterns) for all requests, ensuring no request is completely discarded.
- Provide full, detailed information (common patterns + variable parameters) for a selectively sampled subset of "important" requests, but in a highly compressed format.
- Reduce both network and storage overhead by performing this parsing and aggregation at the agent side.
4.2. Core Methodology In-depth (Layer by Layer)
Mint's workflow is a multi-stage process, primarily executed on the agent side, to implement the commonality + variability paradigm. Let's break down the tracing walkthrough as depicted in Figure 5.
4.2.1. Overview of Mint's Tracing Walkthrough
The following figure (Figure 5 from the original paper) shows an overview of Mint's tracing walkthrough:
Figure 5. An overview of Mint's tracing walkthrough.
The process unfolds as follows:
- Trace Data Generating: When a request arrives at an application node, Mint's client API (compatible with existing standards like OpenTelemetry) generates span data. Unlike traditional frameworks that might immediately record or report these spans, Mint redirects them to its Span Parser.
- Inter-Span Level Parsing: The Span Parser analyzes individual spans to identify commonality and variability. It decomposes each incoming span into a pattern (the common part) and parameters (the variable part). The pattern is then encoded into a compact pattern ID. This pattern (or its ID) updates the Span Pattern Library, while the parameters are temporarily stored in a Params Buffer on the agent.
- Inter-Trace Level Parsing: On an application node, a single request can generate multiple spans which, linked by their parent IDs, form a sub-trace (a segment of the full trace on that specific node). The Trace Parser then analyzes the topology of this sub-trace to extract a sub-trace pattern. It searches for the most similar pattern in the Topology Pattern Library. Once a matched template (pattern) is found or a new one is created, essential metadata of the incoming sub-trace (e.g., trace ID) is associated with this matched template using a Bloom Filter. This Bloom Filter and the Topology Pattern Library are also stored on the agent.
- Basic Information Uploading: Periodically, the Mint agent uploads the Pattern Library (containing both span patterns and sub-trace patterns) and the Bloom Filters to the Mint backend. This ensures that the basic, common information (the commonality part) of all traces is preserved in the backend at a low cost.
- Key Traces Sampling: Mint employs two specialized samplers:
  - Symptom Sampler: Monitors the Params Buffer for spans with abnormal parameters (e.g., HTTP status code 502, unusually high latency values) and marks their corresponding traces as "sampled."
  - Edge-Case Sampler: Monitors the Pattern Library for sub-trace patterns that represent rare or infrequent execution paths, marking these traces as "sampled."
- Parameters Uploading: If a trace is marked as "sampled" by either sampler (or external sampling rules), all its variable parameters (the variability part) that are distributed across different Mint agents are then emitted to the Mint backend. This ensures that the full, detailed information for these important traces can be reconstructed.
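The pattern/parameter decomposition at the heart of this walkthrough can be illustrated with a toy sketch (the pattern library, regexes, and attribute names below are hypothetical, not Mint's actual library):

```python
import re

# Hypothetical span-pattern library: one regex per attribute. The capture
# groups are the variable parameters; the rest of the regex is the common part.
SPAN_PATTERNS = {
    "http.url": re.compile(r"/api/user/(?P<uid>\d+)"),
    "db.statement": re.compile(r"SELECT \* FROM orders WHERE id = (?P<oid>\d+)"),
}

def parse_span(attributes: dict):
    """Split a span into (matched pattern keys, extracted parameters)."""
    pattern_ids, params = [], {}
    for key, value in attributes.items():
        regex = SPAN_PATTERNS.get(key)
        m = regex.fullmatch(value) if regex else None
        if m:
            pattern_ids.append(key)       # common part: a reference to a stored pattern
            params.update(m.groupdict())  # variable part: buffered on the agent
    return pattern_ids, params

ids, params = parse_span({"http.url": "/api/user/1001"})
assert ids == ["http.url"] and params == {"uid": "1001"}
```

Only the small `params` dict would need to be uploaded for a sampled trace; the pattern itself is stored once and merely referenced.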
4.2.2. Inter-Span Level Parsing
This stage focuses on breaking down individual spans into reusable patterns and extractable parameters.
3.2.1 Offline Stage: Warming up Span Parser
The Span Parser is warmed up offline by analyzing a random sample of raw spans. This pre-processing helps to build an initial set of patterns for efficient online parsing. The core idea is to train a dedicated parser for each attribute within a span and then combine these attribute patterns to form a complete span pattern.
The following figure (Figure 6 from the original paper) shows the offline construction process of the span parser:
Figure 6. The offline stage of span parser.
- Clustering and Pattern Extracting:
  - For attributes with string values: The LCS (Longest Common Subsequence) method is used to determine the similarity between string values. This helps group similar strings into clusters, from which a regular expression (regex) pattern can be extracted. The similarity between two tokenized string values $s_1$ and $s_2$ is calculated as: $ \delta ( s_1, s_2 ) = \frac{ | LCS( s_1, s_2 ) | }{ \max ( | s_1 |, | s_2 | ) } $ Where:
    - $s_1$ and $s_2$ are the two tokenized strings (e.g., words are tokens).
    - $|LCS(s_1, s_2)|$ represents the length (number of tokens) of the Longest Common Subsequence between $s_1$ and $s_2$.
    - $\max(|s_1|, |s_2|)$ is the length (number of tokens) of the longer of the two strings.
    - A similarity threshold (e.g., 0.8) is used to group strings into clusters. For each cluster, the shortest regular expression that can represent all strings in that cluster is extracted as the pattern.
  - For attributes with numeric values: A bucketing approach based on exponential intervals is used. This groups numeric values into predefined ranges. For a numeric value $d$, its bucket index is determined by: $ i = \left\lceil \log_{\gamma}(d) \right\rceil $ Where:
    - $d$ is the numeric value.
    - $\gamma$ is a precision parameter (e.g., 0.5).
    - $i$ is the resulting bucket index.
    - Values in bucket $i$ fall within the interval $(\gamma^{i-1}, \gamma^{i}]$. Specifically, values in bucket 0 fall within $(0, 1]$. Each bucket is represented by an interval pattern.
- Parsers Building:
  - For numeric attributes, the parser is the fixed mapping formula $i = \lceil \log_{\gamma}(d) \rceil$.
  - For string attributes, a prefix tree (or Trie) is used to store all extracted regular expression patterns. This allows for efficient storage and matching, as common prefixes among patterns are shared.
- Patterns Combination: Mint combines these individual attribute patterns that frequently appear together to form a complete span pattern. For example, if attribute $a_1$ consistently shows pattern $p_1$ when attribute $a_2$ shows pattern $p_2$, then $(p_1, p_2)$ becomes a span pattern. A unique pattern ID (e.g., a UUID) is assigned to each span pattern, and these are stored in the Span Pattern Library.
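The exponential bucketing formula can be illustrated with a small sketch. Here γ is assumed to be an illustrative base of 2 (the analysis cites a precision parameter of 0.5; how the paper derives the actual base from it is not reproduced here):

```python
import math

def bucket_index(d: float, gamma: float = 2.0) -> int:
    """Exponential bucketing: i = ceil(log_gamma(d)).
    gamma = 2.0 is an illustrative base chosen for this sketch."""
    if d <= 1.0:
        return 0  # by convention, bucket 0 covers (0, 1]
    return math.ceil(math.log(d, gamma))

# With gamma = 2, bucket i covers the interval (2**(i-1), 2**i].
assert bucket_index(1.0) == 0
assert bucket_index(3.0) == 2    # 3 lies in (2, 4]
assert bucket_index(250.0) == 8  # 250 lies in (128, 256]
```

Because the mapping is a fixed formula, the "parser" for numeric attributes needs no stored state: the bucket index is the pattern, and the offset from the bucket boundary is the variable parameter.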
3.2.2 Online Stage: Matching and Parsing
Once Mint is deployed, it performs online parsing on newly generated raw spans.
The following figure (Figure 7 from the original paper) shows the online stage of span parser:
Figure 7. The online stage of span parser.
- Hierarchical Attribute Parsing (HAP): This is the core of the online process. For each attribute of an incoming span, Mint processes it in parallel:
  - It uses the corresponding attribute parser (built in the offline stage) to find the matching pattern (e.g., traversing the prefix tree for strings or applying the mapping formula for numerics).
  - The matched pattern is identified as the common part.
  - The variable part is then extracted: for string attributes, this is done using regular expressions; for numeric attributes, it is the difference from the interval's lower bound.
  - If a completely new span pattern is encountered (e.g., due to system changes), the relevant attribute parser is updated to include the new pattern. HAP is designed to be highly parallel to meet low-latency requirements.
- Span Pattern Mapping: After parsing all attributes, the resulting attribute patterns are combined. Mint then attempts to match this combined span pattern against those stored in the Span Pattern Library.
  - If an exact match is found, the existing pattern ID is returned.
  - If no match is found, the new span pattern is added to the Pattern Library and a new pattern ID is assigned. This allows the Pattern Library to adapt to evolving system behavior.

At the end of this stage, raw spans are effectively transformed into pattern IDs (representing the commonality) and variable parameters (representing the variability), with patterns aggregated in the Pattern Library and parameters buffered for later processing.
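The span pattern mapping step can be sketched as a lookup table keyed by the combined attribute patterns. The UUID choice follows the paper's description, but the data shapes and names here are illustrative.

```python
import uuid

class SpanPatternLibrary:
    """Maps a combined set of attribute patterns to a stable pattern ID,
    registering a new entry when an unseen span pattern appears (a sketch
    of Mint's Span Pattern Mapping step)."""

    def __init__(self):
        self.patterns = {}  # tuple of (attribute, pattern) pairs -> pattern ID

    def map(self, attribute_patterns):
        key = tuple(sorted(attribute_patterns.items()))
        if key not in self.patterns:
            self.patterns[key] = str(uuid.uuid4())  # new span pattern: register it
        return self.patterns[key]

# Example: spans whose attributes parse to identical patterns share one ID.
lib = SpanPatternLibrary()
span_a = {"http.method": "GET", "http.url": "/user/<*>", "latency": "bucket:7"}
span_b = {"http.method": "GET", "http.url": "/user/<*>", "latency": "bucket:7"}
span_c = {"http.method": "POST", "http.url": "/order", "latency": "bucket:3"}
id_a, id_b, id_c = lib.map(span_a), lib.map(span_b), lib.map(span_c)
```

Because millions of spans typically collapse onto a handful of such keys, the library stays tiny while every span is still represented.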
4.2.3. Inter-Trace Level Parsing
This stage organizes the parsed spans into sub-traces and extracts their topology patterns.
- Sub-Trace Construction: Since a Mint agent operates on a single application node, it constructs sub-traces by linking spans that share the same trace ID and are generated on that node. The linking is done based on their parent IDs (as illustrated in Figure 4 in the paper, which shows a trace's tree-like structure). This local view avoids high-latency cross-node interactions.
- Pattern Extracting: For each incoming sub-trace, Mint encodes its topology information (the sequence and hierarchy of spans) into a vector. Each element in this vector represents a parent-child relationship and, crucially, is a span pattern ID (from the inter-span level parsing). The sub-trace pattern therefore captures both the topology and the aggregated content (span patterns) of the spans within it.

The following figure (Figure 8 from the original paper) illustrates how Mint uses a sub-trace pattern:
Figure 8. Mint uses a sub-trace pattern to store the topology information of a sub-trace. It also uses a Bloom Filter to efficiently store the trace metadata for each sub-trace pattern.
- Matching or Updating: The sub-trace pattern is then checked against the Topo Pattern Library (a library of topology patterns).
  - If an exact match is found, that existing pattern is used as the matched pattern.
  - If no match is found, the new sub-trace pattern is added to the Topo Pattern Library. This aggregation ensures that the topology information for similar sub-traces is stored only once.
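A sketch of encoding a sub-trace's topology as a vector of parent-child span-pattern pairs, which can then serve as a key into the Topo Pattern Library. The exact encoding (sorted edge tuples here) and field names are assumptions for illustration.

```python
def encode_subtrace(spans):
    """Encode a sub-trace's topology as a deterministic tuple of
    (parent pattern ID, child pattern ID) edges. Each span is a dict with
    span_id, parent_id, and the pattern_id from inter-span level parsing."""
    by_id = {s["span_id"]: s for s in spans}
    edges = []
    for s in spans:
        parent = by_id.get(s["parent_id"])
        if parent is not None:
            edges.append((parent["pattern_id"], s["pattern_id"]))
    return tuple(sorted(edges))  # hashable, so it can key a pattern library

topo_library = {}

def topo_pattern_id(spans):
    """Match the encoded sub-trace against the library, or register it."""
    key = encode_subtrace(spans)
    return topo_library.setdefault(key, f"topo-{len(topo_library)}")
```

Two sub-traces with the same shape and span patterns, but different span and trace IDs, encode to the same key and therefore share one stored topology pattern.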
- Metadata Mounting: To allow users to query traces efficiently using trace metadata (e.g., trace ID), Mint attaches a Bloom Filter to each sub-trace pattern.
  - The Bloom Filter stores the metadata (specifically, trace IDs) of all sub-traces that conform to that particular sub-trace pattern.
  - As a probabilistic data structure, the Bloom Filter is highly space-efficient for set membership testing. It can report false positives (indicating a trace ID is present when it is not) but never false negatives (it will never fail to report a present trace ID). This "no-miss" property is critical for ensuring trace coherence (all segments of a trace can be found). False positives can be mitigated by cross-agent verification at query time.
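The metadata-mounting idea can be illustrated with a tiny Bloom filter. This sketch uses SHA-256-derived bit positions and assumed sizing parameters, not Mint's actual implementation.

```python
import hashlib

class BloomFilter:
    """A minimal Bloom filter for trace-ID membership: false positives are
    possible, false negatives are not (the property Mint relies on for
    trace coherence)."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a big integer used as a bit array

    def _positions(self, item):
        # Derive num_hashes independent bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, trace_id):
        for pos in self._positions(trace_id):
            self.bits |= 1 << pos

    def might_contain(self, trace_id):
        return all(self.bits >> pos & 1 for pos in self._positions(trace_id))
```

An added trace ID is always reported as present; an absent one is reported absent except for the small false-positive probability, which cross-agent verification can filter out at query time.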
4.2.4. Data Reporting
After the two levels of parsing, the Mint agent holds three types of data:
- Patterns: Span patterns and sub-trace patterns stored in the Pattern Library.
- Metadata: Trace metadata (via Bloom Filters) mounted on sub-trace patterns.
- Parameters: Variable parameters temporarily stored in the Param Buffer.

Mint's reporting strategy treats the commonality and variability parts differently:
- Uploading Basic Information: The Mint agent periodically uploads the entire Pattern Library and Bloom Filters to the backend. This guarantees that the aggregated commonality information for all traces is preserved in the backend, supporting the "capture all requests" goal. The cost is low because millions of traces typically map to only hundreds of patterns.
- Uploading Parameters: For the variable parameters in the Param Buffer, Mint decides whether to send them to the backend based on whether the trace they belong to has been marked as sampled. If a trace is sampled, all of its variable parameters (potentially across multiple agents) are sent to the backend, enabling full reconstruction of the sampled trace.
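The reporting policy reduces to: always ship patterns and Bloom filters, and ship buffered parameters only for sampled traces. A sketch with illustrative field names:

```python
def report(agent, sampled_trace_ids):
    """Sketch of Mint's differentiated reporting. `agent` is a dict with
    pattern_library, bloom_filters, and param_buffer fields (names are
    illustrative, not Mint's actual data model)."""
    upload = {
        "pattern_library": agent["pattern_library"],  # commonality: always shipped
        "bloom_filters": agent["bloom_filters"],      # metadata: always shipped
        "parameters": {
            trace_id: params
            for trace_id, params in agent["param_buffer"].items()
            if trace_id in sampled_trace_ids          # variability: only if sampled
        },
    }
    # The buffer is cleared after the reporting cycle; unsampled parameters
    # are dropped, leaving only their aggregated patterns in the backend.
    agent["param_buffer"] = {}
    return upload
```

This is why the network cost tracks the (small) pattern volume plus the sampled fraction of parameters, rather than the full trace volume.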
Sampling Rules
Mint is flexible and can work with existing sampling rules (e.g., head sampling or tail sampling). Additionally, it provides two specialized samplers tailored for the commonality + variability paradigm:
- Symptom Sampler:
  - Purpose: To identify and sample symptomatic traces, i.e., those exhibiting anomalous behavior.
  - Mechanism: It continuously monitors the variable parameters stored in the Param Buffer.
  - Sampling Criteria:
    - For numerical parameters: samples outliers (e.g., values exceeding the 95th percentile, P95).
    - For string parameters: samples values containing user-defined abnormal words.
  - This ensures that traces linked to potential issues are fully captured.
- Edge-Case Sampler:
  - Purpose: To identify and sample traces that follow rare execution paths.
  - Mechanism: It monitors the topology patterns within the Pattern Library.
  - Sampling Criteria: It tracks the frequency of traces matching each topology pattern and prioritizes sampling traces associated with less common patterns. For example, if pattern A is seen 99% of the time and pattern B only 1%, pattern B traces get a higher sampling probability.
  - This helps ensure diversity in the collected full traces, capturing unusual but potentially important system behaviors.

When a trace is marked as sampled by any rule, Mint coordinates across agents (via the backend) to ensure all parameters for that specific trace ID are collected, maintaining trace coherence.
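Both samplers can be sketched in a few lines. The P95 computation, the default abnormal-word list, and the inverse-frequency probability are illustrative choices consistent with the description above, not the paper's exact rules.

```python
def symptom_sample(numeric_params, string_params, abnormal_words=("error", "timeout")):
    """Symptom Sampler sketch: flag numeric outliers above the P95 of the
    buffered values, and strings containing user-defined abnormal words."""
    values = sorted(numeric_params)
    p95 = values[int(0.95 * (len(values) - 1))] if values else float("inf")
    numeric_hits = [v for v in numeric_params if v > p95]
    string_hits = [s for s in string_params
                   if any(w in s.lower() for w in abnormal_words)]
    return numeric_hits, string_hits

def edge_case_probability(pattern_counts, pattern):
    """Edge-Case Sampler sketch: rarer topology patterns get a higher
    sampling probability (here, simply inverse to observed frequency)."""
    total = sum(pattern_counts.values())
    freq = pattern_counts[pattern] / total
    return min(1.0, 1.0 - freq)
```

With the 99%/1% example from the text, the rare pattern would be sampled with probability 0.99 versus 0.01 for the dominant one under this sketch.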
5. Experimental Setup
5.1. Datasets
The experiments used a combination of open-source microservice benchmarks and real-world production systems from Alibaba.
- OnlineBoutique (OB) [18]: A web-based e-commerce application. It consists of 10 microservices, implemented in various programming languages, communicating via gRPC.
- TrainTicket (TT) [14]: A railway ticketing service. It involves 45 services that communicate through synchronous REST invocations and asynchronous messaging.
  - Deployment: Both OnlineBoutique and TrainTicket were deployed on a Kubernetes platform using 12 virtual machines (VMs). Each VM had an 8-core 2.10GHz CPU and 16GB of memory, and ran Ubuntu 18.04.
- Alibaba Production Microservice System: A real-world production environment at Alibaba. This system includes typical components such as web services, MongoDB [39] access, and MySQL [40] access. This dataset allows for evaluating Mint's performance and practicality in a high-scale, real-world scenario.
- Alibaba Cloud Subsystems: For specific evaluations such as compression ratio and pattern extraction performance, Mint used raw trace data generated by:
  - 6 subsystems from Alibaba (with varying API counts and call depths), to evaluate compression ratio.
  - 5 sub-services in Alibaba Cloud (over an hour), to test pattern extraction capabilities.
Data Sample Example
The paper provides illustrations of trace and span structures. The following figure (Figure 4 from the original paper) shows an example of trace and span structure. This helps to visualize the hierarchical nature of trace data.
Figure 4. An example of trace and span structure.
As shown in Figure 4, a trace (e.g., trace ID ab8d...) is composed of multiple spans (e.g., span ID b1e6). Each span represents a unit of work (e.g., product service/get_product) and has:
- Topology part: Span ID, Parent ID, Trace ID (defines the span's position in the trace tree).
- Metadata part: Service name, Operation name, Start time, End time.
- Attributes part: Key-value pairs providing detailed context (e.g., attributes.sql, attributes.http.method, event: sql query).

Another example of a data sample is an approximate trace output. The following figure (Figure 10 from the original paper) shows an example of querying an unsampled trace to get an approximate trace:
Figure 10. An example of querying an unsampled trace to get an approximate trace; variables are masked by `<*>` and numbers are bucket-mapped.
This approximate trace illustrates how Mint presents unsampled data: variables (e.g., specific city_id, rb_id) are masked or generalized, and numbers (e.g., latency values) are represented by their bucket ranges rather than exact figures.
Dataset Selection Rationale
The chosen datasets are effective for validating Mint's performance because:
- Microservice Benchmarks (OB, TT): Provide controlled, reproducible environments for comparing Mint against baselines under varying loads, and for injecting faults to test RCA effectiveness. They are widely used in trace analysis studies, allowing for comparison with prior work.
- Alibaba Production Systems: Offer real-world scale, complexity, and traffic patterns, ensuring that Mint's claims of cost-efficiency and lightweight operation are validated in a demanding industrial setting, not just in idealized lab conditions. The diversity of subsystems also helps test the robustness of Mint's pattern extraction across different application types.
5.2. Evaluation Metrics
For each evaluation metric, the conceptual definition, mathematical formula (if applicable), and symbol explanation are provided.
-
Storage Overhead:
- Conceptual Definition: The total disk space consumed by the trace data after processing by a tracing framework. It quantifies the cost associated with persisting trace information. A lower storage overhead is desirable.
- Mathematical Formula: Not explicitly provided by the paper, but typically calculated as the total size in bytes (or GB/PB) of the stored data.
- Symbol Explanation: N/A.
-
Network Overhead:
- Conceptual Definition: The amount of data transmitted over the network for tracing purposes, specifically between application nodes and the tracing backend. It quantifies the bandwidth consumed and its potential impact on network latency for business traffic. A lower network overhead is desirable.
- Mathematical Formula: Not explicitly provided by the paper, but typically measured as data volume per unit time (e.g., MB/min).
- Symbol Explanation: N/A.
-
Miss Rate (for Trace Queries):
- Conceptual Definition: The proportion of user queries for specific trace IDs that yield no results because the traces were discarded by the sampling strategy. It directly measures the effectiveness of information retention from a user's perspective. A lower miss rate is desirable.
- Mathematical Formula: Not explicitly provided by the paper, but can be generally defined as: $ \text{Miss Rate} = \frac{\text{Number of queries with no results}}{\text{Total number of user queries}} $
- Symbol Explanation: "Number of queries with no results" is the count of instances where a user tried to retrieve a trace but the tracing system had no record of it. "Total number of user queries" is the total number of attempts by users to retrieve trace information.
-
Query Response Ability (Exact Hit, Partial Hit, Miss):
- Conceptual Definition: Categorizes the outcome of a user's trace query.
  - Exact Hit: The tracing framework returns the complete, detailed information of the queried trace.
  - Partial Hit: The framework returns approximate information (e.g., common patterns with masked variables) for the queried trace.
  - Miss: No information for the queried trace is returned.
- Mathematical Formula: Not a single formula; counts are reported for each category.
- Symbol Explanation: N/A.
-
Top-1 Accuracy (A@1) for Root Cause Analysis (RCA):
- Conceptual Definition: A metric used to evaluate the effectiveness of root cause analysis methods. It measures the percentage of times the actual fault (root cause) is correctly identified as the single most likely cause (top-1 prediction) by the RCA method, given the trace data. A higher A@1 indicates better diagnostic performance.
- Mathematical Formula: $ A@1 = \frac{\text{Number of correct top-1 root cause predictions}}{\text{Total number of fault injections}} $
- Symbol Explanation:
  - "Number of correct top-1 root cause predictions": the count of fault injection scenarios where the RCA method's highest-ranked (top-1) suggestion matches the actual injected fault.
  - "Total number of fault injections": the total number of distinct faults intentionally introduced into the system during experiments.
-
Compression Ratio:
- Conceptual Definition: A measure of how much the size of data is reduced after compression. It is the ratio of the original data size to the compressed data size. A higher compression ratio means more effective compression.
- Mathematical Formula: $ \text{Compression Ratio} = \frac{\text{Original Data Size}}{\text{Compressed Data Size}} $
- Symbol Explanation:
  - "Original Data Size": the size of the trace data before any compression or processing by Mint (raw format).
  - "Compressed Data Size": the size of the trace data after being processed and reduced by Mint (or other compression tools).
-
CPU Usage Increment:
- Conceptual Definition: The additional percentage increase in CPU utilization of application nodes caused by the tracing framework. It quantifies the computational overhead imposed on the monitored applications. A lower increment implies a more lightweight tracing solution.
- Mathematical Formula: Not explicitly provided, typically calculated as
(CPU usage with tracing - CPU usage without tracing) / CPU usage without tracing * 100%. - Symbol Explanation: N/A.
-
End-to-End Request Latency:
- Conceptual Definition: The total time taken for a user request to complete, from the moment it is initiated until the response is received. The tracing framework might add overhead that increases this latency. A smaller increase is better.
- Mathematical Formula: Not explicitly provided. Measured in milliseconds or seconds.
- Symbol Explanation: N/A.
-
Query Latency:
- Conceptual Definition: The time taken for the tracing system's backend to retrieve and return requested trace information to a user. This measures the responsiveness of the querying interface. Lower latency is better.
- Mathematical Formula: Not explicitly provided. Measured in milliseconds or seconds.
- Symbol Explanation: N/A.
5.3. Baselines
Mint was compared against a selection of widely used tracing frameworks and novel methods from recent years:
- OpenTelemetry under head-sampling (OT-Head) [48]: Represents a common, basic sampling strategy where the decision is made at the start of the trace. The sampling rate was set to 5% in the experiments. OpenTelemetry agents were instrumented, and data was collected by the OpenTelemetry Collector, then stored in Grafana Tempo and Elasticsearch [11].
- OpenTelemetry under tail-sampling (OT-Tail) [44]: Represents an intelligent sampling strategy where decisions are made at the end of the trace, allowing for policy-driven retention (e.g., keeping anomalous traces). To ensure effectiveness, traces with an 'is_abnormal' tag (injected for 5% of traffic) were targeted for sampling.
- Hindsight [60]: A tracing framework implementing retroactive sampling. It collects minimal "breadcrumbs" for all traces and allows for later biased sampling. It was configured as specified in its original paper, compatible with OpenTelemetry.
- Sieve [23]: An online tail sampling approach that uses a robust random cut forest (RRCF) to sample uncommon traces. It was implemented using OpenTelemetry agents/collectors, with data redirected to Sieve for filtering.
- OT-Full (OpenTelemetry with 100% sampling rate): Used as a reference point representing the no-trace-reduction scenario, providing the baseline for original volume.

For evaluating Mint's lossless compression ability, it was also compared against log-specific compressors:
- LogZip [33]: A log compressor that extracts hidden structures via iterative clustering.
- LogReducer [55]: Identifies and reduces log hotspots.
- CLP [50]: Enables efficient and scalable search on compressed text logs by parsing logs into schemas.
5.4. Additional Experimental Details
- Fairness in Sampling: For comparisons involving sampling, 5% of the injected traffic in benchmarks was tagged with an 'is_abnormal' label. This ensured that all biased sampling methods (OT-Tail, Sieve, Hindsight) had a consistent target for "valuable" traces. OT-Head and Mint were also set to a comparable 10% sampling rate in some end-to-end overhead tests.
- Chaos Engineering: To simulate real-world microservice problem analysis and evaluate RCA effectiveness, chaos engineering was performed on OnlineBoutique and TrainTicket using Chaosblade [7]. A total of 56 faults were injected, including:
  - CPU exhaustion
  - Memory exhaustion
  - Network delays
  - Code exceptions
  - Error returns
- Downstream RCA Methods: The trace data captured by Mint and the baselines was fed into three classic trace-based RCA methods:
  - MicroRank [57]
  - TraceRCA [30]
  - TraceAnomaly [35]

The results from these RCA methods were used to calculate the top-1 accuracy (A@1), assessing the analytical value of the retained trace data.
6. Results & Analysis
6.1. Core Results Analysis
The experiments extensively demonstrate Mint's effectiveness in trace data reduction, information retention, and practical performance.
Effectiveness in Reducing Trace Data
The following figure (Figure 11 from the original paper) shows tracing network and storage overhead on OnlineBoutique and TrainTicket Benchmarks.
Figure 11. Tracing network and storage overhead on OnlineBoutique and TrainTicket Benchmarks.
Figure 11 clearly shows Mint's superior performance in reducing both network and storage overhead compared to all baseline methods.
- OT-Full: Represents the scenario with no trace reduction (100% sampling). It has the highest network and storage overhead, serving as the reference maximum.
- OT-Head: Reduces network and storage overheads proportionally to its sampling rate (e.g., 5% sampling reduces overhead to roughly 5% of OT-Full), because it discards unsampled traces at the source.
- OT-Tail & Sieve: While these methods effectively reduce storage overhead (to around the anomaly rate) because they filter traces at the backend, they fail to reduce network overhead: all trace data must first be transmitted to the backend before the sampling decision is made, making their network overhead similar to OT-Full's.
- Hindsight: Performs biased sampling at the agent side, achieving reductions in both network and storage. However, it still requires transmitting "breadcrumbs" for all traces, leading to slightly higher network overhead than pure head sampling for the same effective data.
- Mint: Achieves the most significant reductions. By processing and compressing traces on the agent side using its commonality + variability paradigm, Mint drastically lowers both network and storage overhead. On average, Mint reduces total trace storage overhead to 2.7% and network overhead to 4.2% of the original volume.

Analysis: Mint's advantage stems from its fundamental design:
- Agent-side Reduction: Like OT-Head and Hindsight, Mint reduces data volume before transmission, saving network bandwidth.
- Compression of Individual Traces: Unlike all other methods, Mint does not just reduce the number of traces; it significantly reduces the size of each trace (even sampled ones) by abstracting common patterns and storing only variable parameters. This is where it achieves further reductions beyond what simple sampling can offer.
Effectiveness in Retaining More Trace Information
The paper evaluates information retention through two aspects: specific query response ability and analytical value for downstream RCA.
Query Response Ability
The following figure (Figure 12 from the original paper) shows hit number for user queries in Alibaba during 14 days, demonstrating Mint can respond to all requests.
Figure 12. Hit number for user queries in Alibaba during 14 days, demonstrating Mint can respond to all requests.
Figure 12 demonstrates Mint's ability to respond to user queries for traces, especially those that would be completely missed by traditional sampling.
- Total (Red Dashed Line): Represents the total number of user queries per day.
- Baseline Methods (OT-Head, OT-Tail, Sieve, Hindsight): Show a significant number of "misses" (the gap between their hits and the 'Total' line). This confirms the empirical finding that traditional sampling leads to a substantial loss of queryable information.
- Mint:
  - When considering partial hits, Mint responds to all queries. This is a direct outcome of the commonality + variability paradigm, as Mint retains at least approximate information (patterns) for every single trace.
  - Even when considering only exact hits (full trace information), Mint still outperforms the baseline methods, responding to more queries than any other approach. This implies that its biased sampling and compression strategy effectively preserves more detailed information for critical traces.

Analysis: This result directly addresses the 27.17% query miss rate identified in the empirical study. By guaranteeing at least a partial hit for every query, Mint significantly improves observability and reliability engineers' ability to investigate issues, even for unsampled traces.
Effectiveness for Downstream Analysis
The following table (Table 3 from the original paper) shows a comparison of the effects of different tracing frameworks in downstream root cause analysis's accuracy. The following are the results from Table 3 of the original paper:
| Benchmark | RCA Method | OT-Head | OT-Tail | Sieve | Hindsight | Mint |
| --- | --- | --- | --- | --- | --- | --- |
| OB | MicroRank | 0.1563 | 0.2188 | 0.2813 | 0.2188 | 0.6563 |
| OB | TraceAnomaly | 0.2813 | 0.2500 | 0.3750 | 0.3438 | 0.7037 |
| OB | TraceRCA | 0.2500 | 0.2500 | 0.3438 | 0.2188 | 0.6563 |
| TT | MicroRank | 0.0714 | 0.1429 | 0.1786 | 0.1786 | 0.5357 |
| TT | TraceAnomaly | 0.1786 | 0.1786 | 0.2857 | 0.3214 | 0.5714 |
| TT | TraceRCA | 0.1429 | 0.1786 | 0.2500 | 0.1429 | 0.5000 |
Table 3 shows the Top-1 Accuracy (A@1) for different combinations of RCA methods and tracing frameworks. Mint consistently and significantly outperforms all baseline methods across both benchmarks (OB and TT) and all three RCA methods.
- Baselines: OT-Head, OT-Tail, Sieve, and Hindsight all yield relatively low A@1 values, often below 38%. This is because these RCA methods (MicroRank and TraceRCA for spectrum analysis, TraceAnomaly for template comparison) rely heavily on having a sufficient number of common-case traces and normal templates for effective analysis. The "1 or 0" sampling strategy of the baselines discards many common traces, thereby weakening the input data for these RCA tools.
- Mint: Achieves dramatically higher A@1 values, with an average increase from 25% to 50% compared to the baselines. For example, MicroRank on OB jumps from 0.1563 (OT-Head) to 0.6563 (Mint).

Analysis: This is a strong validation of Mint's ability to retain analytically valuable information. By preserving essential information for all traces (via commonality) and detailed information for edge cases (via variability and targeted sampling), Mint provides a much richer and more complete dataset for RCA algorithms. This enables them to build more accurate models of normal behavior and better identify deviations, leading to improved diagnosis.
6.2. Contribution of Commonality and Variability Analysis
This section focuses on Mint's lossless compression capabilities, comparing it with log-specific compressors and evaluating the contribution of its inter-span and inter-trace parsing.
The following figure (Figure 13 from the original paper) shows basic information about six datasets from Alibaba and the distribution of APIs for different datasets.
Figure 13. Description of 6 datasets in Alibaba.
Figure 13 provides context for the datasets used to evaluate compression. These are diverse real-world subsystems from Alibaba, with varying numbers of traces, API counts, and average call depths. This diversity ensures that Mint's compression effectiveness is tested across different application complexities.
The following table (Table 4 from the original paper) shows a comparison in terms of Compression Ratio. The following are the results from Table 4 of the original paper:
| Dataset | LogZip | LogReducer | CLP | w/o Sp | w/o Tp | Mint |
| --- | --- | --- | --- | --- | --- | --- |
| A | 16.7989 | 19.9594 | 22.7130 | 21.2503 | 23.1391 | 45.1874 |
| B | 13.0634 | 10.2291 | 14.0553 | 14.3892 | 15.9906 | 41.0603 |
| C | 5.2411 | 7.8613 | 11.5995 | 14.3229 | 13.7895 | 22.7690 |
| D | 11.0920 | 11.4943 | 14.4578 | 10.2255 | 18.1101 | 36.6724 |
| E | 8.7774 | 9.0126 | 12.1723 | 10.1943 | 17.1917 | 32.0245 |
| F | 9.2336 | 10.6611 | 15.3990 | 8.9231 | 19.7713 | 29.7024 |
Table 4 compares the compression ratio of Mint against log-specific compressors and its own ablation variants.
- Log-specific Compressors (LogZip, LogReducer, CLP): These methods achieve moderate compression ratios (ranging from ~5x to ~22x). As expected, they are less effective for traces due to their lack of topological awareness.
- Mint: Consistently achieves significantly higher compression ratios across all datasets, ranging from 22.7690x to 45.1874x. On average, Mint outperforms the baselines by 14.90 to 28.38 times in compression ratio, demonstrating that its approach of leveraging trace characteristics (topology and span structure) is far more effective.
- Ablation Study (w/o Sp, w/o Tp):
  - w/o Sp (Mint without inter-span level parsing): This variant performs worse than the full Mint, indicating that breaking individual spans down into patterns and parameters is crucial for compression.
  - w/o Tp (Mint without inter-trace level parsing): This variant also performs worse than the full Mint, demonstrating the importance of aggregating sub-trace topologies for overall reduction.

Analysis: The results in Table 4 strongly validate that both inter-span level parsing and inter-trace level parsing are critical components of Mint's compression strategy. By jointly analyzing commonality at these two levels, Mint achieves lossless compression ratios superior to methods that do not fully exploit trace structure.
6.3. Mint Overhead and Scalability
This section evaluates Mint's practical overhead and scalability using a real-world Alibaba production microservice system.
End-to-End Tracing Overhead
The following figure (Figure 14 from the original paper) shows tracing overhead during 14 load tests on Alibaba's production microservices system.
Figure 14. Tracing overhead during 14 load tests on Alibaba's production microservices system.
Figure 14 presents Mint's operational overhead compared to no tracing and OT-Head (with 10% sampling).
- Egress Network Bandwidth (Figure 14b):
  - No-Tracing: Baseline.
  - OT-Head: Increases bandwidth by 19.35%.
  - Mint: Increases bandwidth by only 2.88%, reflecting the effectiveness of agent-side parsing and selective parameter uploading in reducing network traffic.
- CPU Usage (Figure 14c):
  - No-Tracing: Baseline.
  - OT-Head: Increases CPU usage by 1.25%.
  - Mint: Increases CPU usage by 0.86%. Mint's computational overhead is acceptable, and even lower than OT-Head's, suggesting its parsing mechanisms are efficient.
- Storage Overhead (Figure 14d):
  - No-Tracing: Baseline.
  - OT-Head: Increases storage by 1.7%.
  - Mint: Increases storage by 1.8%, comparable to OT-Head (which samples at 10%), while capturing all requests (albeit approximated) and retaining more information than OT-Head.
Latency
The following figure (Figure 15 from the original paper) shows the comparison of end-to-end request latency and query latency in Alibaba's production microservices system.
Figure 15. End-to-End request latency and query latency on Alibaba's production microservices system.
Figure 15 analyzes the latency impact of Mint.
- End-to-End Request Latency (Figure 15a): Using Mint increased average request latency by a mere 0.21%. This very low overhead makes Mint suitable for latency-sensitive production environments.
- Query Latency (Figure 15b): Querying with Mint took an average of 4.2% longer than with OpenTelemetry, and the P95 query latency was below 1 second, well within acceptable limits for most production environments.
Pattern Extraction Performance
The following table (Table 5 from the original paper) shows pattern extraction results of Span Parser and Trace Parser on 5 sub-services in Alibaba Cloud. The following are the results from Table 5 of the original paper:
| Sub-Service | Raw Trace Number | Span-Level Pattern Number | Trace-Level Pattern Number |
| --- | --- | --- | --- |
| S1 | 146,985 | 11 | 8 |
| S2 | 126,245 | 10 | 8 |
| S3 | 93,546 | 14 | 5 |
| S4 | 92,527 | 7 | 3 |
| S5 | 79,179 | 9 | 3 |
Table 5 illustrates the effectiveness of Mint's Span Parser and Trace Parser in aggregating patterns from large volumes of raw traces.
- For hundreds of thousands of raw traces (e.g., 146,985 for S1), the Span Parser extracts a very small number of span-level patterns (e.g., 11 for S1).
- Similarly, the Trace Parser extracts an even smaller number of trace-level patterns (e.g., 8 for S1).
- The reduction ratio from raw trace count to span-level patterns ranges from 6,681 to 13,362, and to trace-level patterns from 15,780 to 30,842.

Analysis: This demonstrates the profound commonality present in real-world trace data and Mint's ability to identify and aggregate it efficiently. The extremely low number of resulting patterns confirms why Mint can store the commonality of all traces at such minimal cost; this pattern aggregation is a cornerstone of its cost-efficiency.
6.4. Ablation Studies / Parameter Analysis
Parameter Sensitivity
The following figure (Figure 16 from the original paper) shows the total storage size of patterns and parameters with the similarity threshold at 0.2, 0.4, 0.6, and 0.8.
Figure 16. The total storage size of patterns and parameters with the similarity threshold at 0.2, 0.4, 0.6, and 0.8.
Figure 16 illustrates the impact of the similarity threshold (used in the Span Parser for clustering string values) on total storage size.
- The graph shows that as the similarity threshold increases (from 0.2 to 0.8), the total storage size of patterns and parameters decreases.
- A higher similarity threshold means that only very similar strings are grouped into the same pattern. This leads to more distinct patterns but fewer parameters being extracted (as more content becomes part of a pattern).
- Conversely, a lower similarity threshold means more diverse strings are grouped into the same pattern, leading to fewer patterns but more content being treated as variable parameters.
- The paper sets the default similarity threshold to 0.8 after considering both total storage size and the effectiveness of parameter extraction (a very high threshold could reduce the meaningful variability that can be extracted).

Analysis: This sensitivity analysis highlights that parameter tuning, specifically of the similarity threshold, matters for optimizing Mint's performance. The chosen default (0.8) represents a balance, aiming to maximize storage reduction while ensuring that meaningful variability (parameters) can still be identified for detailed analysis.
7. Conclusion & Reflections
7.1. Conclusion Summary
This paper introduces Mint, a cost-efficient distributed tracing framework that fundamentally shifts the paradigm of trace reduction from traditional "1 or 0" sampling to a novel "commonality + variability" approach. By performing two-level parsing (inter-span and inter-trace) on the agent side, Mint effectively separates trace data into common patterns and variable parameters. It aggregates these patterns and uses Bloom Filters to efficiently store metadata, ensuring that basic information for all requests is captured and retrievable. For critical (sampled) traces, Mint filters and retains detailed variable parameters.
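The commonality/variability split described above can be sketched in a few lines. This is a toy illustration, not Mint's parser: the regex-based masking, the `<*>` placeholder, and the latency buckets are all assumed for the example, but they show the idea of storing one shared pattern per request shape while keeping exact parameters only for sampled traces and bucket-mapped approximations for the rest.

```python
import re

BUCKETS = [10, 100, 1000, 10000]  # hypothetical numeric buckets (e.g., latency in ms)

def bucketize(value):
    """Map a number to its bucket upper bound (approximate storage)."""
    for bound in BUCKETS:
        if value <= bound:
            return bound
    return float("inf")

def split_span(span):
    """Separate one span record into a reusable pattern (commonality)
    and its variable numeric values (variability)."""
    params = re.findall(r"\d+", span)
    pattern = re.sub(r"\d+", "<*>", span)
    return pattern, params

pattern, params = split_span("GET /order/1001 took 243 ms")
# The pattern is shared by every request of this shape and stored once;
# exact params are kept only for sampled traces, otherwise approximated:
approx = [bucketize(int(p)) for p in params]
print(pattern)  # GET /order/<*> took <*> ms
print(approx)   # [10000, 1000]
```

For an unsampled request, only the pattern reference and the bucketed values would be retained, which is why Mint can afford to keep at least an approximate record of every request.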
The key contributions of Mint are:

- Comprehensive Coverage: It captures information for all requests, providing at least an approximate trace even for those not fully sampled, addressing the significant issue of discarded valuable traces in prior methods.
- Exceptional Cost Efficiency: It drastically reduces trace storage overhead to an average of 2.7% and network overhead to an average of 4.2% of the original volumes.
- Enhanced Analytical Value: By retaining more analytically relevant information, Mint significantly improves the accuracy of downstream root cause analysis methods, with an average A@1 increase from 25% to 50%.
- Production Readiness: Experiments on a large-scale Alibaba production system demonstrate Mint's lightweight nature, with minimal impact on CPU usage (0.86% increase) and request latency (0.21% increase), making it practical for real-world deployment.

The successful deployment in Alibaba and positive user feedback underscore its practical utility and significant improvement in observability and user experience.
7.2. Limitations & Future Work
While Mint presents a significant advancement, the paper implicitly and explicitly touches upon certain limitations:

- Parameter Sensitivity: The performance of Mint can be influenced by internal parameters, such as the similarity threshold used in the Span Parser. While a default of 0.8 is provided, optimal values might vary across different application contexts, requiring careful tuning. An excessively high threshold might weaken parameter extraction effectiveness by making too much content part of the pattern, potentially losing valuable variability.
- Bloom Filter False Positives: While Bloom Filters ensure no false negatives (never missing a trace), they can produce false positives. Although the paper states this can be alleviated through upstream-downstream verification across multiple agents, this mitigation adds complexity to the query process and implies potentially longer query times or additional computation.
- Approximate Nature of Unsampled Traces: While Mint captures all requests, unsampled traces are stored in an approximate form (patterns with masked variables or bucket-mapped numbers). While this is shown to be highly useful for trace exploration and batch analysis, it may not provide the granular, exact detail sometimes needed for very specific, deep-dive debugging scenarios where every single parameter value is crucial. The trade-off is a pragmatic one, but it is still a trade-off.
- Cold Start Issues: The offline stage for warming up the Span Parser is necessary to achieve acceptable performance in the early stages of online parsing. This implies a potential cold-start period for Mint in new or rapidly evolving systems before sufficient patterns are learned.
- Complexity of System Changes: When the system changes, previous patterns may become outdated, requiring developers to trigger Mint's reconstruct interface to rebuild the patterns. This introduces a manual or semi-manual management overhead for adapting to significant system evolution.

The paper doesn't explicitly outline future work, but based on these observations, potential future directions could include:
- Developing adaptive or self-tuning mechanisms for parameters like the similarity threshold.
- Exploring more robust and efficient ways to handle Bloom Filter false positives, or alternative data structures with similar space efficiency but without this drawback.
- Investigating methods to quantify and improve the "quality" or "fidelity" of approximate traces for specific debugging tasks.
- Automating the pattern reconstruction process in response to system changes to reduce manual intervention.
- Extending the commonality + variability paradigm to other observability signals (e.g., logs, metrics) for integrated cost efficiency.
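The Bloom Filter behavior discussed in the limitations above (no false negatives, but possible false positives) is easy to demonstrate with a minimal, self-contained sketch. This is not Mint's implementation; the bit-array size `m`, hash count `k`, and SHA-256-based hashing are illustrative choices.

```python
import hashlib

class BloomFilter:
    """A tiny Bloom filter: k hash functions set bits in an m-bit array.
    Membership tests on inserted items always succeed (no false negatives);
    uninserted items may collide on all k bits (false positives)."""

    def __init__(self, m=64, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        # Derive k independent bit positions by salting the hash input.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all((self.bits >> p) & 1 for p in self._positions(item))

bf = BloomFilter()
for trace_id in ("trace-001", "trace-002", "trace-003"):
    bf.add(trace_id)

# Every inserted trace ID is always found: this is the "never miss a trace"
# guarantee. An ID that was never inserted may still test positive (a false
# positive); per the paper, Mint alleviates this via upstream-downstream
# verification across multiple agents.
assert all(t in bf for t in ("trace-001", "trace-002", "trace-003"))
```

The sketch makes the asymmetry concrete: membership queries can overreport but never underreport, which is exactly why the extra verification step is needed only on the positive path.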
7.3. Personal Insights & Critique
Mint represents a genuinely insightful and practical evolution in distributed tracing. The shift from a binary "1 or 0" sampling decision to the nuanced "commonality + variability" paradigm is its most significant innovation. It acknowledges the real-world dilemma of wanting full observability without incurring prohibitive costs.
Inspirations:

- Pragmatic Trade-off Management: The core idea of providing "approximate truth for all" rather than "exact truth for some and nothing for others" is highly inspiring. This approach resonates deeply with the needs of large-scale production systems, where SREs often need some information about every request, even if not all details are present.
- Leveraging Data Structure: Explicitly leveraging the topological data structure of traces, rather than treating them as flat logs, is crucial. This is where Mint gains a significant advantage over log-centric compression techniques, and it provides a blueprint for other observability data types with similar inherent structure.
- Agent-Side Intelligence: Performing complex parsing and reduction on the agent side is a smart design choice. It directly reduces network overhead, which is often a major cost component in distributed systems.
Potential Issues, Unverified Assumptions, or Areas for Improvement:

- Cognitive Load of Approximate Traces: While beneficial, relying on approximate traces might introduce a new cognitive load for developers and SREs. They need to understand what information is approximated, what is filtered, and how to interpret masked values. The UI/UX for presenting these approximate traces (e.g., how Figure 10 is visualized) will be critical for user adoption and effectiveness.
- Parameter Tuning Complexity: As noted in the parameter sensitivity analysis, selecting the right similarity threshold is important. For a system with thousands of microservices and diverse workloads, finding a universally optimal threshold, or dynamically adapting it, could be challenging. Poorly tuned parameters might either lose too much variability or fail to compress effectively.
- Bloom Filter Collisions and Impact: While Bloom Filters provide excellent space efficiency, the occurrence of false positives (even if mitigated by verification) could lead to slightly longer query times or unnecessary data retrieval attempts in the backend. The paper's claim of "never miss a trace" for Bloom Filters refers to false negatives, not false positives. While false positives don't mean a trace is missed, they can mean a trace is incorrectly identified as belonging to a pattern it doesn't, necessitating extra checks.
- Reconstruction of Outdated Patterns: The need to reconstruct patterns when the system changes implies that Mint's effectiveness relies on a relatively stable system structure. In highly dynamic environments with frequent code deployments or A/B testing, the Pattern Library might require frequent updates, potentially incurring overhead or temporarily sub-optimal compression.
- Generalizability to Other Systems: While validated at Alibaba's scale, the specific types of commonality and variability might differ in other domains or organizational structures. The effectiveness of Mint's pattern extraction algorithms might need re-validation or adaptation for drastically different systems (e.g., scientific computing workflows vs. e-commerce).

Overall, Mint offers a compelling vision for future distributed tracing, moving beyond simplistic sampling to intelligent, structure-aware data reduction that enhances observability without breaking the bank. It provides a strong foundation for future work in balancing the intricate tradeoffs of monitoring complex, large-scale systems.