AICHRONOLENS: Advancing Explainability for Time Series AI Forecasting in Mobile Networks
TL;DR Summary
AICHRONOLENS links traditional XAI explanations with the temporal features of the input, enhancing the interpretability of LSTM-based time series forecasting in mobile networks, uncovering the causes of prediction errors, and improving prediction accuracy by up to 32% in targeted scenarios.
Abstract
Claudio Fiandrino, Eloy Pérez Gómez, Pablo Fernández Pérez, Hossein Mohammadalizadeh, Marco Fiore and Joerg Widmer. IMDEA Networks Institute, Madrid, Spain. Email: {name.surname}@imdea.org

Abstract — Next-generation mobile networks will increasingly rely on the ability to forecast traffic patterns for resource management. Usually, this translates into forecasting diverse objectives like traffic load, bandwidth, or channel spectrum utilization, measured over time. Among other techniques, Long-Short Term Memory (LSTM) proved very successful for this task. Unfortunately, the inherent complexity of these models makes them hard to interpret and, thus, hampers their deployment in production networks. To make matters worse, EXplainable Artificial Intelligence (XAI) techniques, which are primarily conceived for computer vision and natural language processing, fail to provide useful insights: they are blind to the temporal characteristics of the input and only work well with highly rich semantic data like images or text. In this paper, we take the research on XAI for time series forecasting one step further with AICHRONOLENS, a new tool that links legacy XAI explanations with the temporal properties of the input.
1. Bibliographic Information
1.1. Title
AICHRONOLENS: Advancing Explainability for Time Series AI Forecasting in Mobile Networks
1.2. Authors
- Claudio Fiandrino
- Eloy Pérez Gómez
- Pablo Fernández Pérez
- Hossein Mohammadalizadeh
- Marco Fiore
- Joerg Widmer
All authors are affiliated with the IMDEA Networks Institute, Madrid, Spain. Their research backgrounds appear to be in telecommunications, mobile networks, and potentially machine learning applications in these domains.
1.3. Journal/Conference
The paper does not explicitly state the specific journal or conference where it was published, but it is a research paper associated with the IMDEA Networks Institute. Given the context of the content (mobile networks, AI, XAI), it would likely be published in a top-tier networking conference (e.g., IEEE INFOCOM, ACM MobiCom, IEEE SECON) or a relevant journal.
1.4. Publication Year
The publication year is not explicitly stated in the provided text. However, internal references (e.g., [1], [17]) indicate citations up to 2023, suggesting the paper was likely published in late 2023 or 2024.
1.5. Abstract
Next-generation mobile networks will heavily rely on artificial intelligence (AI) for forecasting critical parameters like traffic load and bandwidth to optimize resource management. Long-Short Term Memory (LSTM) models have proven effective for time series forecasting in this domain but suffer from a lack of interpretability, hindering their deployment in production environments. Existing Explainable Artificial Intelligence (XAI) techniques, primarily designed for computer vision and natural language processing, fail to provide useful insights for time series data because they are "blind" to temporal characteristics and semantic richness.
This paper introduces AICHRONOLENS, a novel tool designed to overcome these limitations by linking traditional XAI explanations with the temporal properties of input time series. It achieves this by using an imaging technique called Gramian Angular Field (GAF) to represent time series, which allows AICHRONOLENS to identify and correlate patterns between XAI relevance scores and the input's temporal features. This enables a deeper understanding of model behavior, including the root causes of prediction errors. Extensive evaluations using real-world mobile traffic traces demonstrate AICHRONOLENS's ability to uncover hidden model behaviors and improve model performance by up to 32% through informed refinement.
1.6. Original Source Link
/files/papers/690c5fc10de225812bf932a5/paper.pdf

This appears to be a direct PDF link, likely hosted on a repository or institutional server, indicating an officially accessible version of the paper.
2. Executive Summary
2.1. Background & Motivation
The core problem the paper aims to solve is the lack of interpretability and explainability of Long-Short Term Memory (LSTM) models when applied to time series forecasting in mobile networks. Next-generation mobile networks (5G and beyond) increasingly depend on accurate traffic pattern forecasts (e.g., traffic load, bandwidth, channel spectrum utilization) for efficient resource management, network deployment, routing, mobility management, and energy efficiency. LSTMs have shown great success in these forecasting tasks.
However, the inherent complexity and "black-box" nature of LSTMs make them difficult for human operators (like network managers) to understand and trust. This opacity hinders their deployment in real-world production networks. Existing Explainable Artificial Intelligence (XAI) techniques, while effective in domains like computer vision and natural language processing, are inadequate for time series data. They are often "blind" to the crucial temporal characteristics of the input and struggle with the less "semantically rich" nature of raw time series data compared to images or text. This leads to ambiguous explanations that don't reveal the true underlying reasons for model predictions or errors, particularly in a temporal context.
The paper's entry point or innovative idea is to address this fundamental flaw by linking legacy XAI explanations with the temporal properties of the input time series. This aims to provide deeper insights into LSTM model behavior, diagnose errors, and ultimately build trust for their adoption in mobile networks.
2.2. Main Contributions / Findings
The paper outlines three key contributions (C) and three key findings (F):
- C1. Design of AICHRONOLENS: The paper designs AICHRONOLENS, a new tool that overcomes the shortcomings of prominent XAI tools for time series forecasting. It achieves this by harnessing the linear relationship between XAI relevance scores and the temporal characteristics of input sequences, specifically using an imaging technique (Gramian Angular Field (GAF)) to enrich the input's expressiveness.
- C2. Extensive Evaluation and Detailed Explanations: The authors perform a thorough evaluation of AICHRONOLENS using real-world datasets and various LSTM models. This evaluation demonstrates that AICHRONOLENS provides highly detailed explanations regarding model behavior, which are valuable for verifying model robustness and for ongoing monitoring.
- C3. Reproducibility and Artifact Release: The study contributes to the research community by releasing the trained LSTM models and the AICHRONOLENS code, encouraging further research and reproducibility.
- F1. Pinpointing Hyperparameter Differences: AICHRONOLENS, unlike legacy XAI tools, can identify differences in LSTM model behavior stemming from hyperparameter settings (e.g., learning rates). For instance, lower learning rates yield strong positive or negative relationships between relevance scores and the time series input, while higher learning rates yield correlations clustered around zero (see Section 6.1.3).
- F2. Relating Correlation Coefficients to Model Errors: The correlation coefficients generated by AICHRONOLENS possess geometrical properties that can be directly linked to model errors. The tool reveals the root causes of these errors, differentiating between poor model design and data that is inherently difficult to predict.
- F3. Refining Training and Improving Performance: AICHRONOLENS can be effectively used to guide the refinement of model training. By identifying specific weaknesses, it enables targeted data augmentation and minor hyperparameter adjustments, leading to significant improvements in model performance (e.g., up to a 32% reduction in MAE for specific error scenarios).
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To understand this paper, a reader needs to grasp several fundamental concepts:
- Time Series Forecasting: This is the process of predicting future values of a variable based on historical, time-ordered observations. In mobile networks, this involves forecasting metrics like traffic load or user count over time. The paper defines this formally: given a sequence of past values $X = \{x_{t-H+1}, \ldots, x_t\}$, the goal is to predict the future value $\hat{x}_{t+1} = f(X)$, where $f$ is a prediction function (e.g., an LSTM model) and $H$ is the length of the input history.
- Long-Short Term Memory (LSTM): A type of recurrent neural network (RNN) specifically designed to handle sequential data and address the vanishing/exploding gradient problem common in traditional RNNs. LSTMs use internal "gates" (input, forget, and output gates) to regulate the flow of information through their memory cells, allowing them to learn long-term dependencies in the data. This makes them particularly effective for time series forecasting.
- Explainable Artificial Intelligence (XAI): A field of AI that aims to make AI models more transparent and understandable to humans. It focuses on providing insights into how and why an AI model arrives at a particular decision or prediction, rather than just providing the output. This is crucial for building trust, debugging, and ensuring ethical deployment of AI systems.
- Pearson Correlation Coefficient ($r$): A statistical measure that quantifies the linear relationship between two sets of data. It ranges from -1 to +1, where +1 indicates a perfect positive linear correlation, -1 indicates a perfect negative linear correlation, and 0 indicates no linear correlation. The paper uses this to assess the linear relationship between XAI relevance scores and the GAF representation of time series (a small numerical check follows this list).
  - Conceptual Definition: The Pearson correlation coefficient measures the strength and direction of a linear relationship between two variables. If one variable increases as the other increases, they have a positive correlation. If one variable decreases as the other increases, they have a negative correlation. If there is no consistent pattern, the correlation is near zero.
  - Mathematical Formula: The Pearson correlation coefficient between two variables $x$ and $y$ is defined as:
    $$r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
  - Symbol Explanation:
    - $n$: The number of data points (samples).
    - $x_i$: The value of the first variable at data point $i$.
    - $y_i$: The value of the second variable at data point $i$.
    - $\bar{x}$: The mean of the first variable.
    - $\bar{y}$: The mean of the second variable.
    - $\sum$: Summation operator.
- Gramian Angular Field (GAF): An imaging technique that transforms a 1D time series into a 2D image. It works by converting time series values into polar coordinates, where the value determines the angle and the time step determines the radius. A Gram matrix is then constructed from the angular sums/differences, encoding temporal correlations and local patterns (maxima/minima) into an image. This transformation allows ML models (especially those designed for images, like Convolutional Neural Networks) to analyze time series data in a new way, and in this paper it provides an enriched representation for correlation.
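As a quick grounding for the Pearson coefficient used throughout the paper, here is a minimal sketch (toy data, NumPy only; not from the paper's artifacts) that computes $r$ both from the definition above and with the library call:

```python
# Minimal sketch: Pearson correlation on toy data, computed from
# the definition above and cross-checked against NumPy.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])  # roughly linear in x

xm, ym = x - x.mean(), y - y.mean()
r_manual = (xm * ym).sum() / np.sqrt((xm ** 2).sum() * (ym ** 2).sum())
r_numpy = np.corrcoef(x, y)[0, 1]

print(round(r_manual, 4), round(r_numpy, 4))  # both ~0.999
```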
3.2. Previous Works
The paper contextualizes its contribution by discussing existing XAI techniques and their limitations, particularly when applied to time series data.
- Legacy XAI Techniques:
  - LayeR-wise backPropagation (LRP) [21]: A model-specific XAI technique that assigns a relevance score to each input feature, indicating its contribution to the model's output. It works by propagating the prediction relevance backward through the neural network layers, following a conservation principle (total relevance is maintained). When it reaches the input layer, it distributes the total relevance among the input elements. This provides insights into which parts of the input data are most influential for a given prediction.
  - SHapely Additive exPlanations (SHAP) [22]: A model-agnostic XAI technique based on cooperative game theory. It calculates Shapley values for each feature, representing the average marginal contribution of that feature across all possible coalitions (combinations) of features. SHAP can provide both local (explaining a single prediction) and global (explaining overall model behavior) explanations.
  - Local Interpretable Model-agnostic Explanations (LIME) [23]: Another model-agnostic XAI technique that explains individual predictions of any classifier. It approximates the complex black-box model locally with a simpler, interpretable model (e.g., a linear model) around the prediction of interest. It perturbs the input, observes changes in the output, and learns a local linear model to explain feature importance.
  - DeepLIFT [24]: A model-specific XAI technique for deep learning models that attributes contributions to input neurons by comparing the activation of each neuron to its "reference" activation. It back-propagates activation differences from the output to the input layer.
  - Eli5 [35]: A Python library that helps debug machine learning classifiers and explain their predictions. It supports various ML frameworks and provides XAI functionalities, often using techniques like permutation importance or LIME under the hood.
- XAI for Time Series: The paper notes that XAI techniques were primarily developed for computer vision and NLP. While some adaptations exist for time series, especially for classification [32], [33], they often fall short for forecasting, particularly for univariate time series. The paper highlights a key limitation: even adapted methods like LRP and SHAP fail to provide useful insights beyond simple input relevance, often giving ambiguous explanations with no clear relation to the temporal input sequence.
- Time Series to Image Transformations:
  - Recurrence Plots (RP): A technique that visualizes the recurrence of states in a dynamical system. It plots a 2D matrix where points are marked if the system's state at two different times is "close" in phase space. The paper notes that RPs are limited by variable length and scale issues and cannot effectively represent trends.
  - Markov Transition Field (MTF): Transforms a time series into an image by encoding the transition probabilities between different quantile bins of the time series. While it preserves temporal dependencies like GAF, MTF cannot reconstruct the original time series, unlike GAF.
3.3. Technological Evolution
The field of AI in mobile networks has rapidly evolved, driven by the increasing complexity and data volumes of 5G and future 6G networks. Initially, simple statistical models were used for traffic prediction. With the rise of deep learning, LSTMs became prominent due to their ability to handle sequential data and capture complex temporal patterns, leading to higher quality predictions [54]. Concurrently, the demand for transparent AI led to the development of XAI. However, XAI techniques largely originated in computer vision (e.g., explaining why a CNN classifies an image as a "cat" by highlighting pixels) and NLP (e.g., explaining why an RNN predicts a word by highlighting previous words). The challenge emerged when applying these XAI methods to time series, where the "semantics" are less intuitive (e.g., what does "this pixel is important" mean in a time series image?), and the temporal dependencies are paramount. This paper's work fits into this timeline by bridging the gap between mature LSTM forecasting, advanced XAI principles, and the unique temporal characteristics of mobile network time series data, specifically addressing the interpretability shortcomings that hinder real-world deployment.
3.4. Differentiation Analysis
Compared to existing XAI methods for time series, AICHRONOLENS offers several core differences and innovations:
- Addressing Ambiguity: The primary differentiation is AICHRONOLENS's ability to resolve the ambiguity of legacy XAI techniques. As shown in Section II-B of the paper, LRP and SHAP can assign similar relevance scores to vastly different input sequences, making their explanations uninformative. AICHRONOLENS explicitly tackles this by integrating temporal context.
- Linking XAI with Temporal Characteristics: Unlike traditional XAI methods that only provide "input relevance" scores, AICHRONOLENS actively links these scores with the temporal properties of the input sequence. It uses GAF to transform the 1D time series into a 2D representation, revealing pairwise relationships and spatial distances between local maxima/minima.
- Enriched Input Expressiveness: By correlating XAI relevance scores with the GAF representation, AICHRONOLENS gains a richer understanding of why certain inputs are relevant, rather than just that they are relevant. This allows it to identify whether the model gives higher or lower importance to specific temporal features like local maxima or minima.
- Diagnosis of Error Causes: AICHRONOLENS goes beyond identifying relevant inputs to diagnose the hidden causes of model errors. It can distinguish errors due to poor model design (e.g., lack of training data for specific trends) from errors due to inherently unpredictable data. This level of diagnostic capability is largely absent in existing XAI for time series.
- Geometry of Explanations: The output of AICHRONOLENS (correlation matrices) can be interpreted geometrically (e.g., "triangle shapes") to reveal trends in prediction and identify problematic transitions, providing a novel visual and quantitative way to understand model behavior over time.
- Offline Model Inspection for Online Monitoring: It provides a tool for offline model inspection to synthesize tailored explanations that can then be used for online monitoring, offering a practical pathway to deployment.
4. Methodology
4.1. Principles
The core idea behind AICHRONOLENS is to enhance the depth and utility of explanations provided by existing XAI tools for time series forecasting. It achieves this by addressing the inherent ambiguity of these tools, which often assign similar relevance scores to diverse input sequences without considering their temporal structure. The theoretical basis is that by explicitly linking the XAI relevance scores with an enriched, temporally-aware representation of the input time series, one can reveal deeper insights into the model's decision-making process, diagnose errors, and ultimately improve model design. This linkage is established using the Pearson correlation coefficient between XAI relevance scores and a Gramian Angular Field (GAF) representation of the input.
4.2. Core Methodology In-depth (Layer by Layer)
AICHRONOLENS is structured into four main modules, as illustrated in Figure 3. Each module performs a specific task to progressively enrich the understanding of the LSTM model's behavior.
This image is Figure 3, a schematic of the AICHRONOLENS architecture: the time series input passes through the GAF transformation, XAI processing, and the LSTM model's prediction, and an Analyzer module then synthesizes feedback, integrating temporal characteristics with the explainability method.
Fig. 3. AICHRONOLENS architecture
The process begins with the time series input $X$, which is fed into the LSTM model to produce the forecast $\hat{x}_{t+1}$. Simultaneously, this input and the LSTM model's predictions are processed by the AICHRONOLENS modules.
The design of AICHRONOLENS adheres to two principles:

- P1: XAI Generality. AICHRONOLENS is designed to be pluggable with any existing XAI tool, making it versatile and allowing comparative analysis of different XAI explanations for the same LSTM model.
- P2: LSTM Specificity. While general in its XAI pluggability, AICHRONOLENS initially focuses on LSTM models for univariate time series. Adaptation to spatio-temporal inputs is left for future work.

Let's detail each module.
4.2.1. Relevance Scores from XAI

This module computes relevance scores $R = \{r_1, \ldots, r_H\}$ for each element of the input sequence $X$, indicating its contribution to the forecast $\hat{x}_{t+1}$. The paper describes how this is done using two prominent XAI techniques: LRP and SHAP.

- LRP (LayeR-wise backPropagation): LRP calculates relevance scores by tracking back the individual activation of each neuron and its contribution (weighted by connection strength) from the output layer to the input layer. This process obeys a conservation principle, meaning the total relevance in one layer is preserved as it is propagated backward. The relevance passed from a neuron $j$ in layer $l+1$ to a neuron $i$ in layer $l$ is defined as:
  $$R_{i \leftarrow j}^{(l,\,l+1)} = \frac{a_i\, w_{ij}}{\sum_{i'} a_{i'}\, w_{i'j}}\; R_j^{(l+1)}$$
  - Symbol Explanation:
    - $R_{i \leftarrow j}^{(l,\,l+1)}$: Relevance score of neuron $i$ in layer $l$ with respect to neuron $j$ in layer $l+1$.
    - $R_j^{(l+1)}$: Total relevance of neuron $j$ in layer $l+1$.
    - $a_i$: Activation of neuron $i$.
    - $w_{ij}$: Weight of the connection between neuron $i$ and neuron $j$.
    - $\sum_{i'} a_{i'} w_{i'j}$: Sum of activations weighted by connections over all neurons $i'$ in layer $l$ connected to neuron $j$ in layer $l+1$.
  This calculation is performed iteratively, layer by layer, until the relevance is distributed across the input layer elements, yielding $R$.
- SHAP (SHapely Additive exPlanations): SHAP computes relevance scores (Shapley values) by determining the average marginal contribution of each input sequence element across all possible coalitions of feature presence or absence. Using the standard definition (the formula as printed in the source appears truncated), the relevance score for an input feature $i$ is:
  $$\phi_i(f) = \frac{1}{n} \sum_{S \subseteq N \setminus \{i\}} \binom{n-1}{|S|}^{-1} \left[\, f\big(S \cup \{i\}\big) - f(S) \,\right]$$
  - Symbol Explanation:
    - $\phi_i(f)$: Shapley value (relevance score) for feature $i$ under prediction function $f$.
    - $n$: Total number of features in the input sequence $X$, whose feature set is $N$.
    - $S$: A subset of the input features that excludes feature $i$.
    - $|S|$: Cardinality (number of features) of the subset $S$.
    - $\binom{n-1}{|S|}^{-1}$: Inverse of the binomial coefficient, i.e., one over the number of ways to choose $|S|$ features from the remaining $n-1$ features; this accounts for the number of possible coalitions of each size.
    - $f(S \cup \{i\}) - f(S)$: The marginal contribution of feature $i$, i.e., the model prediction with $i$ included in the coalition minus the prediction without it.
  A brute-force numerical sketch of this computation follows below.
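To make the formula concrete, here is a hedged brute-force sketch that enumerates all coalitions for a toy 3-feature linear model. The model `f`, the zero baseline for absent features, and all values are illustrative assumptions; real SHAP implementations use optimized approximations rather than this exponential enumeration.

```python
# Hedged sketch: exact Shapley values for a toy 3-feature model,
# enumerating all coalitions per the formula above.
import itertools
import math

import numpy as np

def f(mask, x, baseline):
    """Toy prediction: weighted sum; absent features take the baseline value."""
    w = np.array([0.5, -1.0, 2.0])
    z = np.where(mask, x, baseline)
    return float(w @ z)

def shapley(i, x, baseline):
    n = len(x)
    others = [j for j in range(n) if j != i]
    phi = 0.0
    for k in range(n):
        for S in itertools.combinations(others, k):
            # Coalition weight |S|! (n - |S| - 1)! / n!  (equivalent form)
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            mask = np.zeros(n, dtype=bool)
            mask[list(S)] = True
            without_i = f(mask, x, baseline)   # f(S)
            mask[i] = True
            with_i = f(mask, x, baseline)      # f(S U {i})
            phi += weight * (with_i - without_i)
    return phi

x, baseline = np.array([1.0, 2.0, 3.0]), np.zeros(3)
print([round(shapley(i, x, baseline), 3) for i in range(3)])  # [0.5, -2.0, 6.0]
```

For a linear model with a zero baseline, the Shapley value of each feature reduces to its weighted input, which the printed output confirms.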
4.2.2. Imaging via GAF
This module transforms the 1D input time series into a 2D image representation using the Gramian Angular Field (GAF) technique. GAF is chosen because it preserves temporal dependencies, allows reconstruction of the original time series, and captures complex temporal correlations, unlike Recurrence Plots (RP) or Markov Transition Field (MTF).
The transformation process involves three steps:
- Rescaling: The original elements $x_i$ (for $i = 1, \ldots, H$) are first rescaled to the range $[-1, 1]$:
  $$\tilde{x}_i = \frac{(x_i - \max(X)) + (x_i - \min(X))}{\max(X) - \min(X)}$$
  - Symbol Explanation:
    - $\tilde{x}_i$: The rescaled value of $x_i$.
    - $x_i$: The original value of the $i$-th element in the input sequence $X$.
    - $\max(X)$: The maximum value in the input sequence.
    - $\min(X)$: The minimum value in the input sequence.
- Polar Coordinates Transformation: The rescaled values are then converted into polar coordinates, where the value itself dictates the angle and the time step dictates the radius:
  $$\phi_i = \arccos(\tilde{x}_i), \qquad r_i = \frac{t_i}{N}$$
  - Symbol Explanation:
    - $\phi_i$: The angular coordinate, derived from the arccosine of the rescaled value.
    - $\arccos(\tilde{x}_i)$: The arccosine function applied to the rescaled value $\tilde{x}_i$, which maps values in $[-1, 1]$ to angles in $[0, \pi]$.
    - $r_i$: The radial coordinate, determined by the time step.
    - $t_i$: The index of the time step (from 1 to $H$).
    - $N$: A regularization factor for the span of the polar coordinate system.
  This transformation is bijective (the original time series can be recovered) and preserves absolute temporal relations.
- Gramian Angular Field (GAF) Matrix Construction: A Gram matrix is constructed, where each element is the cosine of the sum of two angles from the polar coordinate representation. This matrix encodes pairwise relationships between points in the time series:
  $$G = \begin{bmatrix} \cos(\phi_1 + \phi_1) & \cdots & \cos(\phi_1 + \phi_H) \\ \vdots & \ddots & \vdots \\ \cos(\phi_H + \phi_1) & \cdots & \cos(\phi_H + \phi_H) \end{bmatrix}$$
  - Symbol Explanation:
    - $G$: The Gramian Angular Field matrix of dimensions $H \times H$, where $H$ is the length of the input time series.
    - $\phi_i$: The angular coordinate corresponding to the $i$-th time step.
    - $\cos$: The cosine function.
  Alternatively, using an inner product definition:
  $$\langle v, z \rangle = v \cdot z - \sqrt{1 - v^2} \cdot \sqrt{1 - z^2}$$
  - Symbol Explanation:
    - $\langle v, z \rangle$: The defined inner product between two values $v$ and $z$.
    - $v, z$: Two values (e.g., $\tilde{x}_i$, $\tilde{x}_j$) from the rescaled time series.
    - $\sqrt{\cdot}$: Square root function.
  The matrix can then be rewritten using this inner product:
  $$G = \big[\, \langle \tilde{x}_i, \tilde{x}_j \rangle \,\big]_{i,j = 1, \ldots, H}$$
  where $\langle \tilde{x}_i, \tilde{x}_j \rangle$ is the inner product (as defined above) between the rescaled values at time steps $i$ and $j$.

The GAF matrix has useful properties: it preserves temporal dependency (time increases from top-left to bottom-right), it encodes temporal correlations, and its main diagonal encodes the original time series values, allowing reconstruction. High GAF values (near 1) indicate correlations between local maxima or between local minima; values near 0 indicate correlations between extrema and intermediate points; negative values (near -1) indicate correlations between a local maximum and a local minimum. A NumPy sketch of this construction follows below.
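The three steps can be condensed into a short NumPy sketch; the helper name `gaf` and the toy series are illustrative, and the radial coordinate is omitted since only the angles enter the Gram matrix:

```python
# Hedged sketch of the GAF construction: rescale to [-1, 1],
# map values to angles with arccos, then take pairwise cos(phi_i + phi_j).
import numpy as np

def gaf(x):
    x = np.asarray(x, dtype=float)
    # Step 1: rescale into [-1, 1]
    x_tilde = ((x - x.max()) + (x - x.min())) / (x.max() - x.min())
    x_tilde = np.clip(x_tilde, -1.0, 1.0)  # guard against rounding overshoot
    # Step 2: angular encoding (the radius t_i / N is not needed for G)
    phi = np.arccos(x_tilde)
    # Step 3: Gram matrix of angle sums, G[i, j] = cos(phi_i + phi_j)
    return np.cos(phi[:, None] + phi[None, :])

x = np.array([3.0, 5.0, 9.0, 4.0, 2.0])
G = gaf(x)
print(G.shape)                  # (5, 5), i.e., H x H
print(np.round(np.diag(G), 3))  # diagonal = cos(2*phi_i) = 2*x_tilde_i^2 - 1
```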
4.2.3. Defining Correlations

This module establishes a linear relationship between the XAI relevance scores $R$ and the GAF representation $G$ of the input time series. Each row $G_i$ of the matrix is a vector that characterizes inner relationships between samples of the input time series. Since $R$ is also a vector, the Pearson correlation coefficient can be computed between $R$ and each row $G_i$.

The correlation vector is defined as:
$$\rho = [\rho_1, \ldots, \rho_H], \qquad \rho_i = \frac{\operatorname{cov}(G_i, R)}{\sigma_{G_i}\, \sigma_R}$$

- Symbol Explanation:
  - $\rho$: The correlation vector, containing the Pearson correlation coefficient of each row of GAF against the LRP/SHAP scores.
  - $\operatorname{cov}(G_i, R)$: The covariance between a row $G_i$ of the GAF matrix and the XAI relevance scores $R$.
  - $\sigma_{G_i}$: The standard deviation of the row $G_i$ of the GAF matrix.
  - $\sigma_R$: The standard deviation of the XAI relevance scores $R$.
  - $\rho_i$: The Pearson correlation coefficient between the $i$-th row of the GAF matrix ($G_i$) and the XAI relevance scores ($R$).

This process is repeated for each timestep $t$, resulting in a correlation matrix with dimensions $H \times T$:
$$M = \big[\, \rho^{(1)} \mid \rho^{(2)} \mid \cdots \mid \rho^{(T)} \,\big]$$

- Symbol Explanation:
  - $M$: The correlation matrix, storing all correlation vectors over time.
  - $H$: Length of the input history (number of samples in $X$).
  - $T$: Total number of timesteps for which predictions are made.
  - $\rho_i^{(t)}$: The Pearson correlation coefficient for the $i$-th row of GAF at timestep $t$.

A short sketch computing one such correlation vector follows below.
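Putting the two previous pieces together, one column of $M$ can be computed as follows; the relevance vector `R` is random stand-in data rather than real LRP/SHAP output, and `gaf` is the helper from the earlier sketch:

```python
# Hedged sketch: one correlation vector rho between relevance scores R
# and the rows of the GAF matrix G (i.e., one column of the matrix M).
import numpy as np

rng = np.random.default_rng(0)
H = 20
x = rng.random(H)            # one input window of length H
G = gaf(x)                   # H x H GAF matrix (helper defined earlier)
R = rng.normal(size=H)       # stand-in XAI relevance scores

rho = np.array([np.corrcoef(G[i], R)[0, 1] for i in range(H)])
print(rho.shape)             # (H,): stacking T such vectors yields M (H x T)
```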
4.2.4. Analyzing Correlations

The Analyzer module is the heart of AICHRONOLENS, synthesizing explanations from the correlation matrix $M$. To observe the evolution of Pearson's coefficients over time for each sample in the input history, a new matrix $E$ is created by storing the secondary diagonals of $M$ in its rows.

Each row of $E$ tracks the influence of a specific input element as it "ages" within the history window. For example, with $H = 3$, a row of $E$ contains $\rho_3^{(t)}, \rho_2^{(t+1)}, \rho_1^{(t+2)}$: it follows an element that appears in position 3 of the input window, then position 2, then position 1, before aging out of the window, at which point its correlation coefficient vanishes from $M$. For practical use, $E^{T}$ (the transpose of $E$) is often more convenient. A NumPy sketch of this bookkeeping appears at the end of this subsection.
The analysis of $M$ (or $E$) over specific time windows is central to synthesizing explanations. The correlation values (positive or negative) generate "triangle shapes" within these matrices. These triangles represent the prediction trend given the time series input. AICHRONOLENS focuses on the transitions between these triangles:

- Smooth transitions: Indicate that the model effectively captures changes in the data trend.
- Non-smooth transitions: Often associated with model errors, indicating the model is failing to adapt to changes in the trend.

The geometric interpretation of these correlation patterns allows AICHRONOLENS to identify different causes of errors, which are further explored in the experimental section.
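Returning to the construction of $E$: below is a small sketch of the diagonal bookkeeping, under the assumption that row $H$ of $M$ holds the newest window position (the paper's exact indexing convention may differ):

```python
# Hedged sketch: build E by walking the secondary diagonals of M,
# so each row of E follows one input element as it ages through
# window positions H, H-1, ..., 1.
import numpy as np

H, T = 3, 6
M = np.arange(H * T, dtype=float).reshape(H, T)  # stand-in correlation matrix

E = np.array([
    [M[H - 1 - k, t + k] for k in range(H)]      # rho_H^(t), rho_{H-1}^(t+1), ...
    for t in range(T - H + 1)
])
print(E)    # each row: one element's correlations before it ages out
```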
5. Experimental Setup
5.1. Datasets
The study uses two distinct real-world mobile network datasets:
- D1 (Traffic Load):
  - Source: Measurements of traffic volumes collected from a live 4G network.
  - Scale & Characteristics: Covers a large metropolitan region in Europe over a period of 3 months. It provides fine-grained information on traffic volumes at each Base Station (BS) with a 3-minute granularity.
  - Domain: Mobile network traffic forecasting.
  - Data Sample: An example data sample would be a sequence of numerical values representing traffic load (e.g., in GB/min) for a specific BS over 3-minute intervals, such as [2.1, 2.4, 1.9, ...] (illustrative values).
  - Rationale: This dataset is suitable for analyzing the model's ability to forecast continuous traffic volumes and how AICHRONOLENS can capture variations due to hyperparameters in this common mobile network task.
- D2 (Number of Connected Users):
  - Source: Estimated number of active users connected to production BSs. Data was collected using an LTE passive monitoring tool that decodes unencrypted information exchanged between BSs and users.
  - Scale & Characteristics: Contains millisecond-level information about temporary user IDs (RNTI) and scheduling. This raw data is then processed using the methodology from [45] to estimate the number of active users every 6 minutes.
  - Domain: Mobile network user count forecasting.
  - Data Sample: A sample would be a sequence of integers representing the number of active users at a BS at 6-minute intervals: [150, 155, 148, 162, ..., 130] active users.
  - Rationale: This dataset provides a different type of mobile network metric (discrete user count vs. continuous traffic load) and allows AICHRONOLENS to be validated on a distinct forecasting problem, particularly for identifying errors specific to data characteristics.
5.2. Evaluation Metrics
The paper uses several evaluation metrics, primarily Mean Absolute Error (MAE) for model training and assessment, and the Silhouette Score for clustering analysis. A short computational sketch of all three follows the definitions below.
- Mean Absolute Error (MAE):
  - Conceptual Definition: MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. It is the average of the absolute differences between the predicted values and the actual values. MAE is more robust to outliers than MSE because it does not square the errors.
  - Mathematical Formula:
    $$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
  - Symbol Explanation:
    - $n$: The total number of observations or data points.
    - $y_i$: The actual (true) value for the $i$-th observation.
    - $\hat{y}_i$: The predicted value for the $i$-th observation.
    - $|\cdot|$: Absolute value.
    - $\sum$: Summation operator.
- Mean Squared Error (MSE):
  - Conceptual Definition: MSE measures the average of the squares of the errors, i.e., the average squared difference between the predicted values and the actual values. MSE penalizes larger errors more heavily than MAE due to the squaring operation, making it sensitive to outliers.
  - Mathematical Formula:
    $$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$
  - Symbol Explanation:
    - $n$: The total number of observations or data points.
    - $y_i$: The actual (true) value for the $i$-th observation.
    - $\hat{y}_i$: The predicted value for the $i$-th observation.
    - $(\cdot)^2$: Squaring operation.
    - $\sum$: Summation operator.
- Silhouette Score:
  - Conceptual Definition: The silhouette score is used to evaluate the quality of clusters created by a clustering algorithm. It measures how similar an object is to its own cluster (cohesion) compared to other clusters (separation). A high silhouette score indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters. Scores range from -1 (poor clustering) to +1 (dense, well-separated clustering), with 0 indicating overlapping clusters.
  - Mathematical Formula: For a single data point $i$:
    $$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$
    The overall silhouette score for a clustering is the average of $s(i)$ over all data points.
  - Symbol Explanation:
    - $s(i)$: The silhouette score for data point $i$.
    - $a(i)$: The average distance from data point $i$ to all other points in the same cluster (cohesion).
    - $b(i)$: The minimum average distance from data point $i$ to all points in a different cluster, i.e., the nearest cluster of which $i$ is not a member (separation).
    - $\max$: The maximum function.
5.3. Baselines

The paper's primary focus is on AICHRONOLENS as a tool for interpreting and improving existing LSTM models, rather than proposing a new forecasting model to compete with other forecasting techniques. Its "baselines" are therefore implicit:

- Legacy XAI Tools: AICHRONOLENS is compared against the limitations of LRP and SHAP directly, demonstrating how it provides deeper, less ambiguous insights than these tools alone. The clustering analysis in Section II-B of the paper (Figure 2) explicitly shows the inadequacy of LRP explanations without AICHRONOLENS's enhancement.
- Variations of LSTM Models: The paper evaluates AICHRONOLENS on different LSTM models with varying hyperparameters (number of neurons, learning rate, dropout) to show its generality and ability to identify performance differences. For instance, the baseline model A_A is compared against an "optimized model" (derived through AICHRONOLENS diagnosis) for performance improvement.

The overall LSTM model architecture used for both datasets is:

- Layers: One unidirectional LSTM layer, followed by an output layer.
- Output Layer: A single neuron with a linear activation function for one-step prediction.
- Input Sequence Length: Predicts the next value based on a history of 20 past samples.
- Optimizer: Adam optimizer.
- Loss Function: Mean Absolute Error (MAE).
- Data Split: Standard 80:20 train-test split.

Specifically for D1, six different LSTM models were trained with intentional variations in configuration. The following are the results from Table I of the original paper:
| MODEL ID | Neurons | LEARNING RATE | MAE |
|---|---|---|---|
| A | 200 | 0.0001 | 0.96 |
| B | 100 | 0.0001 | 0.99 |
| C | 50 | 0.0001 | 1.09 |
| A_A | 200 | 0.001 | 0.67 |
| B_B | 100 | 0.001 | 0.68 |
| C_C | 50 | 0.001 | 0.95 |
For D2, a single optimized model was used:

- LSTM Layer: 25 neurons with a tanh activation function.

A hedged Keras sketch of this family of models follows.
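For concreteness, here is a hedged Keras sketch of the described architecture (one unidirectional LSTM layer, a single linear output neuron, Adam optimizer, MAE loss, history of 20 samples). The layer size and learning rate follow model A of Table I; batch size and epochs are unspecified in the text and therefore omitted:

```python
# Hedged sketch of the described LSTM architecture (model A settings).
import tensorflow as tf

H = 20  # input history length

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(H, 1)),           # univariate input windows
    tf.keras.layers.LSTM(200),                     # model A: 200 neurons
    tf.keras.layers.Dense(1, activation="linear")  # one-step forecast
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # model A: 0.0001
    loss="mae",
)
model.summary()
```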
6. Results & Analysis
6.1. Core Results Analysis
The extensive evaluations with real-world mobile traffic data (datasets D1 and D2) demonstrate AICHRONOLENS's ability to provide highly detailed explanations that surpass the capabilities of legacy XAI tools. The key findings reveal how AICHRONOLENS pinpoints model behaviors, diagnoses error causes, and guides model optimization.
6.1.1. Pinpointing Temporal Characteristics
AICHRONOLENS effectively identifies specific temporal characteristics in the input sequence that stimulate the model, an insight largely missed by LRP and SHAP alone. While traditional XAI methods might assign similar relevance scores over time, the correlation vectors produced by AICHRONOLENS (linking XAI scores with GAF representation) clearly show temporal variations.
As qualitatively demonstrated in Figure 4, SHAP scores might consistently highlight recent samples as relevant. However, AICHRONOLENS's correlation vectors reveal deeper insights:
- When the samples entering or leaving the input sequence are not particularly relevant (e.g., intermediate values), the correlation between SHAP and GAF may be weak or almost non-existent (e.g., window 20).
- When a significant temporal feature, such as a new local minimum, enters the input sequence, AICHRONOLENS captures a substantial modification in the correlation vector. This indicates an alignment between the XAI's perceived relevance and the actual temporal saliency highlighted by GAF.

The ability to detect such changes is crucial because, as the paper argues, being "blind" to these temporal shifts can be detrimental to model performance.
This image is Figure 4, a multi-panel composite showing, for windows 19 to 22, the load time series, the corresponding SHAP scores, the GASF matrices, and the correlation vectors. Local maxima and minima of the load curve are marked with red squares and blue dots, highlighting the key temporal features and their influence on the model's predictions.
Fig. 4. A detailed look at AICHRONOLENS. Red squares and blue dots represent the local maxima and local minima, respectively.
6.1.2. Spotting Categories of Errors

AICHRONOLENS quantifies and differentiates between two categories of errors: errors caused by poor model design, and errors on data that is inherently hard to predict.

Analysis of Errors from Poor Model Design:

AICHRONOLENS enables tracing the root cause of errors, revealing weaknesses in model design that are not evident from coarse metrics like MAE. In the correlation matrix $M$, a trend change in the time series is characterized by a triangle of negative correlation followed by a triangle of positive correlation.
- Sharp Triangles: Well-formed, sharply outlined triangles (e.g., Figure 5a, top) indicate that the model performs well and makes few mistakes in that part of the time series (Figure 5a, bottom).
- Non-Sharp Triangles: Noisy, non-sharp triangles (e.g., Figure 5b, top) are associated with high errors, particularly during abrupt falls in traffic, where the model struggles to predict the decrease accurately (Figure 5b, bottom). This behavior is consistently observed across decreasing slopes in the test set.

Fig. 5. Correlation matrices (top) and corresponding model errors (bottom): (a) sharp triangles with few errors; (b) non-sharp triangles with high errors during abrupt falls.
To quantitatively identify these triangles, a pattern recognition technique is introduced (a hedged sketch follows this list):

- Transition Detection: Identify transitions between triangles by computing the difference of the median correlation scores of consecutive correlation vectors $\rho^{(t)}$ and $\rho^{(t+1)}$.
- Windowed Observation: Once a column interrupting a triangle is found, a window of size $w$ centered on this column is used to observe the 3 preceding and 3 succeeding columns (i.e., a 7-column window), forming a sub-matrix $M_w$.
- Binarization: Each element of $M_w$ is binarized into $\{-1, 1\}$:
  $$\hat{m}_{ij} = \begin{cases} -1 & \text{if } -0.9 \le m_{ij} < 0 \\ \phantom{-}1 & \text{if } 0 \le m_{ij} \le 0.9 \end{cases}$$
  - Symbol Explanation:
    - $\hat{m}_{ij}$: The binarized correlation value.
    - $m_{ij}$: The original correlation value from the sub-matrix $M_w$.
    - $-1$: Assigned if the correlation is negative (between -0.9 and 0).
    - $1$: Assigned if the correlation is positive (between 0 and 0.9).
  This binarization isolates strong positive or negative correlations within a bounded range.
- Sharpness Score Calculation: For the resulting binarized matrix, the number of positive and negative values $d_k$ on each secondary diagonal is computed and stored in an array. A sharpness score $s$ is then computed from the absolute values $|d_k|$, with $H$ the length of the input history and $w$ the observation window size. Higher values of $s$ indicate non-sharp (noisy) triangles; lower values indicate sharp ones.
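The following sketch illustrates the binarization and diagonal sign-counting. The final scalar aggregation is an illustrative assumption, as the paper's exact sharpness formula is not reproduced in this summary:

```python
# Hedged sketch: binarize a window of the correlation matrix, then
# score how uniformly signed each secondary (anti-)diagonal is.
# Lower score = sharper triangles; the aggregation is illustrative.
import numpy as np

def sharpness(M_w, clip=0.9):
    B = np.where(M_w < 0, -1.0, 1.0)   # binarize into {-1, +1}
    B[np.abs(M_w) > clip] = 0.0        # keep only values within [-0.9, 0.9]
    F = np.fliplr(B)                   # anti-diagonals become diagonals
    n_rows, n_cols = B.shape
    scores = []
    for k in range(-(n_rows - 1), n_cols):
        d = np.diagonal(F, offset=k)
        scores.append(abs(d.sum()) / len(d))   # 1 if uniformly signed
    return 1.0 - float(np.mean(scores))        # higher = noisier triangles

i, j = np.indices((7, 7))
sharp_win = np.where(i + j < 7, 0.5, -0.5)     # cleanly split anti-diagonals
noisy_win = np.random.default_rng(1).choice([-0.5, 0.5], size=(7, 7))
print(sharpness(sharp_win), sharpness(noisy_win))  # sharp << noisy
```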
The analysis reveals a direct relationship: as the sharpness score increases (indicating more non-sharpness), the error also increases (Figure 6a). Examination of absolute errors across the test set (Figure 6b) shows that the highest errors (5-8 GB/min) occur during moderate to low loads associated with abrupt falls. Figure 6c further illustrates that the model significantly underestimates the ground truth during severe load decreases. A critical insight comes from Figure 6d, which shows a lack of training samples in the training set for precisely these traffic volumes experiencing abrupt falls. This indicates that the model's poor generalization during these events is due to insufficient training data for such specific trends.
This image is Figure 6, showing four subplots for the root-cause analysis of model errors: error versus sharpness score, error versus load, error versus load-change difference, and the distribution of traffic values in the training set.

Fig. 6. Root cause analysis of model errors
Model Re-design and Performance Improvement:
AICHRONOLENS thus points to data augmentation as a solution. By augmenting the training dataset with more samples featuring abrupt load decreases, and introducing a sigmoid activation function before the output layer, a new optimized model is trained (starting from model A_A settings). This optimized model significantly outperforms the baseline (model A_A), especially in these challenging scenarios.
This image is Figure 7, comparing the error distributions of the baseline and optimized models after AICHRONOLENS diagnosis; the x-axis is the error (GB/min) and the y-axis the frequency of occurrence. The optimized model's errors are more concentrated, indicating better performance.
Fig. 7. Error of baseline and optimized models after AICHRONOLENS diagnosis
Figure 7 shows that the optimized model not only reduces errors of high magnitude (the tails of the error distribution) but also centers the error distribution around zero and reduces the frequency of small-magnitude errors. For windows around abrupt load decreases, the MAE of model A_A was 0.921, while the optimized model achieved an MAE of 0.619, a 32% improvement. Over the entire test set, the optimized model's MAE was 0.69, essentially on par with A_A's 0.67: the large gains in the critical scenarios come at almost no cost to overall performance. This highlights AICHRONOLENS's power in diagnosing specific weaknesses and guiding targeted improvements.
Analysis of Data-Specific Errors:

Even after addressing model design issues, AICHRONOLENS can identify errors stemming from the inherent characteristics of the data itself. For dataset D2, AICHRONOLENS reveals sequences of high-magnitude errors that change sign (e.g., positive then negative). This behavior is characterized by correlation triangles that are interrupted by a full column of weak correlation.
This image is Figure 8, a three-part chart showing the number of active mobile users over an interval of timesteps, a heatmap over the input history, and the prediction error. A red box highlights the interval with anomalous errors, where the model's performance fluctuates, enabling a closer analysis of the model's behavior.

Fig. 8. Analysis of consecutive high-magnitude errors that change sign
To quantify this, the Euclidean distance between subsequent correlation vectors is computed and normalized to [0, 1]. When this distance exceeds a given threshold, a change in error sign is observed in 65% of the cases, with a corresponding MAE of 0.46. This is significantly higher than the overall MAE of 0.13 for D2, indicating that AICHRONOLENS effectively pinpoints regions where the data itself is challenging for the model to predict consistently. A small sketch of this distance computation follows.
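A hedged sketch of flagging timesteps where consecutive correlation vectors (columns of $M$) are far apart; the threshold value is an assumption, as the paper's threshold is not preserved in this summary:

```python
# Hedged sketch: normalized Euclidean distance between consecutive
# columns of the correlation matrix M, used to flag suspect timesteps.
import numpy as np

rng = np.random.default_rng(2)
H, T = 20, 100
M = rng.uniform(-1, 1, size=(H, T))     # stand-in correlation matrix

d = np.linalg.norm(np.diff(M, axis=1), axis=0)   # distance per step
d = (d - d.min()) / (d.max() - d.min())          # normalize to [0, 1]

threshold = 0.5                                   # assumed threshold
flagged = np.where(d > threshold)[0] + 1          # suspect timesteps
print(flagged[:10])
```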
6.1.3. Impact of Learning Rates on Correlations
AICHRONOLENS provides insights into how different hyperparameter settings, specifically learning rates, influence the correlation patterns. Figure 9 qualitatively illustrates this:
- Models with lower learning rates (0.0001; models A, B, C) tend to exhibit strong positive or negative correlations, with values approaching 1 or -1. This suggests that the model's learned relationships are more pronounced and stable.
- Models with higher learning rates (0.001; models A_A, B_B, C_C) show correlation scores clustered around zero, indicating weaker or negligible correlations. This implies that while higher learning rates might enable faster adaptation to new conditions, they can lead to less stable or less discernible linear relationships between XAI scores and GAF representations.
This image is Figure 9, two rows of heatmaps comparing the correlations between timestep and history position for models A, B, and C, illustrating how the models' correlations evolve over time.
Fig. 9. Correlation vector for models with different learning rates: top 0.0001 and bottom 0.001
This finding suggests that AICHRONOLENS can offer precise insights into the heterogeneous accuracy and learning behavior of models trained with different learning rates. Interestingly, the study also notes that for this specific dataset, the number of neurons (i.e., the width of the LSTM layer) played a marginal role in these observed correlation patterns.
6.2. Data Presentation (Tables)
The following are the results from Table I of the original paper:
| MODEL ID | Neurons | LEARNING RATE | MAE |
|---|---|---|---|
| A | 200 | 0.0001 | 0.96 |
| B | 100 | 0.0001 | 0.99 |
| C | 50 | 0.0001 | 1.09 |
| A_A | 200 | 0.001 | 0.67 |
| B_B | 100 | 0.001 | 0.68 |
| C_C | 50 | 0.001 | 0.95 |
This table summarizes the configurations and performance (MAE) of the six LSTM models trained for dataset . It shows that models with a higher learning rate (0.001) generally achieve lower MAE values, indicating better overall performance, which aligns with the observation that higher learning rates result in a model that adapts more rapidly.
6.3. Ablation Studies / Parameter Analysis
While the paper doesn't present traditional ablation studies (e.g., removing GAF or Pearson correlation), it effectively performs a parameter analysis by training multiple LSTM models with varying hyperparameters:
-
Number of Neurons: Models A, B, C (and A_A, B_B, C_C) explore 200, 100, and 50 neurons. The analysis in Finding suggests that, for the specific dataset, the number of neurons played a marginal role compared to the learning rate in affecting correlation patterns.
-
Learning Rate: Two distinct learning rates (0.0001 and 0.001) are tested. Finding explicitly analyzes the impact of learning rate on the correlation vectors, demonstrating that higher learning rates lead to weaker, more clustered-around-zero correlations, while lower learning rates result in stronger, more distinct positive/negative correlations. This indicates that
AICHRONOLENScan differentiate howLSTMswith different learning strategies form internal representations and explanations. -
Dropout: The models for explicitly include a dropout layer for regularization, showing the practical setup.
The "root cause analysis of model errors" (Finding ) also acts as a form of
post-hoc parameter analysisby showing howAICHRONOLENSled to an informed model re-design. This involved: -
Data Augmentation: Augmenting the training set to include more
abrupt load decreasescenarios, addressing a data scarcity issue identified byAICHRONOLENS. -
Activation Function Change: Introducing a
sigmoidactivation function before the output layer for the optimized model. This targeted intervention, guided byAICHRONOLENS's diagnostics, significantly improved performance (32% reduction inMAEfor critical scenarios), demonstrating its utility in model development and hyperparameter tuning beyond simple trial-and-error.
7. Conclusion & Reflections
7.1. Conclusion Summary
This paper effectively addresses the critical challenge of making AI models, particularly LSTMs, interpretable for time series forecasting in mobile networks. It introduces AICHRONOLENS, a novel tool that significantly advances XAI for this domain. By uniquely linking legacy XAI relevance scores (from tools like LRP and SHAP) with the temporal characteristics of input sequences, specifically through the use of Gramian Angular Fields (GAF) and Pearson correlation, AICHRONOLENS provides a much deeper understanding of model behavior than previously possible. The extensive evaluations using real-world mobile traffic traces (D1 and D2) demonstrate its ability to pinpoint diverse categories of model errors, accurately trace their root causes (whether due to poor model design or inherently complex data), and subsequently guide model refinement. This diagnostic capability led to a notable performance improvement of up to 32% in specific challenging scenarios through targeted data augmentation and minor hyperparameter adjustments.
7.2. Limitations & Future Work
The authors explicitly state some limitations and suggest future work:
- LSTM Specificity: Currently, AICHRONOLENS's scope is restricted to LSTM models and univariate time series.
- Future Work on Spatio-temporal Inputs: The adaptation of AICHRONOLENS to models dealing with spatio-temporal inputs (e.g., forecasting traffic across multiple interconnected BSs over time) is left for future research. This would be a significant extension, as such data introduces additional complexity in representing and correlating features.
7.3. Personal Insights & Critique
The AICHRONOLENS framework presents a highly innovative and practical approach to XAI for time series, a domain often underserved by existing interpretability methods. The core insight—that raw XAI scores are insufficient without temporal context—is well-argued and empirically demonstrated.
Inspirations and Applications:
- Practical Debugging: This tool is immensely valuable for ML engineers and network operators. Instead of blindly trusting LSTM predictions or struggling with vague XAI outputs, AICHRONOLENS provides actionable insights for debugging model failures. Identifying specific data patterns (e.g., abrupt falls) that cause errors and linking them to a lack of training data is a direct pathway to model improvement (e.g., through targeted data augmentation).
- Model Robustness and Trust: By offering a deeper understanding of why a model predicts incorrectly, AICHRONOLENS can significantly contribute to building trust in AI systems deployed in critical infrastructure like mobile networks. This increased transparency is crucial for regulatory compliance and operational acceptance.
- Beyond Mobile Networks: The methodology of linking XAI outputs with GAF (or other time series imaging techniques) and correlation analysis could be widely transferable to other time series forecasting domains, such as financial market prediction, energy consumption forecasting, or industrial anomaly detection. Any domain where LSTMs are used for time series and interpretability is a concern could potentially benefit.
- Hyperparameter Tuning Insight: The finding that learning rates affect correlation patterns provides a novel perspective on hyperparameter tuning. It suggests that XAI metrics from AICHRONOLENS could be used not just for debugging, but also as a guide during model development to understand how different hyperparameter choices influence the model's internal reasoning and sensitivity to temporal features.
Potential Issues and Areas for Improvement:
- Computational Cost: The process involves several steps: XAI calculation, GAF transformation, and a Pearson correlation for each timestep. The paper mentions a 16-hour execution time for clustering (which AICHRONOLENS simplifies but whose underlying computations it does not eliminate), suggesting that for very long time series or real-time online diagnostics, the computational overhead might be considerable. While designed for offline inspection, optimizing this for faster feedback loops could be beneficial.
- Generalizability of GAF: While GAF is a powerful tool, the effectiveness of time-series-to-image transformations can depend on the specific characteristics of the time series. Further studies could explore whether other imaging techniques (e.g., MTF in combination with GAF, or different GAF variants) or even direct statistical feature extraction could complement or replace GAF for certain data types.
- Interpretability of GAF itself: While GAF provides "enriched expressiveness", a novice may still find interpreting a GAF matrix challenging. The paper explains its properties well, but connecting the visual patterns in GAF directly to human-understandable temporal events still requires some expertise. The correlation vectors simplify this, but interpreting the underlying GAF remains a prerequisite.
- Complexity of Correlation Patterns: The "triangle shapes" and their transitions, while insightful, still require a human expert to analyze and interpret. The paper provides a sharpness score, but further automation in identifying and categorizing these patterns (e.g., via unsupervised learning on the $M$ or $E$ matrices) could make AICHRONOLENS even more user-friendly for non-experts.
- Quantifying the "32% improvement": While a 32% reduction in MAE for specific error scenarios is significant, the MAE over the entire test set remained essentially unchanged (0.69 vs. 0.67 for A_A). This highlights that AICHRONOLENS excels at tackling specific types of errors, which is extremely valuable, but a nuanced understanding of its impact on overall model performance is necessary. This is not a criticism but a clarification for interpreting the results.

Overall, AICHRONOLENS is a commendable step forward in making LSTMs for time series forecasting more transparent and controllable, particularly within the demanding context of mobile network operations. Its focus on linking explainability with temporal context is a crucial innovation.