
AICHRONOLENS: Advancing Explainability for Time Series AI Forecasting in Mobile Networks

This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

AICHRONOLENS links traditional XAI explanations with the temporal features of the input, enhancing the interpretability of LSTM-based time series forecasting in mobile networks, uncovering the causes of prediction errors, and improving accuracy by up to 32% in the targeted error scenarios.

Abstract

AICHRONOLENS: Advancing Explainability for Time Series AI Forecasting in Mobile Networks. Claudio Fiandrino, Eloy Pérez Gómez, Pablo Fernández Pérez, Hossein Mohammadalizadeh, Marco Fiore and Joerg Widmer, IMDEA Networks Institute, Madrid, Spain. Email: {name.surname}@imdea.org

Abstract — Next-generation mobile networks will increasingly rely on the ability to forecast traffic patterns for resource management. Usually, this translates into forecasting diverse objectives like traffic load, bandwidth, or channel spectrum utilization, measured over time. Among other techniques, Long Short-Term Memory (LSTM) proved very successful for this task. Unfortunately, the inherent complexity of these models makes them hard to interpret and, thus, hampers their deployment in production networks. To make the problem worse, EXplainable Artificial Intelligence (XAI) techniques, which are primarily conceived for computer vision and natural language processing, fail to provide useful insights: they are blind to the temporal characteristics of the input and only work well with highly rich semantic data like images or text. In this paper, we take the research on XAI fo…

In-depth Reading

English Analysis

1. Bibliographic Information

1.1. Title

AICHRONOLENS: Advancing Explainability for Time Series AI Forecasting in Mobile Networks

1.2. Authors

  • Claudio Fiandrino

  • Eloy Pérez Gómez

  • Pablo Fernández Pérez

  • Hossein Mohammadalizadeh

  • Marco Fiore

  • Joerg Widmer

    All authors are affiliated with the IMDEA Networks Institute, Madrid, Spain. Their research backgrounds appear to be in telecommunications, mobile networks, and potentially machine learning applications in these domains.

1.3. Journal/Conference

The paper does not explicitly state the specific journal or conference where it was published, but it is a research paper associated with the IMDEA Networks Institute. Given the context of the content (mobile networks, AI, XAI), it would likely be published in a top-tier networking conference (e.g., IEEE INFOCOM, ACM MobiCom, IEEE SECON) or a relevant journal.

1.4. Publication Year

The publication year is not explicitly stated in the provided text. However, internal references (e.g., [1], [17]) indicate citations up to 2023, suggesting the paper was likely published in late 2023 or 2024.

1.5. Abstract

Next-generation mobile networks will heavily rely on artificial intelligence (AI) for forecasting critical parameters like traffic load and bandwidth to optimize resource management. Long-Short Term Memory (LSTM) models have proven effective for time series forecasting in this domain but suffer from a lack of interpretability, hindering their deployment in production environments. Existing Explainable Artificial Intelligence (XAI) techniques, primarily designed for computer vision and natural language processing, fail to provide useful insights for time series data because they are "blind" to temporal characteristics and semantic richness.

This paper introduces AICHRONOLENS, a novel tool designed to overcome these limitations by linking traditional XAI explanations with the temporal properties of input time series. It achieves this by using an imaging technique called Gramian Angular Field (GAF) to represent time series, which allows AICHRONOLENS to identify and correlate patterns between XAI relevance scores and the input's temporal features. This enables a deeper understanding of model behavior, including the root causes of prediction errors. Extensive evaluations using real-world mobile traffic traces demonstrate AICHRONOLENS's ability to uncover hidden model behaviors and improve model performance by up to 32% through informed refinement.

Original PDF: /files/papers/690c5fc10de225812bf932a5/paper.pdf (a direct PDF link, likely hosted on a repository or institutional server, indicating an officially accessible version of the paper).

2. Executive Summary

2.1. Background & Motivation

The core problem the paper aims to solve is the lack of interpretability and explainability of Long-Short Term Memory (LSTM) models when applied to time series forecasting in mobile networks. Next-generation mobile networks (5G and beyond) increasingly depend on accurate traffic pattern forecasts (e.g., traffic load, bandwidth, channel spectrum utilization) for efficient resource management, network deployment, routing, mobility management, and energy efficiency. LSTMs have shown great success in these forecasting tasks.

However, the inherent complexity and "black-box" nature of LSTMs make them difficult for human operators (like network managers) to understand and trust. This opacity hinders their deployment in real-world production networks. Existing Explainable Artificial Intelligence (XAI) techniques, while effective in domains like computer vision and natural language processing, are inadequate for time series data. They are often "blind" to the crucial temporal characteristics of the input and struggle with the less "semantically rich" nature of raw time series data compared to images or text. This leads to ambiguous explanations that don't reveal the true underlying reasons for model predictions or errors, particularly in a temporal context.

The paper's entry point or innovative idea is to address this fundamental flaw by linking legacy XAI explanations with the temporal properties of the input time series. This aims to provide deeper insights into LSTM model behavior, diagnose errors, and ultimately build trust for their adoption in mobile networks.

2.2. Main Contributions / Findings

The paper outlines three key contributions (C) and three key findings (F):

  • C1. Design of AICHRONOLENS: The paper designs AICHRONOLENS, a new tool that overcomes the shortcomings of prominent XAI tools for time series forecasting. It achieves this by harnessing the linear relationship between XAI relevance scores and the temporal characteristics of input sequences, specifically using an imaging technique (Gramian Angular Field (GAF)) to enrich the input's expressiveness.
  • C2. Extensive Evaluation and Detailed Explanations: The authors perform a thorough evaluation of AICHRONOLENS using real-world datasets and various LSTM models. This evaluation demonstrates that AICHRONOLENS provides highly detailed explanations regarding model behavior, which are valuable for verifying model robustness and for ongoing monitoring.
  • C3. Reproducibility and Artifact Release: The study contributes to the research community by releasing the trained LSTM models and the AICHRONOLENS code, encouraging further research and reproducibility.
  • F1. Pinpointing Hyperparameter Differences: AICHRONOLENS, unlike legacy XAI tools, can identify differences in LSTM model behavior stemming from hyperparameter settings (e.g., learning rates). For instance, models trained with lower learning rates exhibit strong positive or negative correlations between relevance scores and the temporal representation of the input, whereas higher learning rates yield correlations clustered around zero (see Finding R_3).
  • F2. Relating Correlation Coefficients to Model Errors: The correlation coefficients generated by AICHRONOLENS possess geometrical properties that can be directly linked to model errors. The tool reveals the root causes of these errors, differentiating between poor model design and data that is inherently difficult to predict.
  • F3. Refining Training and Improving Performance: AICHRONOLENS can be effectively used to guide the refinement of model training. By identifying specific weaknesses, it enables targeted data augmentation and minor hyperparameter adjustments, leading to significant improvements in model performance (e.g., up to 32% reduction in MAE for specific error scenarios).

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To understand this paper, a reader needs to grasp several fundamental concepts:

  • Time Series Forecasting: This is the process of predicting future values of a variable based on historical, time-ordered observations. In mobile networks, this involves forecasting metrics like traffic load or user count over time. The paper defines this formally: given a sequence of past values X_t = \{x_{t-n+1}, \ldots, x_t\}, the goal is to predict the future value \hat{x}_{t+1} = F(X_t), where F is a prediction function (e.g., an LSTM model).
  • Long-Short Term Memory (LSTM): A type of recurrent neural network (RNN) specifically designed to handle sequential data and address the vanishing/exploding gradient problem common in traditional RNNs. LSTMs use internal "gates" (input, forget, output gates) to regulate the flow of information through their memory cells, allowing them to learn long-term dependencies in the data. This makes them particularly effective for time series forecasting.
  • Explainable Artificial Intelligence (XAI): A field of AI that aims to make AI models more transparent and understandable to humans. It focuses on providing insights into how and why an AI model arrives at a particular decision or prediction, rather than just providing the output. This is crucial for building trust, debugging, and ensuring ethical deployment of AI systems.
  • Pearson Correlation Coefficient (\rho): A statistical measure that quantifies the linear relationship between two sets of data. It ranges from -1 to +1, where +1 indicates a perfect positive linear correlation, -1 indicates a perfect negative linear correlation, and 0 indicates no linear correlation. The paper uses this to assess the linear relationship between XAI relevance scores and the GAF representation of time series.
    • Conceptual Definition: The Pearson correlation coefficient measures the strength and direction of a linear relationship between two variables. If one variable increases as the other increases, they have a positive correlation. If one variable decreases as the other increases, they have a negative correlation. If there's no consistent pattern, the correlation is near zero.
    • Mathematical Formula: The Pearson correlation coefficient r between two variables X and Y is defined as: r = \frac{\sum_{i=1}^{m} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{m} (X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{m} (Y_i - \bar{Y})^2}}
    • Symbol Explanation:
      • m: The number of data points (samples).
      • X_i: The value of the first variable at data point i.
      • Y_i: The value of the second variable at data point i.
      • \bar{X}: The mean of the first variable.
      • \bar{Y}: The mean of the second variable.
      • \sum: Summation operator.
  • Gramian Angular Field (GAF): An imaging technique that transforms a 1D time series into a 2D image. It works by converting time series values into polar coordinates, where the value determines the angle and the time step determines the radius. A Gram matrix is then constructed from the angular sums/differences, encoding temporal correlations and local patterns (maxima/minima) into an image. This transformation allows ML models (especially those designed for images, like Convolutional Neural Networks) to potentially analyze time series data in a new way, and in this paper, provides an enriched representation for correlation.
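As a concrete illustration of the GAF concept, the following minimal numpy sketch (an illustrative reconstruction, not the authors' code) rescales a toy series to [-1, 1], maps each value to an angle via arccos, and builds the Gramian Angular Summation Field matrix of pairwise angular sums:

```python
import numpy as np

def gasf(x):
    """Minimal Gramian Angular Summation Field for a 1D series (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    # Rescale to [-1, 1]
    x_tilde = ((x - x.max()) + (x - x.min())) / (x.max() - x.min())
    # Polar encoding: the value gives the angle, the index gives the radius
    phi = np.arccos(np.clip(x_tilde, -1.0, 1.0))
    # Gram matrix of angular sums: G[i, j] = cos(phi_i + phi_j)
    return np.cos(phi[:, None] + phi[None, :])

# Toy 20-sample window standing in for a traffic-load history
window = np.sin(np.linspace(0, 3 * np.pi, 20)) + 0.1 * np.random.randn(20)
G = gasf(window)
print(G.shape)  # (20, 20); entries near 1 pair extrema of the same kind, near -1 a maximum with a minimum
```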

3.2. Previous Works

The paper contextualizes its contribution by discussing existing XAI techniques and their limitations, particularly when applied to time series data.

  • Legacy XAI Techniques:

    • Layer-wise Relevance Propagation (LRP) [21]: A model-specific XAI technique that assigns a relevance score to each input feature, indicating its contribution to the model's output. It works by propagating the prediction relevance backward through the neural network layers, following a conservation principle (total relevance is maintained). When it reaches the input layer, it distributes the total relevance among the input elements. This provides insights into which parts of the input data are most influential for a given prediction.
    • SHapley Additive exPlanations (SHAP) [22]: A model-agnostic XAI technique based on cooperative game theory. It calculates Shapley values for each feature, representing the average marginal contribution of that feature across all possible coalitions (combinations) of features. SHAP can provide both local (explaining a single prediction) and global (explaining overall model behavior) explanations.
    • Local Interpretable Model-agnostic Explanations (LIME) [23]: Another model-agnostic XAI technique that explains individual predictions of any classifier. It approximates the complex black-box model locally with a simpler, interpretable model (e.g., linear model) around the prediction of interest. It perturbs the input, observes changes in the output, and learns a local linear model to explain feature importance.
    • DeepLIFT [24]: A model-specific XAI technique for deep learning models that attributes contributions to input neurons based on comparing the activation of each neuron to its "reference" activation. It back-propagates activation differences from the output to the input layer.
    • Eli5 [35]: A Python library that helps debug machine learning classifiers and explain their predictions. It supports various ML frameworks and provides XAI functionalities, often using techniques like permutation importance or LIME under the hood.
  • XAI for Time Series: The paper notes that XAI techniques were primarily developed for computer vision and NLP. While some adaptations exist for time series, especially for classification [32], [33], they often fall short for forecasting, particularly for univariate time series. The paper highlights a key limitation: even adapted methods like LRP and SHAP fail to provide useful insights beyond simple input relevance, often giving ambiguous explanations with no clear relation to the temporal input sequence.

  • Time Series to Image Transformations:

    • Recurrence Plots (RP): A technique that visualizes the recurrence of states in a dynamical system. It plots a 2D matrix where points are marked if the system's state at two different times is "close" in phase space. The paper mentions RP are limited by variable length and scale issues and cannot effectively represent trends.
    • Markov Transition Field (MTF): Transforms a time series into an image by encoding the transition probabilities between different quantile bins of the time series. While it preserves temporal dependencies like GAF, MTF cannot reconstruct the original time series, unlike GAF.

3.3. Technological Evolution

The field of AI in mobile networks has rapidly evolved, driven by the increasing complexity and data volumes of 5G and future 6G networks. Initially, simple statistical models were used for traffic prediction. With the rise of deep learning, LSTMs became prominent due to their ability to handle sequential data and capture complex temporal patterns, leading to higher quality predictions [54]. Concurrently, the demand for transparent AI led to the development of XAI. However, XAI techniques largely originated in computer vision (e.g., explaining why a CNN classifies an image as a "cat" by highlighting pixels) and NLP (e.g., explaining why an RNN predicts a word by highlighting previous words). The challenge emerged when applying these XAI methods to time series, where the "semantics" are less intuitive (e.g., what does "this pixel is important" mean in a time series image?), and the temporal dependencies are paramount. This paper's work fits into this timeline by bridging the gap between mature LSTM forecasting, advanced XAI principles, and the unique temporal characteristics of mobile network time series data, specifically addressing the interpretability shortcomings that hinder real-world deployment.

3.4. Differentiation Analysis

Compared to existing XAI methods for time series, AICHRONOLENS offers several core differences and innovations:

  • Addressing Ambiguity: The primary differentiation is AICHRONOLENS's ability to resolve the ambiguity of legacy XAI techniques. As shown in Section II-B, LRP and SHAP can assign similar relevance scores to vastly different input sequences, making their explanations uninformative. AICHRONOLENS explicitly tackles this by integrating temporal context.
  • Linking XAI with Temporal Characteristics: Unlike traditional XAI methods that only provide "input relevance" scores, AICHRONOLENS actively links these scores with the temporal properties of the input sequence. It uses GAF to transform the 1D time series into a 2D representation, revealing pairwise relationships and spatial distances between local maxima/minima.
  • Enriched Input Expressiveness: By correlating XAI relevance scores with the GAF representation, AICHRONOLENS gains a richer understanding of why certain inputs are relevant, rather than just that they are relevant. This allows it to identify if the model gives higher or lower importance to specific temporal features like local maxima or minima.
  • Diagnosis of Error Causes: AICHRONOLENS goes beyond identifying relevant inputs to diagnose the hidden causes of model errors. It can distinguish errors due to poor model design (e.g., lack of training data for specific trends) from errors due to inherently unpredictable data. This level of diagnostic capability is largely absent in existing XAI for time series.
  • Geometry of Explanations: The output of AICHRONOLENS (correlation matrices) can be geometrically interpreted (e.g., "triangle shapes") to reveal trends in prediction and identify problematic transitions, providing a novel visual and quantitative way to understand model behavior over time.
  • Offline Model Inspection for Online Monitoring: It provides a tool for offline model inspection to synthesize tailored explanations that can then be used for online monitoring, offering a practical pathway for deployment.

4. Methodology

4.1. Principles

The core idea behind AICHRONOLENS is to enhance the depth and utility of explanations provided by existing XAI tools for time series forecasting. It achieves this by addressing the inherent ambiguity of these tools, which often assign similar relevance scores to diverse input sequences without considering their temporal structure. The theoretical basis is that by explicitly linking the XAI relevance scores with an enriched, temporally-aware representation of the input time series, one can reveal deeper insights into the model's decision-making process, diagnose errors, and ultimately improve model design. This linkage is established using the Pearson correlation coefficient between XAI relevance scores and a Gramian Angular Field (GAF) representation of the input.

4.2. Core Methodology In-depth (Layer by Layer)

AICHRONOLENS is structured into four main modules, as illustrated in Figure 3. Each module performs a specific task to progressively enrich the understanding of the LSTM model's behavior.

Fig. 3. AICHRONOLENS architecture. The figure shows the time series input X_t passing through the GAF transformation and the XAI tool alongside the LSTM model's prediction, with the Analyzer module correlating the temporal characteristics of the input with the explanations.

The process begins with the Time Series Input (X_t), which is fed into the LSTM Model for forecasting (\hat{x}_{t+1}). Simultaneously, this Time Series Input and the LSTM Model's predictions are processed by the AICHRONOLENS modules.

The design of AICHRONOLENS adheres to two principles:

  • DP_1: XAI Generality. AICHRONOLENS is designed to be pluggable with any existing XAI tool, making it versatile and allowing for comparative analysis of different XAI explanations for the same LSTM model.

  • DP_2: LSTM Specificity. While general in its XAI pluggability, AICHRONOLENS initially focuses on LSTM models for univariate time series. Adaptation for spatio-temporal inputs is left for future work.

    Let's detail each module:

4.2.1. Relevance Scores from XAI (\bullet)

This module computes relevance scores (L_n) for each element of the input sequence X_t = \{x_1, x_2, \ldots, x_n\}, indicating its contribution to the forecast \hat{x}_{t+1}. The paper describes how this is done using two prominent XAI techniques: LRP and SHAP.

  • LRP (Layer-wise Relevance Propagation): LRP calculates relevance scores by tracking back the individual activation of each neuron and its contribution (weighted by connection strength) from the output layer to the input layer. This process follows a conservation principle, meaning the total relevance in one layer is preserved as it is propagated backward to the previous layer. The relevance L_{i,j}^{(q)} passed from a neuron j in layer p back to a neuron i in the preceding layer q is defined as: L_{i,j}^{(q)} = L_j^{(p)} \, \frac{a_i \cdot w_{i,j}}{\sum_k a_k \cdot w_{k,j}}

    • Symbol Explanation:
      • L_{i,j}^{(q)}: Relevance score of neuron i in layer q with respect to neuron j in layer p.
      • L_j^{(p)}: Total relevance of neuron j in layer p.
      • a_i: Activation of neuron i.
      • w_{i,j}: Weight of the connection between neuron i and neuron j.
      • \sum_{k} a_k \cdot w_{k,j}: Sum of activations weighted by connections for all neurons k in layer q connected to neuron j in layer p. The total relevance of neuron i is obtained by summing L_{i,j}^{(q)} over all neurons j it feeds into, and the calculation is repeated layer by layer until relevance is distributed across the input layer elements, yielding L_n. (A minimal numeric sketch of this backward pass is given after the SHAP description below.)
  • SHAP (SHapley Additive exPlanations): SHAP computes relevance scores (Shapley values) by determining the average marginal contribution of each input sequence element (xiXtx_i \in X_t) across all possible permutations of feature presence or absence. The relevance score liLnl_i \in L_n for an input feature ii is calculated as: li(f)=1(n1)!k=1n1XsXts=k[(n1k)]1 l _ { i } ( f ) = \frac { 1 } { ( n - 1 ) ! } \sum _ { k = 1 } ^ { n - 1 } \sum _ { X _ { s } \subseteq X _ { t } \atop | s | = k } \left[ { \binom { n - 1 } { k } } \right] ^ { - 1 }

    • Symbol Explanation:
      • l_i(f): Shapley value (relevance score) for feature i using prediction function f.
      • n: Total number of features in the input sequence X_t.
      • k: Number of features present in a subset X_s.
      • X_s: A subset of the input features X_t that excludes x_i.
      • |s|: Cardinality (number of features) of the subset X_s.
      • \binom{n-1}{k}: The binomial coefficient, i.e., the number of ways to choose k features from the remaining n-1 features; its inverse weights each subset size equally on average.
      • f(X_s \cup \{x_i\}) - f(X_s): The marginal contribution of x_i to the prediction when the subset X_s is already present; f(X_t) with all features corresponds to the forecast \hat{x}_{t+1}. Note: the formula as printed in the paper omits this marginal-contribution term; the expression above follows the standard Shapley value definition.
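To make the backward relevance propagation concrete, here is a minimal numpy sketch of the LRP rule applied to a single dense layer. This is an illustrative simplification: the paper applies LRP to LSTM layers, which require gate-specific propagation rules not shown here, and the small stabilizer term is an assumption added for numerical safety.

```python
import numpy as np

def lrp_dense(a, W, relevance_out, eps=1e-9):
    """Propagate relevance from a dense layer's outputs back to its inputs.

    a: activations of the input neurons, shape (n_in,)
    W: weight matrix, shape (n_in, n_out)
    relevance_out: relevance of the output neurons, shape (n_out,)
    Returns the relevance of the input neurons, shape (n_in,).
    """
    z = a @ W                                # denominators: sum_k a_k * w_{k,j}
    z = z + eps * np.sign(z)                 # stabilizer to avoid division by zero
    contrib = (a[:, None] * W) / z[None, :]  # a_i * w_{i,j} / sum_k a_k * w_{k,j}
    return contrib @ relevance_out           # sum over output neurons j

# Toy example: 4 input neurons, 2 output neurons
a = np.array([0.5, 1.0, 0.0, 2.0])
W = np.random.randn(4, 2)
R_out = np.array([1.0, 0.5])
R_in = lrp_dense(a, W, R_out)
print(R_in, R_in.sum(), R_out.sum())  # total relevance is (approximately) conserved
```

Relevance scores via SHAP might be obtained with the shap library roughly as follows. This is a hedged sketch assuming a trained Keras LSTM forecaster `model` and training windows `X_train` of shape (samples, 20, 1), both hypothetical names, and not the authors' exact pipeline.

```python
import numpy as np
import shap  # assumes the shap package is installed

# `model` and `X_train`/`X_test` are assumed to exist (hypothetical names)
background = X_train[np.random.choice(len(X_train), 100, replace=False)]
explainer = shap.DeepExplainer(model, background)

shap_values = explainer.shap_values(X_test)          # relevance for each test window
sv = shap_values[0] if isinstance(shap_values, list) else shap_values
L = np.squeeze(sv)                                   # one relevance score per input sample
print(L.shape)                                       # (num_windows, 20)
```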

4.2.2. Imaging via GAF (\pmb{\theta})

This module transforms the 1D input time series X_t into a 2D image representation using the Gramian Angular Field (GAF) technique. GAF is chosen because it preserves temporal dependencies, allows reconstruction of the original time series, and captures complex temporal correlations, unlike Recurrence Plots (RP) or the Markov Transition Field (MTF).

The transformation process involves three steps:

  1. Rescaling: The original elements x_i \in X_t (for i = 1, \ldots, n) are first rescaled to the range [-1, 1]: \widetilde{x}_i = \frac{(x_i - \max(X_t)) + (x_i - \min(X_t))}{\max(X_t) - \min(X_t)}

    • Symbol Explanation:
      • \widetilde{x}_i: The rescaled value of x_i.
      • x_i: The original value of the i-th element in the input sequence X_t.
      • \max(X_t): The maximum value in the input sequence X_t.
      • \min(X_t): The minimum value in the input sequence X_t.
  2. Polar Coordinates Transformation: The rescaled values \widetilde{x}_i are then converted into polar coordinates, where the value itself dictates the angle and the time step dictates the radius: \phi = \arccos(\widetilde{x}_i), \quad -1 \leq \widetilde{x}_i \leq 1, \ \widetilde{x}_i \in \widetilde{X}; \qquad r = \frac{i}{Y}, \quad i \in \mathbb{N}.

    • Symbol Explanation:
      • \phi: The angular coordinate, derived from the arccosine of the rescaled value.
      • \arccos(\widetilde{x}_i): The arccosine function applied to the rescaled value \widetilde{x}_i, which maps values in [-1, 1] to angles in [0, \pi].
      • r: The radial coordinate, determined by the time step i.
      • i: The index of the time step (from 1 to n).
      • Y: A regularization factor for the span of the polar coordinate system. This transformation is bijective (the original time series can be recovered) and preserves absolute temporal relations.
  3. Gramian Angular Field (GAF) Matrix Construction: A Gram matrix \mathbf{G}_{n \times n} is constructed, where each element is the cosine of the sum of two angles from the polar coordinate representation. This matrix encodes pairwise relationships between points in the time series. \mathbf{G}_{n \times n} = \begin{bmatrix} \cos(\phi_1 + \phi_1) & \cdots & \cos(\phi_1 + \phi_n) \\ \cos(\phi_2 + \phi_1) & \cdots & \cos(\phi_2 + \phi_n) \\ \vdots & \ddots & \vdots \\ \cos(\phi_n + \phi_1) & \cdots & \cos(\phi_n + \phi_n) \end{bmatrix}

    • Symbol Explanation:
      • \mathbf{G}_{n \times n}: The Gramian Angular Field matrix of dimensions n \times n, where n is the length of the input time series.
      • \phi_i: The angular coordinate corresponding to the i-th time step.
      • \cos(\cdot): The cosine function. Alternatively, using an inner product definition: \langle v, z \rangle = v \cdot z - \sqrt{1 - v^2} \cdot \sqrt{1 - z^2}
    • Symbol Explanation:
      • \langle v, z \rangle: The defined inner product between two values v and z.
      • v, z: Two values (e.g., \widetilde{x}_i, \widetilde{x}_j) from the rescaled time series.
      • \sqrt{\cdot}: Square root function. The \mathbf{G}_{n \times n} matrix can be rewritten using this inner product: \mathbf{G}_{n \times n} = \begin{bmatrix} \langle \widetilde{x}_1, \widetilde{x}_1 \rangle & \cdots & \langle \widetilde{x}_1, \widetilde{x}_n \rangle \\ \langle \widetilde{x}_2, \widetilde{x}_1 \rangle & \cdots & \langle \widetilde{x}_2, \widetilde{x}_n \rangle \\ \vdots & \ddots & \vdots \\ \langle \widetilde{x}_n, \widetilde{x}_1 \rangle & \cdots & \langle \widetilde{x}_n, \widetilde{x}_n \rangle \end{bmatrix}
    • Symbol Explanation:
      • \mathbf{G}_{n \times n}: The Gramian Angular Field matrix.
      • \langle \widetilde{x}_i, \widetilde{x}_j \rangle: The inner product (as defined above) between the rescaled values at time steps i and j. The GAF matrix has the following properties: it preserves temporal dependency (time increases from top-left to bottom-right), it encodes temporal correlations, and its main diagonal contains the original time series values. High values in the GAF (near 1) indicate correlations between local maxima or minima; values near 0 indicate correlations between extrema and intermediate points; negative values (near -1) indicate correlations between a local maximum and a local minimum.

4.2.3. Defining Correlations (\pmb{\otimes})

This module seeks to establish a linear relationship between the XAI relevance scores (L_n) and the GAF representation (\mathbf{G}_{n \times n}) of the input time series. Each row G_i of the \mathbf{G}_{n \times n} matrix is a 1 \times n vector that characterizes inner relationships between samples of the input time series. Since L_n is also a 1 \times n vector, the Pearson correlation coefficient can be computed between L_n and each row G_i.

The correlation vector R_n is defined as: R_n = \frac{\operatorname{cov}(G, L)}{\sigma_G \sigma_L} = \begin{bmatrix} \rho_0 \\ \rho_1 \\ \vdots \\ \rho_n \end{bmatrix}

  • Symbol Explanation:
    • R_n: The correlation vector, containing Pearson correlation coefficients for each row of the GAF against the LRP/SHAP scores.
    • \operatorname{cov}(G, L): The covariance between a row G_i of the GAF matrix and the XAI relevance scores L_n.
    • \sigma_G: The standard deviation of the row G_i of the GAF matrix.
    • \sigma_L: The standard deviation of the XAI relevance scores L_n.
    • \rho_i: The Pearson correlation coefficient between the i-th row of the GAF matrix (G_i) and the XAI relevance scores (L_n). This process is repeated for each timestep t = 1, \ldots, T, resulting in a correlation matrix \mathbf{C} with dimensions n \times T: \mathbf{C} = \begin{bmatrix} \rho_{1,1} & \rho_{1,2} & \ldots & \rho_{1,T} \\ \rho_{2,1} & \rho_{2,2} & \ldots & \rho_{2,T} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{n,1} & \rho_{n,2} & \ldots & \rho_{n,T} \end{bmatrix}_{n \times T}
  • Symbol Explanation:
    • \mathbf{C}: The correlation matrix, storing all correlation vectors over time.
    • n: Length of the input history (number of samples in X_t).
    • T: Total number of timesteps for which predictions are made.
    • \rho_{i,t}: The Pearson correlation coefficient for the i-th row of the GAF at timestep t.
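A minimal sketch of this module is shown below, assuming the `gasf` helper from the earlier snippet and a set of relevance vectors L_n already computed by an XAI tool (both assumptions, not the authors' code). It builds the n x T correlation matrix C column by column.

```python
import numpy as np

def correlation_vector(G, L):
    """Pearson correlation between each GAF row G[i, :] and the relevance scores L."""
    return np.array([np.corrcoef(G[i, :], L)[0, 1] for i in range(G.shape[0])])

def correlation_matrix(windows, relevances):
    """Stack correlation vectors over timesteps into the n x T matrix C.

    windows: iterable of input windows X_t (each of length n)
    relevances: iterable of matching relevance vectors L_n (each of length n)
    """
    cols = [correlation_vector(gasf(w), l) for w, l in zip(windows, relevances)]
    return np.stack(cols, axis=1)  # shape (n, T)
```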

4.2.4. Analyzing Correlations (\pmb{\bigcirc})

The "Analyzer" module is the heart of AICHRONOLENS, synthesizing explanations from the correlation matrix C\mathbf{C}. To observe the evolution of Pearson's coefficients over time for each sample in the input history, a new matrix S\mathbf{S} is created by storing the secondary diagonals of C\mathbf{C} in rows. For example, given a correlation matrix Cn×T\mathbf{C}_{n \times T} where n=6n=6 and T=3T=3: C6×3=[ρ1,1ρ1,2ρ1,3ρ2,1ρ2,2ρ2,3ρ3,1ρ3,2ρ3,3ρ4,1ρ4,2ρ4,3ρ5,1ρ5,2ρ5,3ρ6,1ρ6,2ρ6,3] \mathbf { C } _ { 6 \times 3 } = \left[ \begin{array} { c c c } { \rho _ { 1 , 1 } } & { \rho _ { 1 , 2 } } & { \rho _ { 1 , 3 } } \\ { \rho _ { 2 , 1 } } & { \rho _ { 2 , 2 } } & { \rho _ { 2 , 3 } } \\ { \rho _ { 3 , 1 } } & { \rho _ { 3 , 2 } } & { \rho _ { 3 , 3 } } \\ { \rho _ { 4 , 1 } } & { \rho _ { 4 , 2 } } & { \rho _ { 4 , 3 } } \\ { \rho _ { 5 , 1 } } & { \rho _ { 5 , 2 } } & { \rho _ { 5 , 3 } } \\ { \rho _ { 6 , 1 } } & { \rho _ { 6 , 2 } } & { \rho _ { 6 , 3 } } \end{array} \right] The matrix S\mathbf{S} would be formed by secondary diagonals, tracking the influence of specific input elements as they "age" within the history window. For practical use, ST\mathbf{S}^T (transpose of S\mathbf{S}) is often more convenient. The example given in the paper for S4×3\mathbf{S}_{4 \times 3} seems to imply a different construction or a specific portion of C\mathbf{C} is used to highlight the secondary diagonals: S4×3=[ρ3,1ρ2,2ρ1,3ρ4,1ρ3,2ρ2,3ρ5,1ρ4,2ρ3,3ρ6,1ρ5,2ρ4,3] \mathbf { S } _ { 4 \times 3 } = \left[ \begin{array} { c c c } { \rho _ { 3 , 1 } } & { \rho _ { 2 , 2 } } & { \rho _ { 1 , 3 } } \\ { \rho _ { 4 , 1 } } & { \rho _ { 3 , 2 } } & { \rho _ { 2 , 3 } } \\ { \rho _ { 5 , 1 } } & { \rho _ { 4 , 2 } } & { \rho _ { 3 , 3 } } \\ { \rho _ { 6 , 1 } } & { \rho _ { 5 , 2 } } & { \rho _ { 4 , 3 } } \end{array} \right]

  • Symbol Explanation:
    • \mathbf{S}: A matrix constructed from the secondary diagonals of \mathbf{C}, where each row tracks the evolution of a specific input's correlation over time. For example, the first row [\rho_{3,1} \; \rho_{2,2} \; \rho_{1,3}] tracks an element that appears in position 3, then 2, then 1 of the input window.
    • \rho_{i,t}: Correlation coefficient for the i-th row of the GAF at timestep t. Elements "age out" of the input window, and their correlation coefficients eventually vanish from \mathbf{S}.
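The re-indexing into S can be sketched as follows; this is one interpretation of the construction described above (not the authors' exact code), in which each row of S collects the correlation coefficient of a single input sample as it slides through the history window.

```python
import numpy as np

def track_samples(C):
    """Follow each input sample along the secondary (anti-)diagonals of C.

    C: correlation matrix of shape (n, T).
    Returns a matrix whose rows track rho for one sample as it ages through the window,
    e.g. [rho_{3,1}, rho_{2,2}, rho_{1,3}] for T = 3 (1-based indices as in the text).
    """
    n, T = C.shape
    rows = [[C[start - t, t] for t in range(T)] for start in range(T - 1, n)]
    return np.array(rows)  # shape (n - T + 1, T)

C = np.arange(18).reshape(6, 3)  # stand-in for a 6 x 3 correlation matrix
print(track_samples(C))          # reproduces the S_{4x3} layout shown above
```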

The analysis of \mathbf{C} or \mathbf{S}^T over specific time windows (w \leq T) is central to synthesizing explanations. The correlation values (positive or negative) generate "triangle shapes" within these matrices. These triangles represent the prediction trend given the time series input. AICHRONOLENS focuses on the transitions between these triangles:

  • Smooth transitions: Indicate that the model effectively captures changes in the data trend.

  • Non-smooth transitions: Often associated with model errors, indicating the model is failing to adapt to changes in the trend.

    The geometric interpretation of these correlation patterns allows AICHRONOLENS to identify different causes of errors, which are further explored in the experimental section.

5. Experimental Setup

5.1. Datasets

The study uses two distinct real-world mobile network datasets:

  • D_1 (Traffic Load):

    • Source: Measurements of traffic volumes collected from a live 4G network.
    • Scale & Characteristics: Covers a large metropolitan region in Europe over a period of 3 months. It provides fine-grained information on traffic volumes at each Base Station (BS) with a 3-minute granularity.
    • Domain: Mobile network traffic forecasting.
    • Data Sample: An example data sample would be a sequence of numerical values representing traffic load (e.g., in GB/min) for a specific BS over 3-minute intervals: [10.5, 11.2, 9.8, 10.1, ..., 7.5] GB/min.
    • Rationale: This dataset is suitable for analyzing the model's ability to forecast continuous traffic volumes and how AICHRONOLENS can capture variations due to hyperparameters in this common mobile network task.
  • D_2 (Number of Connected Users):

    • Source: Estimated number of active users connected to production BSs. Data was collected using an LTE passive monitoring tool that decodes unencrypted information exchanged between BSs and users.
    • Scale & Characteristics: Contains millisecond-level information about temporary user IDs (RNTI) and scheduling. This raw data is then processed using the methodology from [45] to estimate the number of active users every 6 minutes.
    • Domain: Mobile network user count forecasting.
    • Data Sample: A sample would be a sequence of integers representing the number of active users at a BS at 6-minute intervals: [150, 155, 148, 162, ..., 130] active users.
    • Rationale: This dataset provides a different type of mobile network metric (discrete user count vs. continuous traffic load) and allows AICHRONOLENS to be validated on a distinct forecasting problem, particularly for identifying errors specific to data characteristics.

5.2. Evaluation Metrics

The paper uses several evaluation metrics, primarily focusing on Mean Absolute Error (MAE) for model training and assessment, and Silhouette Score for clustering analysis.

  • Mean Absolute Error (MAE):

    • Conceptual Definition: MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. It is the average of the absolute differences between the predicted values and the actual values. MAE is robust to outliers compared to MSE because it does not square the errors.
    • Mathematical Formula: \mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|
    • Symbol Explanation:
      • N: The total number of observations or data points.
      • y_i: The actual (true) value for the i-th observation.
      • \hat{y}_i: The predicted value for the i-th observation.
      • |\cdot|: Absolute value.
      • \sum: Summation operator.
  • Mean Squared Error (MSE):

    • Conceptual Definition: MSE measures the average of the squares of the errors. It calculates the average squared difference between the predicted values and the actual values. MSE penalizes larger errors more heavily than MAE due to the squaring operation, making it sensitive to outliers.
    • Mathematical Formula: \mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
    • Symbol Explanation:
      • N: The total number of observations or data points.
      • y_i: The actual (true) value for the i-th observation.
      • \hat{y}_i: The predicted value for the i-th observation.
      • (\cdot)^2: Squaring operation.
      • \sum: Summation operator.
  • Silhouette Score:

    • Conceptual Definition: The silhouette score is used to evaluate the quality of clusters created by a clustering algorithm. It measures how similar an object is to its own cluster (cohesion) compared to other clusters (separation). A high silhouette score indicates that the object is well matched to its own cluster and poorly matched to neighboring clusters. Scores range from -1 (poor clustering) to +1 (dense, well-separated clustering), with 0 indicating overlapping clusters.
    • Mathematical Formula: For a single data point i: s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))} The overall silhouette score for a clustering is the average of s(i) over all data points.
    • Symbol Explanation:
      • s(i): The silhouette score for data point i.
      • a(i): The average distance from data point i to all other points in the same cluster. This measures cohesion.
      • b(i): The minimum average distance from data point i to all points in a different cluster (i.e., the nearest cluster that i is not a part of). This measures separation.
      • \max(\cdot, \cdot): The maximum function.
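For reference, all three metrics are available in standard scikit-learn utilities; a brief sketch with toy data (not tied to the paper's datasets) is shown below.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, silhouette_score

# Toy forecast vs. ground truth (e.g., traffic load in GB/min)
y_true = np.array([10.5, 11.2, 9.8, 10.1])
y_pred = np.array([10.0, 11.0, 10.5, 9.9])
print("MAE:", mean_absolute_error(y_true, y_pred))
print("MSE:", mean_squared_error(y_true, y_pred))

# Silhouette score needs feature vectors and cluster labels (e.g., clustered relevance vectors)
X = np.random.rand(20, 5)
labels = np.array([0] * 10 + [1] * 10)
print("Silhouette:", silhouette_score(X, labels))
```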

5.3. Baselines

The paper's primary focus is on AICHRONOLENS as a tool for interpreting and improving existing LSTM models, rather than proposing a new forecasting model to compete with other forecasting techniques. Therefore, its "baselines" are implicitly:

  • Legacy XAI Tools: AICHRONOLENS is compared against the limitations of LRP and SHAP directly, demonstrating how it provides deeper, less ambiguous insights than these tools alone. The clustering analysis in Section II-B (Figure 2) explicitly shows the inadequacy of LRP explanations without AICHRONOLENS's enhancement.

  • Variations of LSTM Models: The paper evaluates AICHRONOLENS on different LSTM models with varying hyperparameters (number of neurons, learning rate, dropout) to show its generality and ability to identify performance differences. For instance, the "baseline model A_A" is compared against an "optimized model" (derived through AICHRONOLENS diagnosis) for performance improvement.

    The overall LSTM model architecture used for both datasets is:

  • Layers: One unidirectional LSTM layer, followed by an output layer.

  • Output Layer: Single neuron with a linear activation function for one-step prediction.

  • Input Sequence Length: Predicts the next value based on a history of 20 past samples.

  • Optimizer: Adam optimizer.

  • Loss Function: Mean Absolute Error (MAE).

  • Data Split: Standard 80:20 train-test split.

    Specifically for D_1, six different LSTM models were trained with intentional variations in configuration. The following are the results from Table I of the original paper:

Model ID   Neurons   Learning Rate   MAE
A          200       0.0001          0.96
B          100       0.0001          0.99
C          50        0.0001          1.09
A_A        200       0.001           0.67
B_B        100       0.001           0.68
C_C        50        0.001           0.95

For D_2, a single optimized model was used:

  • LSTM Layer: 25 neurons with a tanh activation function.
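A plausible Keras realization of the architecture described in this subsection (one unidirectional LSTM layer, a single linear output neuron, Adam optimizer, MAE loss, 20-sample history, 80:20 split) is sketched below. The number of neurons and learning rates follow Table I; the dropout rate is an assumption, since the paper's exact value is not given here.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

HISTORY = 20  # past samples used to predict the next value

def build_model(neurons=200, learning_rate=1e-4, dropout=0.2):
    """One-step-ahead LSTM forecaster (sketch; the dropout rate is assumed)."""
    model = keras.Sequential([
        keras.Input(shape=(HISTORY, 1)),
        layers.LSTM(neurons),
        layers.Dropout(dropout),
        layers.Dense(1, activation="linear"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate), loss="mae")
    return model

def make_windows(series, history=HISTORY):
    """Slice a 1D series into (window, next value) training pairs."""
    X = np.stack([series[i:i + history] for i in range(len(series) - history)])
    y = series[history:]
    return X[..., None], y

# Standard 80:20 train-test split, as described above (toy series stands in for real traces)
series = np.random.rand(1000)
X, y = make_windows(series)
split = int(0.8 * len(X))
model = build_model()
model.fit(X[:split], y[:split], epochs=5, verbose=0)
print("Test MAE:", model.evaluate(X[split:], y[split:], verbose=0))
```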

6. Results & Analysis

6.1. Core Results Analysis

The extensive evaluations with real-world mobile traffic data (datasets D_1 and D_2) demonstrate AICHRONOLENS's ability to provide highly detailed explanations that surpass the capabilities of legacy XAI tools. The key findings reveal how AICHRONOLENS pinpoints model behaviors, diagnoses error causes, and guides model optimization.

6.1.1. Finding R_1: Pinpointing Temporal Characteristics

AICHRONOLENS effectively identifies specific temporal characteristics in the input sequence that stimulate the model, an insight largely missed by LRP and SHAP alone. While traditional XAI methods might assign similar relevance scores over time, the correlation vectors produced by AICHRONOLENS (linking XAI scores with GAF representation) clearly show temporal variations.

As qualitatively demonstrated in Figure 4, SHAP scores might consistently highlight recent samples as relevant. However, AICHRONOLENS's correlation vectors reveal deeper insights:

  • When samples entering or leaving the input sequence are not particularly relevant (e.g., intermediate values), the correlation between SHAP and GAF might be weak or almost non-existent (e.g., window 20).

  • When a significant temporal feature, such as a new local minimum, enters the input sequence, AICHRONOLENS captures a substantial modification in the correlation vector. This indicates an alignment between the XAI's perceived relevance and the actual temporal saliency highlighted by GAF.

    This ability to detect such changes is crucial because being "blind" to these temporal shifts can be detrimental to model performance, as the paper argues.

Fig. 4. A detailed look at AICHRONOLENS. Red squares and blue dots represent local maxima and local minima, respectively. The multi-panel figure shows the load time series for windows 19 to 22, the corresponding SHAP scores, the GASF matrices, and the correlation vectors, revealing the key temporal features of the series and their influence on the model's predictions.

6.1.2. Finding R_2: Spotting Categories of Errors

AICHRONOLENS quantifies and differentiates between two categories of errors: E_1 (poor model design) and E_2 (data inherently hard to predict).

Analysis of E_1 (Poor Model Design): AICHRONOLENS enables tracing the root cause of errors, revealing weaknesses in model design not evident from coarse metrics like MAE. In the correlation matrix \mathbf{C}, trend changes in the time series appear as triangles of negative correlation followed by triangles of positive correlation.

  • Sharp Triangles: Well-formed, sharply outlined triangles (e.g., Figure 5a, top) indicate that the model performs well and makes few mistakes in that part of the time series (Figure 5a, bottom).

  • Non-Sharp Triangles: Noisy, non-sharp triangles (e.g., Figure 5b, top) are associated with high errors, particularly during abrupt falls in traffic where the model struggles to predict the decrease accurately (Figure 5b, bottom). This behavior is consistently observed across decreasing slopes in the test set.


To quantitatively identify these triangles, a pattern recognition technique is introduced:

  1. Transition Detection: Identify transitions between triangles by computing the difference of the median correlation scores between consecutive correlation vectors G_t and G_{t+1}.

  2. Windowed Observation: Once a column interrupting a triangle is found, a window w (e.g., w = 6) centered on this column is used to observe the 3 preceding and 3 succeeding columns, forming a sub-matrix \mathbf{C}_{n \times w}.

  3. Binarization: Each element c_{i,j} in \mathbf{C}_{n \times w} is binarized into \overline{c}_{i,j}: \overline{c}_{i,j} = \begin{cases} -1 & \text{if } -0.9 \leq c_{i,j} \leq 0 \\ 1 & \text{if } 0 \leq c_{i,j} \leq 0.9 \end{cases}

    • Symbol Explanation:
      • \overline{c}_{i,j}: The binarized correlation value.
      • c_{i,j}: The original correlation value from matrix \mathbf{C}.
      • -1: Assigned if the correlation is negative (between -0.9 and 0).
      • 1: Assigned if the correlation is positive (between 0 and 0.9). This binarization helps in identifying strong positive or negative correlations within a specific range.
  4. Sharpness Score Calculation: For the resulting binarized matrix \overline{\mathbf{C}}_{n \times w}, the number of positive and negative values (h) on each secondary diagonal of length w is computed and stored in an array. A sharpness score \sigma is then computed: \sigma = 1 - \frac{\sum_{i=1}^{n-(w-1)} h_i}{|h_i| \cdot (w+1)}

    • Symbol Explanation:
      • \sigma: The sharpness score.
      • n: Length of the input history.
      • w: Window size for observation.
      • h_i: The number of positive or negative values on the i-th secondary diagonal.
      • |h_i|: Absolute value of h_i. The sharpness score is interpreted such that 0 < \sigma < 1 indicates non-sharpness (higher values mean less sharp triangles), and -1 < \sigma < 0 indicates sharpness (lower values mean sharper triangles).
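The following sketch is one possible reading of the binarization and sharpness computation; the paper's exact normalization is ambiguous as transcribed, so this should be treated as an approximation rather than the authors' implementation. Correlations in a window around a detected transition are binarized to ±1, signs are summed along each secondary diagonal, and the more the signs mix within a diagonal, the less sharp the triangle is deemed.

```python
import numpy as np

def binarize(C_win):
    """Map correlations in [-0.9, 0] to -1 and in (0, 0.9] to +1 (values outside are clipped)."""
    return np.where(np.clip(C_win, -0.9, 0.9) <= 0, -1, 1)

def sharpness_score(C_win, w):
    """Approximate sharpness score (one interpretation of the formula in the text).

    For each secondary diagonal of length w, sum the binarized signs (h_i); a sharp triangle
    has uniform diagonals, so |h_i| = w. Sign mixing lowers |h_i| and pushes the score
    towards 1 (non-sharp); uniform diagonals push it towards 0 (sharp).
    """
    B = binarize(C_win)
    n = B.shape[0]
    h = [sum(B[i + t, w - 1 - t] for t in range(w)) for i in range(n - (w - 1))]
    return 1.0 - np.mean(np.abs(h)) / w

C_win = np.random.uniform(-0.9, 0.9, size=(20, 6))  # stand-in for a window of the correlation matrix
print(sharpness_score(C_win, w=6))
```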

The analysis reveals a direct relationship: as the sharpness score \sigma increases (indicating more non-sharpness), the error also increases (Figure 6a). Examination of absolute errors across the test set (Figure 6b) shows that the highest errors (5-8 GB/min) occur during moderate to low loads associated with abrupt falls. Figure 6c further illustrates that the model significantly underestimates the ground truth during severe load decreases. A critical insight comes from Figure 6d, which shows a lack of training samples in the training set for precisely these traffic volumes experiencing abrupt falls. This indicates that the model's poor generalization during these events is due to insufficient training data for such specific trends.

Fig. 6. Root cause analysis of model errors. The four panels relate the error to the sharpness score, to the load, and to the difference in load variation, and show the distribution of traffic values in the training set.

Model Re-design and Performance Improvement: AICHRONOLENS thus points to data augmentation as a solution. By augmenting the training dataset with more samples featuring abrupt load decreases, and introducing a sigmoid activation function before the output layer, a new optimized model is trained (starting from model A_A settings). This optimized model significantly outperforms the baseline (model A_A), especially in these challenging scenarios.

Fig. 7. Error of the baseline and optimized models after AICHRONOLENS diagnosis. The figure compares the error distributions of the two models (x-axis: error in GB/min, y-axis: frequency of occurrence); the optimized model's errors are more concentrated, indicating better performance.

Figure 7 shows that the optimized model not only reduces errors of high magnitude (tails of the error distribution) but also centers the error distribution around zero and reduces the frequency of small-magnitude errors. For windows around abrupt load decreases, the MAE of model A_A was 0.921, while the optimized model achieved an MAE of 0.619, representing a 32% improvement. Over the entire test set, the optimized model achieved an MAE of 0.69, a 2% improvement over A_A (0.67). This highlights AICHRONOLENS's power in diagnosing specific weaknesses and guiding targeted improvements.

Analysis of E_2 (Data-Specific Errors): Even after addressing model design issues (E_1), AICHRONOLENS can identify errors stemming from the inherent characteristics of the data itself. For dataset D_2, AICHRONOLENS reveals sequences of high-magnitude errors that change sign (e.g., positive then negative). This behavior is characterized by correlation triangles that are interrupted by a full column of weak correlation.

Fig. 8. Analysis of consecutive high-magnitude errors that change sign. The three-panel figure shows the number of active users over a range of timesteps, a heatmap of the correlation matrix over the history window, and the prediction error; a red box highlights the interval of anomalous errors where the model's behavior fluctuates.

To quantify this, the Euclidean distance d(G_t, G_{t+1}) between subsequent correlation vectors is computed and normalized to [0, 1]. When d > 0.6, a change in error sign is observed in 65% of cases, with a corresponding MAE of 0.46. This is significantly higher than the overall MAE of 0.13 for D_2, indicating that AICHRONOLENS effectively pinpoints regions where the data itself is challenging for the model to predict consistently.
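The distance test described above can be sketched in a few lines; the normalization to [0, 1] is assumed here to be taken over the whole test set, which the paper does not spell out.

```python
import numpy as np

def transition_flags(C, threshold=0.6):
    """Flag timesteps where consecutive correlation vectors differ strongly.

    C: correlation matrix of shape (n, T); columns are the per-timestep correlation vectors.
    """
    d = np.linalg.norm(np.diff(C, axis=1), axis=0)   # Euclidean distance between G_t and G_{t+1}
    d = (d - d.min()) / (d.max() - d.min() + 1e-12)  # normalize to [0, 1] (assumed over the test set)
    return d > threshold                             # candidate regions of sign-changing errors
```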

6.1.3. Finding R_3: Impact of Learning Rates on Correlations

AICHRONOLENS provides insights into how different hyperparameter settings, specifically learning rates, influence the correlation patterns. Figure 9 qualitatively illustrates this:

  • Models with lower learning rates (e.g., 0.0001, models A, B, C) tend to exhibit strong positive or negative correlations, with values approaching 1 or -1. This suggests that the model's learned relationships are more pronounced and stable.

  • Models with higher learning rates (e.g., 0.001, models A_A, B_B, C_C) show correlation scores clustered around zero, indicating weaker or negligible correlations. This implies that while higher learning rates might enable faster adaptation to new conditions, they can lead to less stable or less discernible linear relationships between XAI scores and GAF representations.

Fig. 9. Correlation vectors for models with different learning rates: top 0.0001 and bottom 0.001. The two rows of heatmaps show the correlation as a function of timestep and history position for models A, B, and C and their counterparts trained with the higher learning rate, illustrating how the correlations evolve over time.

This finding suggests that AICHRONOLENS can offer precise insights into the heterogeneous accuracy and learning behavior of models trained with different learning rates. Interestingly, the study also notes that for this specific dataset, the number of neurons (depth of LSTM architecture) played a marginal role in these observed correlation patterns.

6.2. Data Presentation (Tables)

The following are the results from Table I of the original paper:

Model ID   Neurons   Learning Rate   MAE
A          200       0.0001          0.96
B          100       0.0001          0.99
C          50        0.0001          1.09
A_A        200       0.001           0.67
B_B        100       0.001           0.68
C_C        50        0.001           0.95

This table summarizes the configurations and performance (MAE) of the six LSTM models trained for dataset D1D_1. It shows that models with a higher learning rate (0.001) generally achieve lower MAE values, indicating better overall performance, which aligns with the observation that higher learning rates result in a model that adapts more rapidly.

6.3. Ablation Studies / Parameter Analysis

While the paper doesn't present traditional ablation studies (e.g., removing GAF or Pearson correlation), it effectively performs a parameter analysis by training multiple LSTM models with varying hyperparameters:

  • Number of Neurons: Models A, B, C (and A_A, B_B, C_C) explore 200, 100, and 50 neurons. The analysis in Finding R_3 suggests that, for this specific dataset, the number of neurons played a marginal role compared to the learning rate in affecting correlation patterns.

  • Learning Rate: Two distinct learning rates (0.0001 and 0.001) are tested. Finding R_3 explicitly analyzes the impact of the learning rate on the correlation vectors, demonstrating that higher learning rates lead to weaker correlations clustered around zero, while lower learning rates result in stronger, more distinct positive/negative correlations. This indicates that AICHRONOLENS can differentiate how LSTMs with different learning strategies form internal representations and explanations.

  • Dropout: The models for D_1 explicitly include a dropout layer for regularization, showing the practical setup.

    The "root cause analysis of model errors" (Finding R2R_2) also acts as a form of post-hoc parameter analysis by showing how AICHRONOLENS led to an informed model re-design. This involved:

  • Data Augmentation: Augmenting the training set to include more abrupt load decrease scenarios, addressing a data scarcity issue identified by AICHRONOLENS.

  • Activation Function Change: Introducing a sigmoid activation function before the output layer for the optimized model. This targeted intervention, guided by AICHRONOLENS's diagnostics, significantly improved performance (32% reduction in MAE for critical scenarios), demonstrating its utility in model development and hyperparameter tuning beyond simple trial-and-error.

7. Conclusion & Reflections

7.1. Conclusion Summary

This paper effectively addresses the critical challenge of making AI models, particularly LSTMs, interpretable for time series forecasting in mobile networks. It introduces AICHRONOLENS, a novel tool that significantly advances XAI for this domain. By uniquely linking legacy XAI relevance scores (from tools like LRP and SHAP) with the temporal characteristics of input sequences, specifically through the use of Gramian Angular Fields (GAF) and Pearson correlation, AICHRONOLENS provides a much deeper understanding of model behavior than previously possible. The extensive evaluations using real-world mobile traffic traces (D1 and D2) demonstrate its ability to pinpoint diverse categories of model errors, accurately trace their root causes (whether due to poor model design or inherently complex data), and subsequently guide model refinement. This diagnostic capability led to a notable performance improvement of up to 32% in specific challenging scenarios through targeted data augmentation and minor hyperparameter adjustments.

7.2. Limitations & Future Work

The authors explicitly state some limitations and suggest future work:

  • LSTM Specificity: Currently, AICHRONOLENS's scope is restrained to LSTM models and univariate time series.
  • Future Work - Spatio-temporal Inputs: The adaptation of AICHRONOLENS for models dealing with spatio-temporal inputs (e.g., forecasting traffic across multiple interconnected BSs over time) is left for future research. This would be a significant extension, as such data introduces additional complexity in representing and correlating features.

7.3. Personal Insights & Critique

The AICHRONOLENS framework presents a highly innovative and practical approach to XAI for time series, a domain often underserved by existing interpretability methods. The core insight—that raw XAI scores are insufficient without temporal context—is well-argued and empirically demonstrated.

Inspirations and Applications:

  • Practical Debugging: This tool is immensely valuable for ML engineers and network operators. Instead of blindly trusting LSTM predictions or struggling with vague XAI outputs, AICHRONOLENS provides actionable insights for debugging model failures. Identifying specific data patterns (e.g., abrupt falls) that cause errors and linking them to a lack of training data is a direct pathway to model improvement (e.g., through targeted data augmentation).
  • Model Robustness and Trust: By offering a deeper understanding of why a model predicts incorrectly, AICHRONOLENS can significantly contribute to building trust in AI systems deployed in critical infrastructure like mobile networks. This increased transparency is crucial for regulatory compliance and operational acceptance.
  • Beyond Mobile Networks: The methodology of linking XAI outputs with GAF (or other time series imaging techniques) and correlation analysis could be widely transferable to other time series forecasting domains, such as financial market prediction, energy consumption forecasting, or industrial anomaly detection. Any domain where LSTMs are used for time series and interpretability is a concern could potentially benefit.
  • Hyperparameter Tuning Insight: The finding that learning rates affect correlation patterns provides a novel perspective on hyperparameter tuning. It suggests that XAI metrics from AICHRONOLENS could be used not just for debugging, but also as a guide during model development to understand how different hyperparameter choices influence the model's internal reasoning and sensitivity to temporal features.

Potential Issues and Areas for Improvement:

  • Computational Cost: The process involves several steps: XAI calculation, GAF transformation, and Pearson correlation for each timestep. The paper mentions a 16-hour execution time for clustering (which AICHRONOLENS simplifies but doesn't eliminate the underlying computations), suggesting that for very long time series or real-time online diagnostics, the computational overhead might be considerable. While designed for offline inspection, optimizing this for faster feedback loops could be beneficial.

  • Generalizability of GAF: While GAF is a powerful tool, the effectiveness of time series to image transformations can sometimes depend on the specific characteristics of the time series. Further studies could explore if other imaging techniques (e.g., MTF in combination with GAF, or different GAF variants) or even direct statistical feature extraction methods could complement or replace GAF for certain data types.

  • Interpretability of GAF itself: While GAF provides an "enriched expressiveness," a novice might still find interpreting a GAF matrix challenging. The paper does a good job explaining its properties, but connecting the visual patterns in GAF directly to human-understandable temporal events still requires some expertise. The correlation vectors simplify this, but the underlying interpretation of the GAF itself is a prerequisite.

  • Complexity of Correlation Patterns: The "triangle shapes" and their transitions, while insightful, still require a human expert to analyze and interpret. While the paper provides a sharpness score, further automation in identifying and categorizing these patterns (e.g., using unsupervised learning on the \mathbf{C} or \mathbf{S} matrices) could make AICHRONOLENS even more user-friendly for non-experts.

  • Quantifying the "32% improvement": While a 32% improvement in MAE for specific error scenarios is significant, it's important to note that the overall MAE improved by a more modest 2%. This highlights that AICHRONOLENS excels at tackling specific types of errors, which is extremely valuable, but a nuanced understanding of its impact on overall model performance is necessary. This isn't a criticism but a clarification for interpreting the results.

    Overall, AICHRONOLENS is a commendable step forward in making LSTMs for time series forecasting more transparent and controllable, particularly within the demanding context of mobile network operations. Its focus on linking explainability with temporal context is a crucial innovation.
