
GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling

Published: 05/26/2025

Keywords: Temporal Graph Modeling; Safety in LLM Multi-Agent Collaboration; Anomaly Detection in Multi-Agent Systems; Unsupervised Encoder-Decoder Architecture; Information Bottleneck-based Graph Abstraction




This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

GUARDIAN safeguards LLM multi-agent collaborations by modeling interactions as temporal graphs. It uses an unsupervised encoder-decoder with incremental training and Information Bottleneck abstraction to precisely detect and mitigate hallucination amplification and error propagation.

Abstract

The emergence of large language models (LLMs) enables the development of intelligent agents capable of engaging in complex and multi-turn dialogues. However, multi-agent collaboration faces critical safety challenges, such as hallucination amplification and error injection and propagation. This paper presents GUARDIAN, a unified method for detecting and mitigating multiple safety concerns in GUARDing Intelligent Agent collaboratioNs. By modeling the multi-agent collaboration process as a discrete-time temporal attributed graph, GUARDIAN explicitly captures the propagation dynamics of hallucinations and errors. The unsupervised encoder-decoder architecture, incorporating an incremental training paradigm, learns to reconstruct node attributes and graph structures from latent embeddings, enabling the identification of anomalous nodes and edges with unparalleled precision. Moreover, we introduce a graph abstraction mechanism based on the Information Bottleneck Theory, which compresses temporal interaction graphs while preserving essential patterns. Extensive experiments demonstrate GUARDIAN's effectiveness in safeguarding LLM multi-agent collaborations against diverse safety vulnerabilities, achieving state-of-the-art accuracy with efficient resource utilization.


In-depth Reading


1. Bibliographic Information

Title: GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling
Authors: Jialong Zhou (King's College London), Lichao Wang (Beijing Institute of Technology), Xiao Yang (Tsinghua University).
Journal/Conference: The paper is available on arXiv, which is a preprint server. This means it has not yet undergone formal peer review for a conference or journal publication at the time of this analysis.
Publication Year: 2025. The analyzed version (arXiv:2505.19234v1) was posted to arXiv in May 2025, consistent with the listed publication date of 05/26/2025.
Abstract: The paper introduces GUARDIAN, a unified framework to detect and mitigate safety risks like hallucination amplification and error propagation in multi-agent collaborations powered by Large Language Models (LLMs). The core idea is to model the collaboration process as a temporal attributed graph, capturing how information (and misinformation) spreads over time. GUARDIAN uses an unsupervised encoder-decoder architecture with an incremental training approach to identify anomalous agents (nodes) and communications (edges) by learning to reconstruct normal interaction patterns. A key innovation is a graph abstraction mechanism based on Information Bottleneck Theory, which compresses the interaction graphs to retain only essential information, improving efficiency and robustness. Experiments show that GUARDIAN achieves state-of-the-art accuracy in safeguarding these systems while using resources efficiently.
Original Source Link: https://arxiv.org/pdf/2505.19234v1 (Preprint)

2. Executive Summary

Background & Motivation (Why):
- Core Problem: Multi-agent systems, where multiple LLM-powered agents collaborate to solve complex problems, are becoming increasingly popular. However, this collaboration introduces significant safety challenges. A single agent producing a factual error (a "hallucination") or being maliciously manipulated can spread misinformation throughout the entire system. This misinformation can be amplified as other agents build upon the faulty data, leading to a cascade of errors.
- Importance & Gaps: The problem is critical because the reliability of these systems depends on the integrity of their collaborative process. Prior solutions either focus on detecting errors in a single agent's output (ignoring the propagation dynamics) or use simplistic methods like majority voting that don't capture the complex dependencies between agents. Many existing methods also require modifying the underlying LLMs, which is not feasible for closed-source models like GPT-4.
- Innovation: GUARDIAN introduces a novel, model-agnostic approach. By representing the entire multi-turn collaboration as a dynamic graph that evolves over time, it can explicitly model and track the propagation of information. This allows it to detect anomalies not just in an agent's response, but also in the communication patterns themselves, providing a more holistic safety net.
Main Contributions / Findings (What):
- Temporal Graph Modeling: Proposes representing LLM multi-agent collaboration as a discrete-time temporal attributed graph, which effectively captures the dynamics of information flow and error propagation.
- Unsupervised Anomaly Detection: Develops a unified, unsupervised encoder-decoder framework to detect multiple safety issues (hallucination, agent-targeted attacks, communication-targeted attacks) without needing labeled data or modifications to the LLMs.
- Information Bottleneck Abstraction: Introduces a graph abstraction mechanism based on Information Bottleneck Theory to compress the interaction graph, filtering out noise and redundancy while preserving critical patterns for anomaly detection. This is supported by theoretical bounds on information flow.
- State-of-the-Art Performance: Experimental results demonstrate that GUARDIAN significantly outperforms existing methods in accuracy across various datasets and attack scenarios, while also being more resource-efficient (requiring fewer API calls).

Foundational Concepts:
- Large Language Models (LLMs): These are massive neural networks trained on vast amounts of text data (e.g., GPT-4, Llama 3). They can understand and generate human-like text for tasks like question answering, summarization, and conversation.
- LLM Agents: An LLM agent is an autonomous system that uses an LLM as its core reasoning engine to perceive its environment, make decisions, and take actions to achieve goals.
- Multi-agent Collaboration: A system where multiple LLM agents communicate and work together to solve a problem that might be too complex for a single agent. This often involves multi-turn dialogues or debates.
- Hallucination: A phenomenon where an LLM generates text that is factually incorrect, nonsensical, or not grounded in its input data, yet presents it confidently. Hallucination amplification occurs when one agent's hallucination is accepted and built upon by other agents, spreading the error.
- Error Injection and Propagation: This refers to the deliberate introduction of false information into the system. The paper categorizes this into agent-targeted attacks (corrupting an agent's internal state, e.g., via malicious prompts) and communication-targeted attacks (intercepting and altering messages between agents).
- Temporal Attributed Graph: A graph that evolves over time. In this context, nodes represent agents at specific timesteps, edges represent communication between them, and "attributes" are features associated with the nodes (e.g., text embeddings of the agents' responses).
- Graph Convolutional Network (GCN): A type of neural network designed to work directly with graph data. It learns node representations by aggregating information from a node's neighbors, effectively capturing both structural and feature information.
- Encoder-Decoder Architecture: A common neural network design, especially for reconstruction tasks. The encoder compresses the input data into a compact, low-dimensional representation (latent embedding). The decoder then tries to reconstruct the original input from this compressed representation. In anomaly detection, inputs that are poorly reconstructed are flagged as anomalous.
- Information Bottleneck (IB) Theory: A principle from information theory that aims to find the best trade-off between compressing data and preserving information relevant to a specific task. The goal is to "squeeze out" irrelevant details while keeping the essential ones.
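The Information Bottleneck trade-off can be made concrete with a small Python example. It evaluates the classic IB Lagrangian $I(X;Z) - \beta I(Z;Y)$ for fully known discrete distributions. This is only an illustration of the principle and not how GUARDIAN estimates these terms on learned graph representations. The function names and the toy distribution are assumptions chosen for illustration.

```python
import numpy as np

def mutual_information(joint):
    """I(A;B) in nats from a joint probability table joint[a, b]."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0                                    # treat 0 * log 0 as 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])))

def ib_objective(p_xy, p_z_given_x, beta):
    """Classic IB Lagrangian I(X;Z) - beta * I(Z;Y), to be minimized.

    p_xy[x, y]: joint distribution of input X and target Y.
    p_z_given_x[x, z]: stochastic encoder; Z sees Y only through X.
    """
    p_x = p_xy.sum(axis=1)
    p_xz = p_x[:, None] * p_z_given_x                 # joint p(x, z)
    p_zy = p_z_given_x.T @ p_xy                       # p(z, y) = sum_x p(z|x) p(x, y)
    return mutual_information(p_xz) - beta * mutual_information(p_zy)

# Four equally likely inputs; the target depends only on whether x >= 2.
p_xy = np.array([[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]])
keep_all = np.eye(4)                                  # Z = X: no compression
keep_label = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)  # Z = [x >= 2]
print(ib_objective(p_xy, keep_all, beta=1.0))         # ≈ 0.693 (log 2)
print(ib_objective(p_xy, keep_label, beta=1.0))       # ≈ 0
```

The encoder that keeps only the label-relevant bit scores lower, which is better. It discards $\log 2$ nats of irrelevant input detail while preserving all of $I(Z;Y)$.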
Previous Works & Differentiation:
- Collaborative Error Detection: Methods like cross-examination [23] and external supportive feedback [13] focus on verifying the output of individual agents but fail to model how errors spread in a multi-agent network.
- Multi-agent Collaboration Defenses: Approaches like majority voting [24] or uncertainty estimation [12] are used to reach a consensus. However, they often oversimplify agent dependencies and assume agents are independent, which is not true in a collaborative setting. They can also require modifying the base LLMs.
- Graph-based Anomaly Detection: GCNs have been used for anomaly detection in other domains like finance and social networks. GUARDIAN is novel in applying this to the specific safety challenges of LLM multi-agent systems, particularly by using a temporal graph model.
- GUARDIAN's Differentiation: Unlike previous methods, GUARDIAN provides a unified framework that handles both accidental hallucinations and malicious attacks. It is model-agnostic, working with any LLM without modification. Its use of temporal graphs explicitly models propagation dynamics, and the Information Bottleneck mechanism makes the approach efficient and robust to noise.

4. Methodology (Core Technology & Implementation)

GUARDIAN's methodology transforms the abstract concept of agent collaboration into a concrete mathematical structure that can be analyzed for anomalies.

3.1 & 3.2 Temporal Attributed Graph Framework

The first step is to model the collaboration process.

Nodes ($v_{t,i}$): Each node represents the $i$-th agent ($m_i$) at a specific discrete timestep $t$.
Node Attributes ($\mathbf{x}_{t,i}$): The text response $r_{t,i}$ of an agent is converted into a numerical vector (embedding) using a pre-trained language model such as BERT. This vector serves as the feature, or attribute, of the corresponding node.
Edges ($(v_{t-1,i}, v_{t,j})$): A directed edge is drawn from an agent at timestep $t-1$ to an agent at timestep $t$ if the latter uses the former's output as input or context for its own response. This explicitly maps the flow of information.
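The construction above can be sketched in a few lines of Python. This is a minimal illustration built on three assumptions:

- responses are grouped by round;
- a hypothetical `reads(t, j)` callback encodes which previous-round outputs agent $j$ used;
- a hypothetical `toy_embed` function stands in for a BERT encoder.

```python
import numpy as np

def build_temporal_graph(responses, embed, reads):
    """Build a discrete-time temporal attributed graph.

    responses[t][i]: text response r_{t,i} of agent m_i at round t.
    embed(text): 1-D feature vector (stand-in for a BERT encoder).
    reads(t, j): indices i of round t-1 agents whose output agent j used at round t.
    Returns nodes [(t, i)], feature matrix X (one row per node), and directed
    edges as (source_row, target_row) pairs pointing from round t-1 to round t.
    """
    nodes, feats, index = [], [], {}
    for t, round_responses in enumerate(responses):
        for i, text in enumerate(round_responses):
            index[(t, i)] = len(nodes)
            nodes.append((t, i))
            feats.append(np.asarray(embed(text), dtype=float))
    edges = []
    for t in range(1, len(responses)):
        for j in range(len(responses[t])):
            for i in reads(t, j):
                edges.append((index[(t - 1, i)], index[(t, j)]))
    return nodes, np.stack(feats), edges

def toy_embed(text):
    # Hypothetical stand-in for BERT: simple count features.
    return [len(text), text.count("2015"), text.count("2016")]

responses = [
    ["He graduated in 2016.", "He graduated in 2016.", "He graduated in 2015."],
    ["2016, I agree.", "Maybe 2015?", "It was 2015."],
]
# Fully connected debate: every agent reads every agent's previous answer.
nodes, X, edges = build_temporal_graph(responses, toy_embed, lambda t, j: range(3))
print(len(nodes), X.shape, len(edges))  # 6 (6, 3) 9
```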

As shown in Figures 1 and 2, this representation makes the propagation of errors visually and structurally traceable.
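One way to see why the graph makes propagation traceable: any node reachable from a corrupted node along directed edges may have consumed the error. The sketch below is illustrative only, because GUARDIAN detects anomalies by reconstruction rather than by this traversal. It assumes a hypothetical topology in which each agent reads its own and its left neighbor's previous answer.

```python
from collections import deque

def downstream(edges, source):
    """Return every node reachable from `source` along directed edges (BFS)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, queue = set(), deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

# Nodes are (t, i); agent j at round t reads agents j-1 (if any) and j at round t-1.
edges = [((t - 1, i), (t, j))
         for t in range(1, 3) for j in range(3) for i in (j - 1, j) if i >= 0]
print(sorted(downstream(edges, (0, 0))))
# [(1, 0), (1, 1), (2, 0), (2, 1), (2, 2)]
```

In this topology, an error injected by agent 0 in round 0 can reach every agent by round 2. An error from agent 2 only reaches agent 2's own later nodes.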

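Figure 1: Three key safety issues in LLM multi-agent collaboration. Left (1): hallucination amplification, in which hallucinated information about a "computer science" major spreads among all agents. Middle (2): agent-targeted error propagation, in which a malicious agent injects false information (e.g., changing 2016 to 2015) that keeps influencing subsequent rounds. Right (3): communication-targeted error injection and propagation, in which a malicious agent intercepts and tampers with messages in transit between agents, disrupting the collaboration.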

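Figure 2: Safety issues and error propagation in LLM multi-agent collaboration. (a) An example, under the A2A protocol, of how agent-targeted and communication-targeted attacks affect multi-round collaboration: early attacks influence the responses of later agents. (b) A visualization of agent- and communication-targeted error injection and propagation, using 3D surface plots of the degree of anomaly. This supports the effectiveness of temporal attributed graphs in capturing error dynamics.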

4.1 Problem Formulation

Given a sequence of these graphs over time, $\{\mathcal{G}_t\}_{t=1}^T$, where each graph $\mathcal{G}_t = (\mathcal{V}_t, \mathcal{E}_t, \mathbf{X}_t)$ contains the nodes, edges, and node attributes at timestep $t$, the goal is to find a function $f$ that assigns an anomaly score $s_v$ to each node $v$. Nodes with scores above a threshold $\tau$ are considered anomalous ($\mathcal{V}_t^*$) and are removed from the graph for subsequent rounds. The core challenge is to design a loss function $\mathcal{L}(f)$ that guides this detection process without labeled data.
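The thresholding step of this formulation can be sketched as follows, assuming the scores $s_v$ have already been computed. Dropping every edge that touches a flagged node is an assumption about what "removed from the graph" means in practice. The agent names and score values are hypothetical.

```python
def prune_anomalous(scores, edges, tau):
    """Flag nodes with s_v > tau and drop them, plus their edges, from the graph.

    scores: dict mapping node -> anomaly score s_v.
    edges: list of (u, v) pairs.
    Returns (flagged set V_t^*, surviving nodes, surviving edges).
    """
    flagged = {v for v, s in scores.items() if s > tau}
    kept_nodes = [v for v in scores if v not in flagged]
    kept_edges = [(u, v) for u, v in edges if u not in flagged and v not in flagged]
    return flagged, kept_nodes, kept_edges

scores = {"m1": 0.91, "m2": 0.12, "m3": 0.35, "m4": 0.08}
edges = [("m1", "m2"), ("m2", "m3"), ("m3", "m4"), ("m4", "m1")]
flagged, kept_nodes, kept_edges = prune_anomalous(scores, edges, tau=0.5)
print(flagged, kept_nodes, kept_edges)
# {'m1'} ['m2', 'm3', 'm4'] [('m2', 'm3'), ('m3', 'm4')]
```

Only the surviving agents and edges would take part in the next round of collaboration.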

4.2 Encoder-Decoder Architecture

GUARDIAN employs an unsupervised encoder-decoder architecture specifically designed for temporal graphs. The key idea is to train the model to accurately reconstruct normal agent interactions. When an anomalous interaction occurs, the model struggles to reconstruct it, producing a high reconstruction error that serves as the anomaly score. The architecture, shown in Figure 3, has four main components.
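A minimal NumPy sketch of reconstruction-based scoring follows. It rests on several assumptions that this summary does not fix:

- each per-timestep adjacency matrix is symmetric;
- the same $N$ agents appear at every timestep;
- a single-head attention over time stands in for a full Transformer encoder;
- the attribute decoder is linear;
- the edge decoder is an inner product passed through a sigmoid;
- each node's score is $\alpha$ times its attribute error plus $(1-\alpha)$ times its structure error.

The weights here are random; in practice they would be trained on normal interactions so that anomalous nodes stand out. All function names are hypothetical, and the GIB term is omitted.

```python
import numpy as np

def normalize_adj(A):
    """GCN normalization D~^{-1/2} (A + I) D~^{-1/2} for a symmetric adjacency A."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def gcn_encode(X, A, W1, W2):
    """Two-layer GCN with ReLU after each layer, as in the propagation rule."""
    A_hat = normalize_adj(A)
    H1 = np.maximum(A_hat @ X @ W1, 0.0)
    return np.maximum(A_hat @ H1 @ W2, 0.0)

def temporal_attention(Z_seq, Wq, Wk, Wv):
    """Single-head attention over time; queries come from the last timestep.

    Z_seq: (T, N, h) per-timestep node embeddings. Returns (N, h).
    """
    Q, K, V = Z_seq @ Wq, Z_seq @ Wk, Z_seq @ Wv
    logits = np.einsum("nd,tnd->nt", Q[-1], K) / np.sqrt(K.shape[-1])
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # softmax over the T timesteps
    return np.einsum("nt,tnd->nd", weights, V)

def anomaly_scores(X_T, A_T, Z_T, W_dec, alpha):
    """Per-node score = alpha * attribute error + (1 - alpha) * structure error."""
    X_hat = Z_T @ W_dec                               # linear attribute decoder
    att_err = ((X_T - X_hat) ** 2).sum(axis=1)        # squared error per node
    P = 1.0 / (1.0 + np.exp(-(Z_T @ Z_T.T)))          # inner-product edge probabilities
    P = np.clip(P, 1e-7, 1.0 - 1e-7)
    bce = -(A_T * np.log(P) + (1.0 - A_T) * np.log(1.0 - P))
    str_err = bce.mean(axis=1)                        # mean BCE over each node's row
    return alpha * att_err + (1.0 - alpha) * str_err

# Toy run with random (untrained) weights: 3 rounds, 4 agents, 8-dim attributes.
rng = np.random.default_rng(0)
T, N, d, h = 3, 4, 8, 6
X_seq = rng.normal(size=(T, N, d))
A = np.ones((N, N)) - np.eye(N)                       # fully connected debate
W1 = rng.normal(scale=0.3, size=(d, h))
W2 = rng.normal(scale=0.3, size=(h, h))
Z_seq = np.stack([gcn_encode(X_seq[t], A, W1, W2) for t in range(T)])
Wq, Wk, Wv = (rng.normal(scale=0.3, size=(h, h)) for _ in range(3))
Z_T = temporal_attention(Z_seq, Wq, Wk, Wv)
scores = anomaly_scores(X_seq[-1], A, Z_T, rng.normal(scale=0.3, size=(h, d)), alpha=0.4)
print(scores.shape)  # (4,)
```

Separating the attribute error from the structure error keeps content anomalies distinct from anomalous communication patterns before $\alpha$ blends them.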

Figure 3: Framework overview of GUARDIAN, showing a case study at timestep $t_2$. It comprises five main steps:

1. Graph preprocessing of the collaboration information from $t_0$ to $t_2$.
2. The attributed graph encoder.
3. The time information encoder.
4. The structure and attribute reconstruction decoders.
5. Identification of anomalous nodes from the reconstruction results.

Anomalous nodes (e.g., $m_1$) are identified and removed from the collaboration network to enhance safety.

1. Attributed Graph Encoder
- Purpose: Learn a compressed representation (embedding) $\mathbf{Z}_t$ for each node in the graph at timestep $t$, capturing both its own features ($\mathbf{X}_t$) and its local neighborhood structure ($\mathcal{E}_t$).
- Method: A two-layer Graph Convolutional Network (GCN) updates each node's representation by aggregating information from its neighbors. The propagation rule for layer $l$ is:

$$\mathbf{H}^{(l+1)} = \mathrm{ReLU}\left(\tilde{\mathbf{D}}^{-\frac{1}{2}}\tilde{\mathbf{A}}\tilde{\mathbf{D}}^{-\frac{1}{2}}\mathbf{H}^{(l)}\mathbf{W}^{(l)}\right)$$

where $\mathbf{H}^{(0)} = \mathbf{X}_t$, $\tilde{\mathbf{A}}$ is the adjacency matrix with self-loops, $\tilde{\mathbf{D}}$ is the degree matrix of $\tilde{\mathbf{A}}$, and $\mathbf{W}^{(l)}$ is a trainable weight matrix. The output of the second layer, $\mathbf{Z}_t = \mathbf{H}^{(2)}$, encodes both the structural and the attribute information of the graph.

2. Time Information Encoder
- Purpose: Capture the temporal dependencies in the collaboration. An agent's behavior at the current timestep $T$ depends on the entire history of interactions.
- Method: A Transformer encoder takes the sequence of graph embeddings $\{\mathbf{Z}_1, \mathbf{Z}_2, \dots, \mathbf{Z}_T\}$ from timesteps $1$ through $T$ as input. Its self-attention mechanism weighs the importance of historical information and aggregates it into a final, time-aware representation $\mathbf{Z}_T$ for the current timestep. Self-attention is computed as:

$$\mathrm{Attn}(\mathbf{Q}, \mathbf{K}, \mathbf{V}) = \mathrm{softmax}\left(\frac{\mathbf{Q}\mathbf{K}^\top}{\sqrt{d_k}}\right)\mathbf{V}$$

where $\mathbf{Q}$, $\mathbf{K}$, and $\mathbf{V}$ are the query, key, and value matrices derived from the input sequence of embeddings, and $d_k$ is the key dimension.

3. Attribute Reconstruction Decoder
- Purpose: Reconstruct the original node attributes (agent response embeddings) $\mathbf{X}_T$ from the final latent representation $\mathbf{Z}_T$.
- Method: A dedicated decoder (e.g., a multi-layer perceptron) maps $\mathbf{Z}_T$ back to the reconstructed attributes $\hat{\mathbf{X}}_T$. The attribute reconstruction loss is the mean squared error:

$$\mathcal{L}_{\mathrm{att}} = \frac{1}{|\mathcal{V}_T|}\sum_{i=1}^{|\mathcal{V}_T|}\left\lVert \mathbf{x}_{T,i} - \hat{\mathbf{x}}_{T,i}\right\rVert_2^2$$

A high attribute reconstruction error $\lVert \mathbf{x}_{T,i} - \hat{\mathbf{x}}_{T,i}\rVert_2^2$ for a specific node $i$ indicates that its response content is anomalous.

4. Structure Reconstruction Decoder
- Purpose: Reconstruct the graph structure (the edges $\mathcal{E}_T$, represented as an adjacency matrix) from the final latent representation $\mathbf{Z}_T$. This helps detect anomalous communication patterns.
- Method: The decoder predicts the probability $p_{ij}$ that an edge exists between nodes $i$ and $j$ from their embeddings in $\mathbf{Z}_T$. The structure reconstruction loss is the binary cross-entropy:

$$\mathcal{L}_{\mathrm{str}} = -\frac{1}{|\mathcal{V}_T|^2}\sum_{i,j}\left[e_{ij}\log p_{ij} + (1 - e_{ij})\log(1 - p_{ij})\right]$$

where $e_{ij}$ is 1 if an edge exists and 0 otherwise. A high loss indicates abnormal communication patterns.

4.3 Graph Abstraction by Information Bottleneck

To handle the complexity and redundancy of dense multi-agent interactions, GUARDIAN uses the Graph Information Bottleneck (GIB) principle.
- Goal: Compress the graph representation $\mathbf{X}_t$ into a more compact latent representation $\mathbf{Z}_t$ that retains only the information necessary to predict a target variable $\mathbf{Y}_t$ (e.g., the final collaboration outcome).
- Loss Function: This is achieved by minimizing the GIB loss:

$$\mathcal{L}_{\mathrm{GIB}} = I(\mathbf{X}_t; \mathbf{Z}_t) - \beta\, I(\mathbf{Z}_t; \mathbf{Y}_t)$$

- $I(\mathbf{X}_t; \mathbf{Z}_t)$: Mutual information between the original features and the compressed representation. Minimizing this term forces compression.
- $I(\mathbf{Z}_t; \mathbf{Y}_t)$: Mutual information between the compressed representation and the target. Maximizing this term ensures that task-relevant information is preserved.
- $\beta$: A trade-off coefficient between compression and the preservation of task-relevant information.

The overall training objective combines reconstruction and abstraction:

$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{rec}} + \lambda\,\mathcal{L}_{\mathrm{GIB}}, \qquad \mathcal{L}_{\mathrm{rec}} = \alpha\,\mathcal{L}_{\mathrm{att}} + (1 - \alpha)\,\mathcal{L}_{\mathrm{str}}$$

where $\mathcal{L}_{\mathrm{rec}}$ is a weighted sum of the attribute and structure losses.
- $\alpha$: Balances the importance of correctly reconstructing node attributes versus graph structure.
- $\lambda$: Weights the GIB term relative to the reconstruction loss.

Parameter Analysis: Figure 6 shows how GUARDIAN's accuracy varies with the hyperparameters $\alpha$ and $\gamma$. The model performs best when $\alpha$ (balancing attribute vs. structure loss) is between 0.3 and 0.5, indicating that both content and communication patterns matter for detection. The optimal range for $\gamma$ (related to the GIB's compression rate) is between 0.001 and 0.01. This suggests that a moderate level of compression filters out noise without losing critical information.

Figure 6: Parameter analysis of $\alpha$ and $\gamma$ for GUARDIAN accuracy on the MMLU, MATH, Biographies, and FEVER datasets. Left: accuracy as a function of $\alpha$. Right: accuracy as a function of $\gamma$.

7. Conclusion & Reflections

- Conclusion Summary: The paper presents GUARDIAN, a novel and effective framework for enhancing the safety of LLM multi-agent collaborations. By modeling interactions as temporal attributed graphs, it provides a principled way to detect and mitigate the propagation of hallucinations and malicious errors. Its unsupervised, model-agnostic design makes it broadly applicable. Its strong empirical performance, combined with resource efficiency, marks a significant step toward more reliable and trustworthy multi-agent AI systems.
- Limitations & Future Work: The authors do not explicitly state limitations. Potential limitations include:
  - Computational Overhead: Although API calls are reduced, constructing graphs, generating embeddings, and running the GCN-based model introduce their own computational cost. This cost might be significant for very large-scale or real-time systems.
  - Dependency on Embeddings: Detection quality relies heavily on the quality of the BERT text embeddings. Subtle or cleverly disguised misinformation might not produce a significant deviation in the embedding space.
  - Threshold Sensitivity: Performance depends on the choice of the anomaly threshold $\tau$, which may need to be tuned for different tasks or domains.

Personal Insights & Critique:
- Novelty: The application of temporal graph networks and the Information Bottleneck principle to LLM agent safety is a highly innovative and powerful combination. It moves the field beyond simple heuristics like voting and provides a more dynamic and holistic view of the problem.
- Practical Impact: GUARDIAN is highly practical. Its model-agnostic nature means it can be implemented as a safety layer on top of existing multi-agent frameworks using both open-source and proprietary LLMs. The reduction in API calls is a major practical benefit, making safer systems also more affordable to run.
- Open Questions: The framework currently removes anomalous nodes entirely. A more nuanced approach could be to "quarantine" or "down-weight" their contributions instead. The framework's ability to handle more sophisticated, long-term adversarial strategies is another open question, for example an agent that behaves normally for many rounds before injecting a critical error. The visualizations provided (e.g., Figures 4, 10, 11, and 12) illustrate the method's effectiveness well in concrete scenarios.
  
Figure 4: A real case of GUARDIAN detecting and mitigating hallucinations and agent-targeted errors in multi-round agent collaboration. In the first round, four agents provide answers, and GUARDIAN identifies and removes Agent 1, which hallucinated. In the second round, Agent 3's calculation still contains an error, and GUARDIAN identifies and removes it. After two rounds of detection, the remaining Agents 2 and 4 reach consensus on the correct answer, 8.
  
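Figure 10: A hallucination-amplification case and GUARDIAN's response. The multi-agent collaboration is simulated on a geometry problem: computing the coordinates of a cube's center. In the first round, several agents give solutions, after which GUARDIAN performs anomaly detection and identifies and removes an anomalous agent (Agent 3). In the second round, after GUARDIAN's intervention and the removal of the anomalous agent, the agents reach a consensus on the correct answer, mitigating hallucination amplification.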
  
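Figure 11: A real case of GUARDIAN detecting and mitigating error injection and propagation in LLM multi-agent collaboration. The agents collaborate over multiple rounds to compute the coordinates of a cube's center. In each round, GUARDIAN's anomaly detection identifies and "removes" problematic agent proposals, stopping the spread of errors. The agents ultimately reach a consensus answer, (3, 2, 4), and the collaboration stays safe.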
  
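Figure 12: A real case of GUARDIAN addressing communication-targeted error injection and propagation. The agents collaborate over multiple rounds to solve a geometry problem. In the anomaly-detection stage, GUARDIAN identifies and removes the erroneous information, ultimately enabling the agents to reach consensus.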