KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems
TL;DR Summary
KABB introduces a novel framework to enhance multi-agent system coordination, addressing expensive LLM scaling and static knowledge issues. It employs a 3D knowledge distance model, dual adaptation, and knowledge-aware Thompson Sampling for expert selection. Findings show KABB achieves an optimal cost-performance balance, maintaining high performance while keeping computational demands relatively low.
Abstract
As scaling large language models faces prohibitive costs, multi-agent systems emerge as a promising alternative, though challenged by static knowledge assumptions and coordination inefficiencies. We introduce Knowledge-Aware Bayesian Bandits (KABB), a novel framework that enhances multi-agent system coordination through semantic understanding and dynamic adaptation. The framework features three key innovations: a three-dimensional knowledge distance model for deep semantic understanding, a dual-adaptation mechanism for continuous expert optimization, and a knowledge-aware Thompson Sampling strategy for efficient expert selection. Extensive evaluation demonstrates that KABB achieves an optimal cost-performance balance, maintaining high performance while keeping computational demands relatively low in multi-agent coordination.
In-depth Reading
English Analysis
1. Bibliographic Information
- Title: KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems
- Authors: Jusheng Zhang, Zimeng Huang, Yijia Fan, Ningyuan Liu, Mingyan Li, Zhuojie Yang, Jiawei Yao, Jian Wang, Keze Wang.
- Affiliations: Sun Yat-sen University, University of Washington, Snap Inc.
- Journal/Conference: The paper is presented in a format typical for major AI/ML conferences (e.g., NeurIPS, ICML, ICLR). The specific venue is not mentioned in the provided text.
- Publication Year: The citations suggest the work is contemporary, likely from 2024 or 2025.
- Abstract: The paper introduces the Knowledge-Aware Bayesian Bandits (KABB) framework to improve coordination in multi-agent systems (MAS). As an alternative to scaling single large language models (LLMs), which is prohibitively expensive, KABB addresses the challenges of static knowledge assumptions and inefficiency in MAS. Its core innovations are: (1) a three-dimensional knowledge distance model for semantic understanding, (2) a dual-adaptation mechanism for continuous expert optimization, and (3) a knowledge-aware Thompson Sampling strategy for efficient expert selection. Evaluations show that KABB achieves a strong balance between performance and computational cost.
- Original Source Link: /files/papers/68e0b3ca9cc40dff7dd2bb46/paper.pdf (Publication status appears to be a preprint or a paper under review, based on the promise to release code "in accordance with the review policy.")
2. Executive Summary
- Background & Motivation (Why):
  - Core Problem: Continuously scaling single Large Language Models (LLMs) to improve performance is becoming economically and computationally unsustainable. Multi-Agent Systems (MAS), which combine several smaller, specialized models, offer a promising alternative. However, existing MAS frameworks like Mixture-of-Agents (`MoA`) suffer from high redundancy (all agents respond to every query) and inefficiencies, while Mixture-of-Experts (MoE) frameworks often rely on static, predefined expert roles. Both approaches struggle with dynamic adaptation and deep semantic understanding of tasks.
  - Importance & Gaps: There is a need for a system that can dynamically and intelligently select the right experts for a given task, minimizing cost while maximizing performance. Previous methods often lack a sophisticated understanding of the semantic relationship between a task's requirements and an expert's capabilities, and they fail to adapt as experts' performance evolves.
  - Innovation: KABB introduces a novel solution by integrating a Multi-Armed Bandit (MAB) decision-making framework with a deep, semantic knowledge representation. This allows the system to intelligently explore and exploit different combinations of experts based not just on past success, but on a nuanced understanding of their knowledge domains.
- Main Contributions / Findings (What):
- Three-Dimensional Knowledge Distance Model: A sophisticated metric that goes beyond simple keyword matching to measure the "distance" between a task and an expert team. It considers conceptual overlap, knowledge dependencies, and historical performance.
- Dual-Adaptation Mechanism: A system that allows for continuous learning and adaptation. It uses Bayesian updates with time decay to keep expert performance models current and dynamically adjusts the underlying knowledge graph to reflect evolving capabilities.
- Knowledge-Aware Thompson Sampling: An efficient expert selection strategy. It modifies the classic Thompson Sampling algorithm by incorporating the knowledge distance metric, allowing it to more quickly identify the most promising experts for a task, thereby balancing exploration and exploitation effectively.
- Key Conclusion: The KABB framework achieves a superior cost-performance balance. It maintains performance comparable to or better than state-of-the-art models and ensembles, while significantly reducing computational costs by intelligently routing queries to a small, relevant subset of experts.
3. Prerequisite Knowledge & Related Work
- Foundational Concepts:
- Large Language Models (LLMs): AI models (like GPT-4, LLaMa-3) trained on vast amounts of text data to understand and generate human-like language.
- Multi-Agent Systems (MAS): Systems composed of multiple autonomous "agents" (in this case, LLMs) that interact with each other to solve a problem that may be beyond the capability of any single agent.
- Mixture of Agents (MoA): A specific MAS framework where multiple LLMs act as "proposers" to generate responses, which are then refined and synthesized by a central "aggregator." A key drawback is that all proposers are activated for every task, leading to high costs.
- Multi-Armed Bandit (MAB): A classic reinforcement learning problem that models a gambler choosing between multiple slot machines ("bandits") with unknown reward probabilities. The goal is to maximize total reward by balancing exploitation (choosing the machine with the best-known payout) and exploration (trying other machines to find a potentially better one).
- Thompson Sampling: A popular algorithm for solving the MAB problem. It uses a Bayesian approach, modeling the reward probability of each arm with a probability distribution (e.g., a Beta distribution) and selecting an arm by sampling from these distributions (a minimal example follows this list).
- Knowledge Graph: A structured way of representing knowledge as a network of entities (nodes) and their relationships (edges). It allows for reasoning about connections between different concepts.
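To make the Thompson Sampling entry above concrete, here is a minimal, self-contained Beta-Bernoulli sketch on a toy three-armed bandit; the reward rates and counts below are invented for illustration and are unrelated to the paper's experiments.

```python
import random

# Toy three-armed bandit: the true reward rates are hidden from the algorithm.
true_p = [0.3, 0.5, 0.7]
successes = [0, 0, 0]
failures = [0, 0, 0]

for _ in range(1000):
    # Sample a plausible reward rate per arm from its Beta posterior,
    # then play the arm with the highest sample (exploration built in).
    samples = [random.betavariate(1 + s, 1 + f)
               for s, f in zip(successes, failures)]
    arm = samples.index(max(samples))
    if random.random() < true_p[arm]:
        successes[arm] += 1
    else:
        failures[arm] += 1

# Most pulls should concentrate on the best arm (index 2).
print([s + f for s, f in zip(successes, failures)])
```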
- Previous Works & Differentiation:
  - LLM Ensemble Methods: Early methods focused on simple aggregation like re-ranking or averaging outputs (`PAIRRANKER`, `GENFUSER`). More recent frameworks like `MoA` introduced iterative refinement but incurred linear cost scaling. Routing-based methods (`ZOOTER`) aimed to select the best single model for a task.
  - MAB for Decision Optimization: Classical MAB algorithms (`UCB`, `Thompson Sampling`) have been used for dynamic decision-making but traditionally lack semantic understanding; they rely purely on historical reward signals.
  - KABB's Differentiation: KABB stands out by fusing these two fields. Unlike `MoA`, it doesn't query all agents; instead, it uses a knowledge-aware MAB to select a subset of agents. Unlike traditional MABs, its decisions are not based solely on historical win/loss records but are guided by a deep semantic knowledge distance metric, making exploration far more efficient and targeted.
4. Methodology (Core Technology & Implementation)
The core of KABB is a closed-loop system that intelligently selects experts for a task and learns from the outcome.
- Principles: The central idea is to frame the expert selection problem as a Multi-Armed Bandit problem where each "arm" is a potential subset of experts. The reward signal is guided by a novel, semantically rich "knowledge distance" metric, ensuring that the system prioritizes experts who are not only historically successful but also conceptually aligned with the task.
- Steps & Procedures (System Architecture): The process, as illustrated in Image 2, follows four main steps (a toy sketch of the loop follows the list):

  - Task Reception and Concept Extraction: A user's task is received and parsed to extract key concepts, represented as a task concept vector.
  - Expert Capability Mapping: Each available LLM is treated as an "expert" with its own capability vector, representing its proficiency across different concepts.
  - Expert Subset Selection: This is the core step. The system uses the knowledge-aware Thompson Sampling strategy to select the optimal subset of experts, considering both historical performance and the knowledge distance between the task and the expert team. The selected experts then process the task in parallel.
  - Performance Feedback and Model Update: The outputs from the selected experts are synthesized by an aggregator. The final performance (e.g., success rate) is used as feedback to update the Bayesian parameters of the MAB model, refining the system for future decisions.
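As referenced above, here is a minimal runnable sketch of this closed loop. Every component is a toy stand-in: the names `select_team` and `kabb_step`, the per-expert posteriors, and the stubbed aggregation and feedback are assumptions for illustration, not the authors' API.

```python
import random

EXPERTS = ["math", "code", "writing"]
# One Beta posterior per expert; the paper tracks posteriors per expert *subset*.
posterior = {e: {"alpha": 1.0, "beta": 1.0} for e in EXPERTS}

def select_team(k=2):
    # Step 3 (simplified): Thompson-style sampling from each Beta posterior;
    # the real system also folds in knowledge distance and synergy.
    samples = {e: random.betavariate(p["alpha"], p["beta"])
               for e, p in posterior.items()}
    return sorted(samples, key=samples.get, reverse=True)[:k]

def kabb_step(task):
    team = select_team()                                         # expert subset selection
    outputs = [f"[{e}] draft answer to {task!r}" for e in team]  # parallel execution (stub)
    answer = " | ".join(outputs)                                 # aggregator synthesis (stub)
    reward = 1.0                                                 # success feedback (stub)
    for e in team:                                               # Bayesian parameter update
        posterior[e]["alpha"] += reward
        posterior[e]["beta"] += 1.0 - reward
    return answer

print(kabb_step("integrate x^2 from 0 to 1"))
```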
- Mathematical Formulas & Key Details:
  1. Knowledge Distance Function (`Definition 3.1`): This function measures the mismatch between an expert subset S and a task t; a smaller distance implies a better fit. It combines five dimensions:
     - Task difficulty.
     - Jaccard similarity, measuring the conceptual overlap between the team's knowledge and the task's requirements.
     - The number of dependency edges in the knowledge graph, representing how complex the reasoning path is.
     - The team's average historical success rate on similar tasks.
     - `Synergy(S)`: a measure of how well the experts in the subset S complement each other, with higher values indicating better collaboration.
     - Learnable weights that balance these five dimensions.
     The paper also proves this function satisfies pseudo-metric properties (`Theorem 3.2`), ensuring it behaves as a reasonable distance measure.
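For concreteness, here is one plausible weighted form consistent with the five components above; the symbols ($w_1$–$w_5$, $d_t$, $K_S$, $K_t$, and the component functions) are illustrative assumptions rather than the paper's own notation:

```latex
D(S, t) = w_1\, d_t
        + w_2\,\bigl(1 - \mathrm{Jaccard}(K_S, K_t)\bigr)
        + w_3\,\mathrm{Dep}(S, t)
        + w_4\,\bigl(1 - \mathrm{Hist}(S, t)\bigr)
        + w_5\,\bigl(1 - \mathrm{Synergy}(S)\bigr)
```

Under this form, higher conceptual overlap, historical success, and synergy all shrink the distance, matching the rule that a smaller distance implies a better fit.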
  2. Dynamic Bayesian MAB Algorithm: KABB models the success probability of each expert team S with a time-varying Beta distribution $\mathrm{Beta}(\alpha_S(t), \beta_S(t))$, where $\alpha_S$ accumulates successes and $\beta_S$ accumulates failures.
     - Parameter Update Equations: The parameters are updated with a dual-adaptation mechanism that combines historical decay, immediate feedback, and knowledge matching (sketched below):
       - An exponential time-decay factor that reduces the influence of old data.
       - The binary reward (1 for success, 0 for failure) from the current task.
       - A knowledge matching index that provides a "bonus" reward when the team is semantically well-aligned with the task.
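A minimal sketch of how such a dual-adaptation update could look, assuming $\gamma$ for the decay factor, $r_t$ for the binary reward, and $\kappa_{S,t}$ for the knowledge matching index (assumed symbols, not the paper's notation):

```latex
\alpha_S(t+1) = \gamma\,\alpha_S(t) + r_t + \kappa_{S,t}, \qquad
\beta_S(t+1) = \gamma\,\beta_S(t) + (1 - r_t)
```

Decay shrinks stale pseudo-counts each round, while the matching bonus steers the posterior toward semantically aligned teams.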
     - Knowledge-Aware Sampling Strategy: To select an expert team, KABB doesn't just sample from the Beta distribution. It calculates a comprehensive confidence score that incorporates knowledge distance and synergy. The score combines four factors: the team's expected historical performance, a penalty for large knowledge distance, a time-decay factor, and a bonus for strong internal synergy. The team with the highest score is chosen.
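A minimal Python sketch of such a score, assuming a multiplicative combination with invented weights `lam`, `decay`, and `mu` (the paper's exact functional form is not reproduced in this summary):

```python
import math

def team_confidence(alpha, beta, distance, synergy, steps_since_update,
                    lam=1.0, decay=0.99, mu=0.5):
    """Score one candidate expert team: an illustrative combination of the
    four factors described above (weights and form are assumptions)."""
    expected = alpha / (alpha + beta)          # expected historical performance
    dist_penalty = math.exp(-lam * distance)   # penalize large knowledge distance
    staleness = decay ** steps_since_update    # time-decay factor
    synergy_bonus = 1.0 + mu * synergy         # reward strong internal synergy
    return expected * dist_penalty * staleness * synergy_bonus

# The team with the highest score is chosen:
teams = {"A": (8, 2, 0.3, 0.7, 1), "B": (5, 5, 0.1, 0.9, 4)}
best = max(teams, key=lambda name: team_confidence(*teams[name]))
print("selected team:", best)
```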
5. Experimental Setup
- Datasets:
  - Primary: `AlpacaEval 2.0`, a benchmark with 805 real-world instructions, evaluated against GPT-4 Preview.
  - Secondary: `MT-Bench` for multi-turn dialogue, and `FLASK-Hard` for 12 specific skill categories.
  - Reasoning: `Arena-Hard`, `MATH`, and `BBH` were used for evaluating reasoning and problem-solving abilities.
- Evaluation Metrics:
  - `LC win rate` (Length-Controlled Win Rate): The primary metric from `AlpacaEval 2.0`. It measures the percentage of times a model's output is preferred over a reference model (GPT-4 Preview) by a GPT-4-based evaluator, with controls to prevent bias towards longer answers.
  - `RAS` (Routing Alignment Score): A custom metric measuring how well the system's expert selections align with human expert annotations.
  - `PWRS` (Preference-Weighted Routing Score): A custom metric that weights the routing alignment by the quality of the selected expert's final output.
- Baselines: The paper compares KABB against a wide range of models, including:
  - Proprietary Models: Various versions of `GPT-4`.
  - Open-Source Models: `LLaMa-3-70B`, `Qwen2-72B`, `Deepseek-R1`, etc.
  - Ensemble Methods: `MoA` (Mixture of Agents), configured with the same underlying LLMs for a fair comparison.
  - Ablations: Two variants of KABB are tested: `KABB w/o Deepseek` (to measure the impact of the strongest models) and `KABB-Single-LLaMa3` (to show the benefits of collaboration even with a single base model).
6. Results & Analysis
- Core Results:
  - `AlpacaEval 2.0`: KABB achieved a leading `LC win rate` of 77.9%, outperforming `MoA` (68.1%) by a significant margin. Crucially, KABB achieved this by selecting only 2 experts on average, whereas `MoA` required 6, demonstrating massive cost savings.
  - `MT-Bench`: KABB obtained a state-of-the-art score of 9.60, showing its robustness in multi-turn conversational tasks.
  - `FLASK-Hard`: The radar chart in Image 3 shows KABB's strong and balanced performance across 12 skill sets. It matches or surpasses `MoA` and `GPT-4` in most categories, especially in `metacognition`, `correctness`, and `robustness`. It lags slightly in `conciseness`, suggesting it tends to produce more detailed responses.
- Ablations / Parameter Sensitivity:
  - What Makes KABB Effective?: Table 2 compares the core `Knowledge-Aware (KA)` routing with MAB against other methods.

    | Method | LC win. | RAS | PWRS |
    | --- | --- | --- | --- |
    | KA (MAB) (Ours) | 62.4 | 94.16 | 60.19 |
    | CL (MAB) | 60.9 | 92.92 | 57.34 |
    | KA (A2C) | 60.2 | 91.61 | 54.38 |
    | KA (PPO) | 57.3 | 90.43 | 56.07 |
    | KA (MCTS) | 54.8 | 87.95 | 51.74 |

    The results clearly show that the `KA` mechanism outperforms the simpler `Classifier-Based (CL)` approach, and that `MAB` is superior to other reinforcement learning optimizers (`A2C`, `PPO`, `MCTS`) in this context. This validates the two central design choices of the KABB framework.
  - Budget and Consumption Analysis:
    - Cost-Effectiveness (Image 4): This scatter plot is a key result. It plots performance (`LC Win Rate`) against cost. KABB configurations consistently lie on or near the Pareto frontier, representing the optimal trade-off. KABB achieves higher win rates than `GPT-4o` at a lower cost and provides a far better cost-performance balance than `MoA`.
    - Computational Demand (Image 5): This plot shows performance versus `TFLOPS` (a proxy for computation/latency). KABB demonstrates high performance with relatively low computational demand, highlighting its efficiency and scalability compared to the brute-force approach of `MoA`.
  - Effect of Number of Concepts/Experts (Image 6): This line chart explores how performance changes when tuning the number of selected concepts and experts. Performance generally improves with more experts, but there is a sweet spot: a configuration of 2 concepts and 3 experts achieves the highest win rate of 81%, demonstrating that strategic selection matters more than simply adding experts.

  - Mathematical Problem Solving Example: The paper includes a compelling worked example: finding the largest integer not exceeding a given expression. `MoA`'s aggregator is misled by incorrect proposals and outputs the wrong answer (81). In contrast, KABB's knowledge-aware approach correctly routes the task to the mathematical reasoning expert, arriving at the correct answer (80) and vividly illustrating the advantage of specialized routing over simple aggregation.
7. Conclusion & Reflections
- Conclusion Summary: The paper successfully introduces KABB, a framework that advances multi-agent coordination by combining deep semantic understanding with a dynamic Bayesian bandit approach. Through its three core innovations—the knowledge distance model, dual adaptation, and knowledge-aware Thompson Sampling—KABB achieves a superior balance of high performance and low computational cost. The extensive experiments validate its effectiveness against strong baselines, marking a significant step towards more efficient, adaptive, and intelligent multi-agent systems.
- Limitations & Future Work: The authors acknowledge that KABB sometimes produces overly detailed outputs, lagging in conciseness. Future work could focus on optimizing this trade-off between thoroughness and brevity without sacrificing quality.
- Personal Insights & Critique:
- Novelty and Impact: The fusion of knowledge graphs and multi-armed bandits for LLM coordination is a powerful and elegant idea. It addresses a critical, practical problem in the field: how to get the benefits of multiple models without incurring astronomical costs. This "smart routing" paradigm is likely to become increasingly important as the number of specialized open-source models grows.
- Interpretability: A significant advantage of KABB is its transparency. The knowledge distance metric provides a clear rationale for why certain experts were chosen. The graph-guided integration process can trace reasoning paths. This inherent interpretability is a crucial feature for building trustworthy and responsible AI systems.
- Practical Challenges: While powerful, the framework's effectiveness depends on the quality of the initial knowledge graph and concept definitions. Building and maintaining this graph for very broad or rapidly evolving domains could be a significant engineering challenge. The complexity of the system, with its many moving parts and hyperparameters (e.g., the time-decay factor and the learnable weights of the knowledge distance), might also make it difficult to tune in practice.
- Overall: KABB presents a compelling and well-executed vision for the future of collaborative AI. It shifts the focus from building monolithic, ever-larger models to creating intelligent, efficient systems that can dynamically compose teams of specialized experts, a much more scalable and sustainable approach.