
Large Language Model Agent: A Survey on Methodology, Applications and Challenges

Published: 03/27/2025

TL;DR Summary

This survey systematically analyzes the architecture, collaboration, and evolution of large language model agents, unifying fragmented research and outlining evaluation, tools, and applications to guide future advances toward artificial general intelligence.

Abstract

The era of intelligent agents is upon us, driven by revolutionary advancements in large language models. Large Language Model (LLM) agents, with goal-driven behaviors and dynamic adaptation capabilities, potentially represent a critical pathway toward artificial general intelligence. This survey systematically deconstructs LLM agent systems through a methodology-centered taxonomy, linking architectural foundations, collaboration mechanisms, and evolutionary pathways. We unify fragmented research threads by revealing fundamental connections between agent design principles and their emergent behaviors in complex environments. Our work provides a unified architectural perspective, examining how agents are constructed, how they collaborate, and how they evolve over time, while also addressing evaluation methodologies, tool applications, practical challenges, and diverse application domains. By surveying the latest developments in this rapidly evolving field, we offer researchers a structured taxonomy for understanding LLM agents and identify promising directions for future research. The collection is available at https://github.com/luo-junyu/Awesome-Agent-Papers.


In-depth Reading


1. Bibliographic Information

  • Title: Large Language Model Agent: A Survey on Methodology, Applications and Challenges
  • Authors: Junyu Luo, Weizhi Zhang, Ye Yuan, Yusheng Zhao, Junwei Yang, Yiyang Gu, Bohan Wu, Binqi Chen, Ziyue Qiao, Qingqing Long, Rongcheng Tu, Xiao Luo, Wei Ju, Zhiping Xiao, Yifan Wang, Meng Xiao, Chenwu Liu, Jingyang Yuan, Shichang Zhang, Yiqiao Jin, Fan Zhang, Xian Wu, Hanqing Zhao, Dacheng Tao, and Philip S. Yu. The large number of authors from various institutions suggests a comprehensive, collaborative effort to survey this broad field.
  • Journal/Conference: This paper is a preprint available on arXiv. Preprints are research articles shared publicly before or during the formal peer-review process.
  • Publication Year: The paper was posted to arXiv on March 27, 2025, making it a very recent preprint.
  • Abstract: The abstract introduces Large Language Model (LLM) agents as a potential pathway toward Artificial General Intelligence (AGI), highlighting their goal-driven and adaptive capabilities. The authors propose a systematic survey that deconstructs LLM agents using a methodology-centered taxonomy. This taxonomy connects three core dimensions: how agents are constructed, how they collaborate, and how they evolve. The survey aims to unify existing research, provide a structured architectural perspective, and cover practical aspects like evaluation, tools, challenges (e.g., security, ethics), and applications. The goal is to offer researchers a clear framework for understanding the field and to identify future research directions.

2. Executive Summary

  • Background & Motivation (Why): The field of artificial intelligence is witnessing a paradigm shift towards "intelligent agents" powered by Large Language Models (LLMs). Unlike traditional AI, these LLM agents can perceive environments, reason about goals, and execute actions dynamically. However, research in this area is fragmented and evolving at an explosive pace. There is a critical need for a systematic framework to organize and understand the principles behind how these agents are designed, how they work together, and how they improve. This paper addresses this gap by providing a comprehensive survey that unifies disparate research threads under a single, coherent taxonomy.

  • Main Contributions / Findings (What):

    1. Methodology-Centered Taxonomy: The paper introduces a novel taxonomy for understanding LLM agents, focusing on the core methodologies behind their creation and operation. This provides a structured way to analyze and compare different agent systems.
    2. Build-Collaborate-Evolve Framework: The central contribution is a holistic framework that examines LLM agents through three interconnected stages of their lifecycle:
      • Construction (Build): How an individual agent is built, including its profile, memory, planning, and action capabilities.
      • Collaboration (Collaborate): How multiple agents interact, using centralized, decentralized, or hybrid architectures.
      • Evolution (Evolve): How agents learn and improve over time, through self-optimization, multi-agent co-evolution, or external feedback.
    3. Comprehensive Ecosystem Review: Beyond methodology, the survey provides a broad overview of the LLM agent ecosystem, including evaluation benchmarks, development tools, real-world challenges (security, privacy, ethics), and diverse applications. This offers a 360-degree view of the field, from theory to practice.

3. Prerequisite Knowledge & Related Work

  • Foundational Concepts:

    • Large Language Model (LLM): A type of AI model (like GPT-4 or Llama) trained on vast amounts of text data. LLMs excel at understanding and generating human-like language, making them powerful "brains" for reasoning and decision-making.
    • Intelligent Agent: An autonomous entity that can perceive its environment, make decisions, and take actions to achieve specific goals.
    • LLM Agent: An intelligent agent that uses an LLM as its core processing unit or "cognitive engine." It leverages the LLM's reasoning, planning, and language capabilities to interact with digital or physical environments.
    • Artificial General Intelligence (AGI): A hypothetical form of AI that possesses the ability to understand, learn, and apply its intelligence to solve any intellectual task that a human being can. LLM agents are considered a potential stepping stone toward AGI.
  • Previous Works: The authors explicitly position their survey as a more comprehensive and methodologically focused work compared to prior surveys. They note that previous efforts have been narrower in scope, focusing on:

    • Specific applications like gaming ([11], [12]).
    • Particular deployment environments ([13], [14]).
    • Specific capabilities like multi-modality ([15]) or security ([16]).
    • Other surveys provided broad overviews without a deep methodological breakdown ([1], [17]) or focused on just one aspect, like multi-agent interaction ([18]) or workflows ([19]).
  • Technological Evolution: The paper attributes the rise of modern LLM agents to the convergence of three key technological advancements:

    1. Unprecedented Reasoning Capabilities: LLMs have become powerful general-purpose reasoners.
    2. Tool Manipulation: Agents can now interact with external software, APIs, and environments.
    3. Sophisticated Memory: New architectures allow agents to accumulate experiences and learn over time.
  • Differentiation: This survey distinguishes itself through its unique Build-Collaborate-Evolve framework. Unlike previous work that often treated individual agent design and multi-agent systems separately, this paper presents an integrated perspective, showing the continuity from how a single agent is built to how groups of agents collaborate and evolve. Its methodology-centered taxonomy provides a deeper, more structured analysis of the architectural foundations of LLM agents.

4. Methodology (Core Technology & Implementation)

The core contribution of this paper is its detailed taxonomy of LLM agent methodologies, organized into the Build-Collaborate-Evolve framework.

Fig. 1: An overview of the LLM agent ecosystem organized into four interconnected dimensions: Agent Methodology, covering the foundational aspects of construction, collaboration, and evolution; Evaluation and Tools; Real-World Issues; and Applications. Image 1: This diagram illustrates the paper's overall structure. The central focus is "Agent Methodology," broken down into Construction, Collaboration, and Evolution. This is surrounded by practical considerations like Evaluation, Tools, Real-World Issues, and Applications.

Fig. 2: A taxonomy of large language model agent methodologies. Image 2: This figure provides a detailed map of the paper's core taxonomy. It shows how the three main pillars—Construction, Collaboration, and Evolution—are further deconstructed into their fundamental components.

4.1 Agent Construction (How an agent is built)

This phase covers the foundational components of a single agent; two minimal code sketches follow the list below.

  • 1. Profile Definition: This sets the agent's identity and behavioral rules.

    • Human-Curated Static Profiles: Experts manually define an agent's role, rules, and knowledge. This ensures predictable, consistent behavior, ideal for structured tasks. Examples include CAMEL and AutoGen, where agents have predefined roles like "assistant" and "user proxy," or MetaGPT and ChatDev, where agents play specific roles in a software development team (e.g., "programmer," "product manager").
    • Batch-Generated Dynamic Profiles: Agent profiles are generated automatically with controlled variations in personality, knowledge, or values. This is useful for simulating diverse social behaviors or creating varied user profiles for testing, as seen in human behavior simulation studies. DSPy can be used to optimize these generation parameters.
  • 2. Memory Mechanism: This enables agents to store and retrieve information.

    • Short-Term Memory: Holds recent conversational history and environmental feedback for immediate context. It is essential for step-by-step reasoning but is limited by the LLM's context window size and is transient. Used in frameworks like ReAct and ChatDev.
    • Long-Term Memory: Stores experiences and knowledge for future use. This can be implemented as:
      • Skill Libraries: Codified procedures that an agent learns, like the Minecraft skills in Voyager.
      • Experience Repositories: Databases of past successes and failures, as in Reflexion.
      • Tool Synthesis: Frameworks where agents create new tools from existing ones, like in TPTU and OpenAgents.
    • Knowledge Retrieval as Memory: Instead of relying only on internal memory, agents use external knowledge sources. This is commonly done with Retrieval-Augmented Generation (RAG), where agents pull information from text corpora or knowledge graphs (GraphRAG) to answer questions or perform tasks.
  • 3. Planning Capability: This allows agents to break down complex goals into manageable steps.

    • Task Decomposition Strategies:
      • Single-Path Chaining: A linear plan of subtasks is created and executed sequentially. The simplest form is Chain-of-Thought (CoT) prompting. Dynamic planning improves this by generating the next step based on the current situation.
      • Multi-Path Tree Expansion: A tree-like structure of possible reasoning paths is explored. This allows the agent to backtrack and correct mistakes. The Tree-of-Thought (ToT) method is a prime example. This can be combined with search algorithms like Monte Carlo Tree Search for complex tasks in robotics or game playing.
    • Feedback-Driven Iteration: The agent refines its plan based on feedback from various sources: the environment (e.g., a robot bumping into a wall), humans (corrections), the model itself (self-verification), or other agents (collaboration).
  • 4. Action Execution: This is the agent's ability to interact with its environment.

    • Tool Utilization: Using external tools to overcome LLM limitations. This involves deciding when to use a tool and which tool to select. Examples include using a calculator for math, a search engine for real-time information, or a code interpreter for programming.
    • Physical Interaction: For embodied agents (e.g., robots), this involves executing actions in the real world, which requires understanding hardware constraints, social norms, and other agents.
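To make these four components concrete, here is a minimal sketch of a single-agent loop. It is not the implementation of any surveyed framework: `call_llm` is a stub standing in for a generic chat-completion client, and the `TOOL:` convention, the ten-turn memory cap, and all other names are hypothetical illustrations.

```python
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    """Stub standing in for any chat-completion client; replace to run for real."""
    return "Plan: answer directly. Answer: 391."

@dataclass
class Agent:
    profile: str                                      # 1. Profile: human-curated static role text
    short_term: list = field(default_factory=list)    # 2. Short-term memory buffer
    tools: dict = field(default_factory=dict)         # 4. Action: callable tool registry

    def step(self, task: str) -> str:
        # 3. Planning: prompt the model with profile + recent memory.
        history = "\n".join(self.short_term[-10:])    # crude context-window cap
        decision = call_llm(f"{self.profile}\nHistory:\n{history}\nTask: {task}")
        # 4. Action: an invented "TOOL: name <args>" convention for tool calls.
        if decision.startswith("TOOL:"):
            name, _, args = decision[5:].strip().partition(" ")
            observation = str(self.tools[name](args))
            self.short_term.append(f"{name}({args}) -> {observation}")
            return observation
        self.short_term.append(decision)
        return decision

agent = Agent(
    profile="You are a careful research assistant.",
    tools={"calc": lambda expr: eval(expr)},  # toy calculator; unsafe outside demos
)
print(agent.step("What is 17 * 23?"))
```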
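The knowledge-retrieval-as-memory pattern reduces to the same control flow regardless of the retriever. Below, a deliberately crude word-overlap retriever stands in for the dense-vector search a real RAG pipeline would use; the corpus sentences are illustrative only.

```python
# Toy "knowledge retrieval as memory": rank stored passages by word overlap
# with the query and prepend the top hits to the prompt. Real RAG pipelines
# use dense embeddings and a vector store; only the control flow is shown.
def retrieve(corpus, query, k=2):
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: len(q & set(p.lower().split())), reverse=True)
    return ranked[:k]

corpus = [
    "Voyager stores learned Minecraft skills in a skill library.",
    "Reflexion keeps an experience repository of past failures.",
    "GraphRAG retrieves from a knowledge graph instead of raw text.",
]
query = "How does Reflexion remember failures?"
augmented = "Context:\n" + "\n".join(retrieve(corpus, query)) + f"\n\nQuestion: {query}"
print(augmented)  # this augmented prompt would then be sent to the LLM
```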

4.2 Agent Collaboration (How agents work together)

This section explores how multiple agents can team up to solve problems that are too complex for a single agent.

  • 1. Centralized Control: A "manager" or "controller" agent coordinates the work of other agents.

    • Explicit Controller Systems: A dedicated agent acts as the central coordinator. For example, in Coscientist, a human controller directs specialized agents in a scientific workflow. In MetaGPT, a manager agent assigns tasks to role-specialized agents such as programmers and testers (a simplified pipeline is sketched after Table 1).
    • Differentiation-based Systems: A single powerful LLM assumes different sub-roles as needed, acting as its own controller. Meta-Prompting is an example where a single model coordinates sub-tasks by assigning them to "virtual" specialized agents.
  • 2. Decentralized Collaboration: Agents interact directly with each other without a central manager.

    • Revision-based Systems: Agents work on a shared output, iteratively refining what others have done. In MedAgents, different medical expert agents propose and modify a diagnosis until a consensus is reached.
    • Communication-based Systems: Agents engage in dialogue or debate to solve a problem. AutoGen implements a group chat where agents can discuss and refine solutions. MAD (Multi-Agent Debate) uses structured debate to improve reasoning and avoid fixation on initial bad ideas (a minimal debate loop is sketched after Table 1).
  • 3. Hybrid Architecture: These systems combine centralized and decentralized approaches to get the best of both worlds.

    • Static Systems: The collaboration structure is predefined. CAMEL uses decentralized role-playing within teams but centralized coordination between teams. AFlow uses a three-tier hierarchy with different collaboration styles at each level.

    • Dynamic Systems: The collaboration structure adapts in real-time based on the task. DyLAN identifies the most important agents for a task and adjusts the communication topology to focus on them. MDAgents routes simple tasks to a single agent and complex tasks to a hierarchical team.

      Here is a transcription of Table 1 from the paper, summarizing these collaboration methods.

Table 1: A summary of agent collaboration methods.

| Category | Method | Key Contribution |
| --- | --- | --- |
| Centralized Control | Coscientist [73] | Human-centralized experimental control |
| | LLM-Blender [74] | Cross-attention response fusion |
| | MetaGPT [27] | Role-specialized workflow management |
| | AutoAct [75] | Triple-agent task differentiation |
| | Meta-Prompting [76] | Meta-prompt task decomposition |
| | WJudge [77] | Weak-discriminator validation |
| Decentralized Collaboration | MedAgents [78] | Expert voting consensus |
| | ReConcile [79] | Multi-agent answer refinement |
| | METAL [115] | Domain-specific revision agents |
| | DS-Agent [116] | Database-driven revision |
| | MAD [80] | Structured anti-degeneration protocols |
| | MADR [81] | Verifiable fact-checking critiques |
| | MDebate [82] | Stubborn-collaborative consensus |
| | AutoGen [26] | Group-chat iterative debates |
| Hybrid Architecture | CAMEL [25] | Grouped role-play coordination |
| | AFlow [29] | Three-tier hybrid planning |
| | EoT [117] | Multi-topology collaboration patterns |
| | DiscoGraph [118] | Pose-aware distillation |
| | DyLAN [119] | Importance-aware topology |
| | MDAgents [120] | Complexity-aware routing |
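As a concrete illustration of the explicit-controller pattern in the first block of Table 1, the sketch below hard-codes a MetaGPT-style manager that sequences role-specialized workers. The fixed three-stage pipeline and the `ROLE_PROMPTS` names are simplifying assumptions; `call_llm` is again a stub.

```python
def call_llm(prompt: str) -> str:
    """Stub for a chat-completion client."""
    return f"(output for: {prompt.splitlines()[0][:40]}...)"

ROLE_PROMPTS = {
    "product_manager": "Write a short requirements list for: {task}",
    "programmer": "Implement code satisfying these requirements:\n{spec}",
    "tester": "Write test cases for this code:\n{code}",
}

def manager(task: str) -> dict:
    # Explicit-controller pattern: one coordinator sequences role-specialized
    # workers in a fixed order (heavily simplified MetaGPT-style pipeline).
    spec = call_llm(ROLE_PROMPTS["product_manager"].format(task=task))
    code = call_llm(ROLE_PROMPTS["programmer"].format(spec=spec))
    tests = call_llm(ROLE_PROMPTS["tester"].format(code=code))
    return {"spec": spec, "code": code, "tests": tests}

print(manager("a CLI todo app"))
```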
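Decentralized, communication-based collaboration follows a different control flow: no coordinator, just rounds of mutual critique. The loop below is in the spirit of MAD and AutoGen's group chat, not a transcription of either; the fixed round count is an assumption.

```python
def call_llm(prompt: str) -> str:
    """Stub for a chat-completion client."""
    return f"(answer given {len(prompt)} chars of context)"

def debate(question: str, agents: list, rounds: int = 2) -> str:
    # Round 0: each agent drafts an answer independently.
    answers = {name: call_llm(f"You are {name}. Answer: {question}") for name in agents}
    # Later rounds: each agent reads its peers' latest answers and revises.
    for _ in range(rounds):
        for name in agents:
            peers = "\n".join(a for n, a in answers.items() if n != name)
            answers[name] = call_llm(
                f"You are {name}. Peers said:\n{peers}\nRevise your answer to: {question}"
            )
    # A final call aggregates the surviving answers toward a consensus.
    return call_llm("Summarize a consensus from:\n" + "\n".join(answers.values()))

print(debate("Is 1013 prime?", ["Analyst", "Skeptic", "Verifier"]))
```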

4.3 Agent Evolution (How agents improve)

This dimension explores mechanisms that enable agents to learn and adapt over time.

  • 1. Autonomous Optimization and Self-Learning: Agents improve without direct human supervision.

    • Self-Supervised Learning: Agents learn from unlabeled data they generate. Evolutionary optimization techniques allow models to merge and adapt efficiently.
    • Self-Reflection and Self-Correction: Agents critique and refine their own work. SELF-REFINE uses an iterative feedback loop (sketched after Table 2). STaR (Self-Taught Reasoner) has models generate rationales, keep those that lead to correct answers, and fine-tune on them.
    • Self-Rewarding and Reinforcement Learning (RL): Agents generate their own reward signals to guide learning. A model can act as its own "judge" (LLM-as-a-Judge) to score its outputs, providing a reward signal for improvement via RL.
  • 2. Multi-Agent Co-Evolution: Agents improve by interacting with other agents.

    • Cooperative and Collaborative Learning: Agents learn by working together. In ProAgent, agents infer their teammates' intentions to coordinate better. In CAMEL, role-playing agents collaborate to solve tasks.
    • Competitive and Adversarial Co-Evolution: Agents improve by competing or debating. Red-Teaming involves one agent trying to find flaws ("red team") in another agent to make it more robust. Multi-Agent Debate frameworks force agents to defend their reasoning and critique others, improving factuality.
  • 3. Evolution via External Resources: Agents improve by leveraging outside information and feedback.

    • Knowledge-Enhanced Evolution: Agents integrate structured knowledge (e.g., from a knowledge base) to improve planning and decision-making, as seen in KnowAgent.

    • External Feedback-Driven Evolution: Agents use feedback from tools or environments. CRITIC allows an agent to use an external tool to check its output and correct it. SelfEvolve uses feedback from a code executor to debug and improve its generated code.

      Below is a transcription of Table 2 from the paper, summarizing agent evolution methods.

Table 2: A summary of agent evolution methods.

| Category | Method | Key Contribution |
| --- | --- | --- |
| Self-Supervised Learning | SE [86] | Adaptive token masking for pretraining |
| | Evolutionary Optimization [87] | Efficient model merging and adaptation |
| | DiverseEvol [88] | Improved instruction tuning via diverse data |
| Self-Reflection & Self-Correction | SELF-REFINE [89] | Iterative self-feedback for refinement |
| | STaR [90] | Bootstrapping reasoning with few rationales |
| | V-STaR [91] | Training a verifier using DPO |
| | Self-Verification [92] | Backward verification for correction |
| Self-Rewarding & RL | Self-Rewarding [93] | LLM-as-a-Judge for self-rewarding |
| | RLCD [94] | Contrastive distillation for alignment |
| | RLC [95] | Evaluation-generation gap for optimization |
| Cooperative Co-Evolution | ProAgent [96] | Intent inference for teamwork |
| | CORY [97] | Multi-agent RL fine-tuning |
| | CAMEL [25] | Role-playing framework for cooperation |
| Competitive Co-Evolution | Red-Team LLMs [98] | Adversarial robustness training |
| | Multi-Agent Debate [82] | Iterative critique for refinement |
| | MMAD [99] | Debate-driven divergent thinking |
| Knowledge-Enhanced Evolution | KnowAgent [83] | Action knowledge for planning |
| | WKM [84] | Synthesizing prior and dynamic knowledge |
| Feedback-Driven Evolution | CRITIC [100] | Tool-assisted self-correction |
| | STE [101] | Simulated trial-and-error tool learning |
| | SelfEvolve [102] | Automated debugging and refinement |
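The self-reflection entries in Table 2 share one control flow: generate, critique, revise. Here is a minimal SELF-REFINE-style sketch, assuming a hypothetical "no issues" stop phrase and a stubbed model call.

```python
def call_llm(prompt: str) -> str:
    """Stub for a chat-completion client."""
    return "(stub output)"

def self_refine(task: str, max_iters: int = 3) -> str:
    # Generate -> critique -> revise, all with the same model, until the
    # critique contains the (hypothetical) "no issues" stop phrase.
    draft = call_llm(f"Solve: {task}")
    for _ in range(max_iters):
        critique = call_llm(f"Critique this solution to '{task}':\n{draft}")
        if "no issues" in critique.lower():
            break
        draft = call_llm(f"Task: {task}\nDraft:\n{draft}\nCritique:\n{critique}\nRevise:")
    return draft

print(self_refine("Sort [3, 1, 2] ascending"))
```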

5. Experimental Setup

As a survey, this paper does not conduct experiments of its own. Instead, its Section 3 reviews the landscape of evaluation benchmarks and tools used by the community to assess LLM agents.

Fig. 3: An overview of evaluation benchmarks and tools for LLM agents. The left side shows various evaluation frameworks categorized by general assessment, domain-specific evaluation, and collaborative evaluation. Image 3: This diagram organizes the evaluation ecosystem. The left side lists benchmarks for general, domain-specific, and collaborative assessment. The right side shows tools used by agents, created by agents, and used for deploying agents.

  • Datasets & Benchmarks: The paper categorizes evaluation frameworks into three types:

    1. General Assessment Frameworks: These benchmarks test a broad range of agent capabilities in multiple environments.
      • AgentBench: Evaluates agents across eight different interactive environments.
      • Mind2Web: Tests agents on real-world tasks across 137 websites.
      • MMAU: Breaks down agent intelligence into five core competencies using over 3,000 tasks.
      • VisualAgentBench: Focuses on multimodal agents that handle visual tasks.
      • BENCHAGENTS: A framework where LLM agents themselves help create new evaluation tasks, making the benchmark self-evolving.
    2. Domain-Specific Evaluation Systems: These are tailored to test agent performance in specialized fields.
      • Healthcare: MedAgentBench and AI Hospital simulate clinical tasks and workflows.
      • Autonomous Driving: LaMPilot evaluates agents on code generation for driving systems.
      • Data Science: DSEval and DA-Code cover the full data science lifecycle.
      • Security: AgentHarm assesses the risk of agents being used for malicious purposes.
      • Real-World Simulation: OSWorld provides a real computer environment (Ubuntu/Windows/macOS) for agents to perform tasks. EgoLife uses egocentric video to test agents on daily human activities.
    3. Collaborative Evaluation of Complex Systems: These benchmarks assess the performance of multi-agent systems.
      • TheAgentCompany: Simulates a software company to test collaborative coding and web interaction.
      • MLRB and MLE-Bench: Evaluate multi-agent teams on competitive machine learning research and engineering tasks (like Kaggle competitions).
  • Evaluation Metrics: The paper does not propose specific metrics. It notes that evaluation is moving beyond simple success rates toward more nuanced, multi-dimensional assessments of reasoning, planning, and adaptability, as captured by the diverse tasks in the benchmarks listed above.

  • Tools Ecosystem: The paper also surveys the tools that are integral to the agent ecosystem.

    • Tools used by LLM agents (a minimal dispatch sketch follows this list):
      • Knowledge Retrieval: Search engines like DuckDuckGo (ToolCoder) or commercial APIs (WebGPT).
      • Computation: Python interpreters (CodeActAgent) and calculators (Toolformer).
      • API Interactions: Tools to interact with external services via REST APIs (RestGPT).
    • Tools created by LLM agents: The provided text is incomplete and cuts off while introducing this topic. It suggests that agents can also create their own tools to handle novel tasks that existing tools cannot address.
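At runtime, the tool categories above all reduce to routing a model-emitted call to the right function. Below is a minimal dispatch sketch assuming a hypothetical JSON tool-call format; it is not any specific framework's API.

```python
import json

# Hypothetical tool registry: the JSON "tool call" convention below is an
# illustrative assumption, not any surveyed framework's interface.
TOOLS = {
    "search": lambda q: f"(top results for {q!r})",  # stands in for a search engine
    "python": lambda src: str(eval(src)),            # code interpreter; unsafe unsandboxed
}

def run_tool_call(raw: str) -> str:
    """Execute a model-emitted call like {"tool": "python", "input": "2 + 2"}."""
    call = json.loads(raw)
    fn = TOOLS.get(call["tool"])
    return fn(call["input"]) if fn else f"error: unknown tool {call['tool']!r}"

print(run_tool_call('{"tool": "python", "input": "2 + 2"}'))  # -> 4
```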

6. Results & Analysis

The "results" of this survey are the synthesized insights and identified trends in the field of LLM agents.

  • Core Synthesis:

    • Architectural Convergence: The paper reveals that seemingly different agent systems are built from a common set of modular components (profile, memory, planning, action). The Build-Collaborate-Evolve framework provides a unified lens to understand these systems.
    • From Individual to Collective Intelligence: There is a clear trend moving from single-agent systems to multi-agent collaboration. The paper's analysis of centralized, decentralized, and hybrid architectures highlights the trade-offs between control and flexibility.
    • Performance Gaps in Real-World Scenarios: Domain-specific benchmarks reveal that general-purpose agents often struggle with the complexities and constraints of specialized fields like healthcare or autonomous driving. This highlights the need for more specialized agent development and evaluation.
    • The Importance of Evolution: The most advanced agents are not static; they learn and improve. The survey categorizes the key mechanisms for this evolution, from self-correction to competitive co-evolution, underscoring that continuous learning is crucial for building more capable agents.
  • Real-World Issues & Challenges: While the full section on this topic is not included in the provided text, the introduction, abstract, and diagrams point to critical challenges:

    • Security: Malicious use of agents (AgentHarm), agent-centric attacks (e.g., prompt injection), and data-centric threats.
    • Privacy: Agents might memorize and leak sensitive data from their training or interactions.
    • Social Impact & Ethics: The paper points to the need to consider the broader societal implications, including job displacement, ethical decision-making, and ensuring agent behavior aligns with human values.

7. Conclusion & Reflections

  • Conclusion Summary: The paper successfully provides a systematic and comprehensive survey of the LLM agent landscape. Its primary achievement is the Build-Collaborate-Evolve framework, which offers a structured, methodology-centered taxonomy for analyzing, comparing, and developing LLM agent systems. By unifying fragmented research, highlighting the ecosystem of tools and evaluations, and touching upon real-world challenges, the survey serves as a foundational guide for both newcomers and experienced researchers in this rapidly advancing field.

  • Limitations & Future Work:

    • Author-Acknowledged: The paper implicitly suggests future work in every sub-category, such as creating more robust memory systems, more adaptive collaboration protocols, and more efficient evolution mechanisms. A key direction is bridging the gap between generalist agents and the demands of specialized, real-world domains.
    • Inherent Limitation: The biggest limitation of any survey in such a fast-moving field is that it is a snapshot in time. New architectures and systems are published weekly. The authors attempt to mitigate this by providing a companion GitHub repository, which can be updated more frequently.
    • Incomplete Text: The provided markdown is incomplete, cutting off in the middle of Section 3.2.2. A full analysis of the paper would require the complete text, especially the sections on Applications and Real-World Issues.
  • Personal Insights & Critique:

    • Strength: The paper's main strength is its clarity and structure. The Build-Collaborate-Evolve framework is exceptionally intuitive and provides a powerful mental model for thinking about agent systems. It effectively transforms a chaotic collection of research papers into a well-organized map.
    • Value: This survey is highly valuable for anyone entering the field of LLM agents. It provides the necessary concepts, terminology, and key research examples to quickly get up to speed. For experts, it offers a way to contextualize their own work within the broader landscape.
    • Critique: While the taxonomy is excellent, the boundaries between categories can sometimes be blurry. For example, a system like AutoGen appears in multiple categories (profile definition, decentralized collaboration), which shows the interconnectedness but also the challenge of clean categorization. However, this is more a reflection of the field's complexity than a flaw in the paper. The survey's true long-term impact will depend on how well its proposed framework adapts to future innovations.
