Galaxy: A Cognition-Centered Framework for Proactive, Privacy-Preserving, and Self-Evolving LLM Agents
TL;DR Summary
The paper presents `Galaxy`, a cognition-centered framework for proactive, self-evolving, and privacy-preserving LLM agents, integrating cognitive architecture with system design. Experimental results demonstrate its superior performance across various benchmarks.
Abstract
Intelligent personal assistants (IPAs) such as Siri and Google Assistant are designed to enhance human capabilities and perform tasks on behalf of users. The emergence of LLM agents brings new opportunities for the development of IPAs. While responsive capabilities have been widely studied, proactive behaviors remain underexplored. Designing an IPA that is proactive, privacy-preserving, and capable of self-evolution remains a significant challenge. Designing such IPAs relies on the cognitive architecture of LLM agents. This work proposes Cognition Forest, a semantic structure designed to align cognitive modeling with system-level design. We unify cognitive architecture and system design into a self-reinforcing loop instead of treating them separately. Based on this principle, we present Galaxy, a framework that supports multidimensional interactions and personalized capability generation. Two cooperative agents are implemented based on Galaxy: KoRa, a cognition-enhanced generative agent that supports both responsive and proactive skills; and Kernel, a meta-cognition-based meta-agent that enables Galaxy's self-evolution and privacy preservation. Experimental results show that Galaxy outperforms multiple state-of-the-art benchmarks. Ablation studies and real-world interaction cases validate the effectiveness of Galaxy.
In-depth Reading
English Analysis
1. Bibliographic Information
1.1. Title
Galaxy: A Cognition-Centered Framework for Proactive, Privacy-Preserving, and Self-Evolving LLM Agents
1.2. Authors
The authors are Chongyu Bao*, Ruimin Dai (3)*, Yangbo Shen (1)*, Runyang Jian (4), Jinghan Zhang (3), Xiaolan Liu, and Kunpeng Liu (3)†. Their affiliations are:
- Carnegie Mellon University (1)
- University of Bristol (2)
- Clemson University (3)
- Portland State University (4)
Chongyu Bao is noted with a primary authorship indicator (*). Chongyu Bao, Xiaolan Liu, and Kunpeng Liu are noted with correspondence indicators (†).
1.3. Journal/Conference
The paper is published on arXiv, a preprint server for scientific papers. Based on the available information, it is a preprint (arXiv:2508.03991) that has not yet been formally published in a peer-reviewed journal or conference. arXiv is a widely recognized platform for disseminating research quickly, but preprints have not undergone formal peer review.
1.4. Publication Year
The publication timestamp (UTC) is 2025-08-06T00:46:38.000Z.
1.5. Abstract
This paper introduces Galaxy, a cognition-centered framework designed for Large Language Model (LLM) agents to function as Intelligent Personal Assistants (IPAs). While existing IPAs primarily focus on responsive capabilities, Galaxy addresses the underexplored aspects of proactive behaviors, privacy preservation, and self-evolution. The core innovation is Cognition Forest, a semantic structure that unifies cognitive modeling with system-level design, creating a self-reinforcing loop where cognitive architecture drives system design, and system improvements refine cognition. Galaxy supports multidimensional interactions and personalized capability generation. Within this framework, two cooperative agents are implemented: KoRa, a cognition-enhanced generative agent for responsive and proactive skills; and Kernel, a meta-cognition-based meta-agent responsible for Galaxy's self-evolution and privacy preservation. Experimental results demonstrate that Galaxy outperforms multiple state-of-the-art benchmarks, with ablation studies and real-world cases validating its effectiveness.
1.6. Original Source Link
- Original Source Link: https://arxiv.org/abs/2508.03991
- PDF Link: https://arxiv.org/pdf/2508.03991v1.pdf
- Publication Status: Preprint on arXiv.
2. Executive Summary
2.1. Background & Motivation
The paper addresses critical limitations in current Intelligent Personal Assistants (IPAs) like Siri and Google Assistant, especially with the advent of Large Language Model (LLM) agents.
- Core Problem: Current LLM agents, when deployed as IPAs, predominantly focus on responsive behaviors (i.e., acting only upon explicit user commands). They largely lack proactive behaviors (acting without explicit commands), robust privacy-preserving mechanisms, and the ability for self-evolution (continuously adapting and improving their own architecture and strategies).
- Importance of the Problem:
  - Enhanced Human Capabilities: IPAs are designed to augment human abilities and perform tasks, but their utility is constrained if they cannot anticipate needs or learn over time. Proactivity, privacy, and self-evolution are crucial for truly intelligent, helpful, and trustworthy assistants.
  - Underexplored Proactivity: While LLMs have advanced reasoning and planning, leveraging these for proactive assistance, which requires deep user modeling and intent prediction, remains a significant challenge.
  - Privacy Risks: Proactive systems often require access to sensitive user data for modeling. Cloud-based LLM inference, in particular, exacerbates privacy concerns, making robust privacy preservation essential for user trust and adoption.
  - Fixed Cognitive Architectures: Existing LLM agents are often constrained by predefined internal modules and reasoning pipelines. They struggle to inspect, revise, or evolve their own underlying system designs or cognitive architectures, limiting their adaptability and personalization. This separation of cognitive architecture and system design hinders continuous improvement.
- Paper's Entry Point / Innovative Idea: The paper proposes that the cognitive architecture and system design of LLM agents should not be treated separately but rather unified into a self-reinforcing loop. The core innovative idea is Cognition Forest, a semantic structure that explicitly links cognitive modeling with system-level design. This enables LLM agents not only to understand what to do and how to do it but also how it is implemented, allowing for deeper reflection and self-modification.
2.2. Main Contributions / Findings
The paper makes several significant contributions to the field of LLM agents and IPAs:
- Cognition Forest: Proposal of Cognition Forest, a novel tree-structured semantic mechanism. This structure fundamentally integrates cognitive architecture (how an agent thinks and understands) with system design (how the system is built and functions). This unification forms a self-reinforcing loop, where cognitive insights drive system modifications, and system improvements enrich the cognitive architecture. This addresses the challenge of fixed architectures.
- Galaxy Framework: Development of Galaxy, an LLM agent framework built upon Cognition Forest. Galaxy is specifically designed to support proactive task execution (acting without explicit commands), privacy-preserving operation, and continuous adaptation (self-evolution). It offers multidimensional interaction modalities and can generate or aggregate new cognitive capabilities for personalized needs.
- Collaborative Agents (KoRa and Kernel): Implementation of two cooperative agents within the Galaxy framework:
  - KoRa: A cognition-enhanced generative agent that acts as a human-like assistant. It supports both responsive skills (handling explicit commands) and proactive skills (anticipating needs) by grounding its cognition-to-action pipeline in the Cognition Forest, which helps mitigate persona drift and improves consistency.
  - Kernel: A meta-cognition empowered meta-agent. Kernel operates at a higher level, supervising and optimizing the entire Galaxy framework. It enables self-reflection on capability limitations, expands functionality based on user demands, and ensures privacy preservation through its Privacy Gate when interacting with cloud models.
- Empirical Validation: Galaxy significantly outperforms multiple state-of-the-art benchmarks in areas relevant to agent capabilities. Extensive ablation studies and real-world interaction cases further validate the effectiveness of Galaxy's design principles and components, particularly the crucial roles of Kernel and the Analysis Layer modules (Agenda and Persona).
- Key Conclusion: The paper concludes by arguing that an IPA's understanding of its users should not be static or constrained by a fixed cognitive architecture, but rather should continuously evolve through reflection on and refinement of its own system design. This integrated approach leads to more capable, adaptable, and trustworthy LLM agents.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To understand the Galaxy framework, it's essential to grasp several core concepts:
- Large Language Models (LLMs): LLMs are advanced artificial intelligence models trained on vast amounts of text data, enabling them to understand, generate, and process human language. They can perform tasks like translation, summarization, question-answering, and even code generation. Examples include GPT-4, Claude, Gemini, and Qwen.
- LLM Agents: An LLM agent is an LLM augmented with additional components that allow it to interact with its environment, perceive information, reason, plan actions, and execute them. Unlike a standalone LLM that merely generates text, an LLM agent can act. Key components often include:
  - Memory: To store past interactions and observations.
  - Planning Module: To break down complex tasks into smaller steps.
  - Tool-Use Module: To call external tools (e.g., search engines, calendars, APIs) to perform specific actions.
  - Reflection Module: To evaluate its own actions and plans for improvement.
- Intelligent Personal Assistants (IPAs): Software agents that assist users with various tasks and information retrieval. Examples include Siri, Alexa, and Google Assistant. They typically respond to voice commands or text input.
- Proactive vs. Responsive Behaviors:
  - Responsive Behavior: The agent acts only when explicitly prompted or commanded by the user. Most current IPAs and LLM agents primarily exhibit responsive behavior.
  - Proactive Behavior: The agent anticipates user needs or potential issues and initiates actions without explicit instructions. This requires deep user modeling, context awareness, and predictive capabilities. For example, suggesting a route change due to traffic before being asked.
- Cognitive Architecture: In the context of LLM agents, a cognitive architecture defines the fundamental structure of an agent's "mind." It specifies the internal modules (e.g., memory, reasoning, planning, perception), how they interact, the types of information they process, and the reasoning processes they can perform. It is the blueprint of the agent's intelligence.
- Metacognition: Often described as "cognition about cognition," metacognition refers to an agent's ability to monitor, understand, and control its own cognitive processes. For LLM agents, this means an agent can reflect on its own reasoning, identify limitations, and potentially adjust its internal strategies or even its cognitive architecture.
- Generative Agents: A type of LLM agent (e.g., Generative Agents by Park et al., 2023) designed to simulate human-like behavior, including memory streams, reflection, and planning, to create consistent and believable actions in interactive environments. They generate responses and actions based on their internal state and perceived environment.
- Privacy Preservation (Data Masking/Anonymization): Techniques used to protect sensitive user information while still allowing data to be used for analysis or LLM inference. Masking involves replacing sensitive data with generic placeholders or aggregated values. Anonymization aims to remove personally identifiable information entirely. The goal is to minimize privacy leakage.
- Self-Evolution (Continuous Adaptation/Self-Improvement): The capacity of an LLM agent to continually learn, adapt, and improve its own internal architecture, capabilities, and interaction strategies over time, often in response to new experiences, user feedback, or detected limitations. This goes beyond just learning new facts; it involves modifying the system itself.
- Persona Drift: In generative agents, persona drift refers to the phenomenon where an agent's simulated personality, characteristics, or consistent behavior patterns gradually change or deviate from its intended or established persona over long periods of interaction, often due to an accumulation of varied experiences or insufficient grounding.
3.2. Previous Works
The paper discusses several categories of prior LLM agent research and specific systems, highlighting their advancements and limitations:
- Conversational Agents (e.g., Wahde and Virgolin 2022; Guan et al. 2025; Jiang et al. 2021): These agents primarily interact through dialogue and execute tasks by calling external tools. They focus on understanding natural language intent and responding appropriately within a conversational context.
  - Limitation: While good for dialogue, they typically lack proactive capabilities that initiate actions without explicit commands.
- Autonomous Agents (e.g., Chen et al. 2025; Belle et al. 2025): These agents operate within specific environments (e.g., simulated worlds or specific applications) and often focus on executing single, well-defined tasks autonomously. They demonstrate strong task-planning and tool-invoking capabilities.
  - Limitation: Primarily focused on responsive behaviors within their environment; limited support for proactive skills or continuous adaptation of their own architecture.
- Multi-agent Systems (e.g., Zhou et al. 2024; Chen et al. 2024): These systems involve multiple LLM agents collaborating to divide and conquer complex tasks, leveraging specialized roles to improve scalability and efficiency. MetaGPT (Zhou et al. 2024) is an example that produces efficient solutions through multi-agent collaboration.
  - Limitation: While they advance collaborative capabilities, the focus is still largely on responsive behaviors in task execution, with limited attention to proactivity, privacy, or self-evolution of the overall system.
- LLMPA (Guan et al. 2023): An LLM-based process automation system embedded in mobile applications to complete complex operations under natural language instructions.
  - Context: Showcases LLMs in practical mobile task automation.
- WebPilot (Zhang et al. 2025b) and Mind2web (Deng et al. 2023): These are GUI agents designed to perform multi-step interactions on arbitrary websites, demonstrating advanced web automation capabilities.
  - Context: Highlight the ability of LLM agents to interact with graphical user interfaces.
- Proactive Agents (e.g., Liao, Yang, and Shah 2023; Lu et al. 2024): Some works have started to explore inferring user intent to enable proactive features. Liao, Yang, and Shah (2023) can infer user intent but remain confined to dialog interactions; Lu et al. (2024) emphasize the multi-source perception for deep user modeling needed for intent prediction.
  - Limitation: These efforts are nascent and generally do not address the combined challenges of directly triggering concrete operations, privacy risks, and self-evolution.
- Privacy in LLM Agents (e.g., Gan et al. 2024; Hahm et al. 2025; Zeng et al. 2024): Research exists on security, privacy, and ethical threats in LLM-based agents (Gan et al. 2024), enhancing safety via causal influence prompting (Hahm et al. 2025), and privacy-preserving inference (Zeng et al. 2024).
  - Context: Acknowledges the importance of privacy, but it is often treated as a separate challenge rather than integrated into a holistic agent design.
- Self-Evolution and Metacognition (e.g., Li, Zhang, and Sun 2023; Liu and van der Schaar 2025; YanfangZhou et al. 2025; Hu, Lu, and Clune 2024; Yin et al. 2024):
  - Generative Agents (Park et al. 2023): Uses memory stream, reflection, and planning to simulate consistent human-like behavior. This is a foundational work for KoRa's architecture.
  - Metaagent-P (YanfangZhou et al. 2025): Improves performance by reflecting on current workflows.
  - Meta-agents that inspect and revise their own code (Li, Zhang, and Sun 2023; Liu and van der Schaar 2025).
  - Automated design of agentic systems (Hu, Lu, and Clune 2024; Yin et al. 2024) that generate stronger modules or new agents.
  - Limitation: These studies often lack integration with task context or system constraints. The depth of metacognitive ability is often constrained by the underlying fixed cognitive architecture. Automated design relies on preset evaluation standards, struggling with sustained, open-ended evolution.
3.3. Technological Evolution
The evolution of intelligent assistants has progressed from rule-based systems to more sophisticated AI.
- Early IPAs (e.g., pre-LLM Siri/Alexa): Primarily relied on predefined rules, scripts, and limited natural language processing capabilities. They were good at executing specific commands but lacked deeper understanding, context retention, and adaptability.
- Rise of LLMs: The development of transformer architectures and large-scale pre-training revolutionized natural language understanding and generation. LLMs gained impressive causal reasoning and task-planning abilities.
- LLM-based Agents: This stage integrated LLMs with components like memory, planning, and tool-use, transforming them into agents capable of interacting with environments and performing multi-step tasks. This represents a significant leap from static LLMs to dynamic, action-oriented systems. Research then branched into conversational, autonomous, and multi-agent systems.
- Current Frontier (Proactivity, Privacy, Self-Evolution): The field is now moving towards agents that are not just reactive but proactive, not just capable but privacy-preserving, and not just static but self-evolving. This requires moving beyond fixed cognitive architectures to systems that can inspect and modify their own design. Galaxy fits into this timeline by pushing LLM agents into this frontier stage: it directly addresses the shortcomings of previous LLM agent designs by proposing a unified framework for proactivity, privacy, and self-evolution, enabled by a novel integration of cognitive architecture and system design.
3.4. Differentiation Analysis
Compared to prior research, Galaxy's core differences and innovations are:
- Unified Cognitive Architecture and System Design: The most significant innovation is the Cognition Forest, which explicitly unifies an agent's cognitive architecture (its internal understanding and reasoning) with its underlying system design (its code and functional implementation). Previous works typically treat these as separate, leading to fixed architectures where metacognition is limited to improving reasoning within the given architecture, not modifying the architecture itself. Galaxy creates a self-reinforcing loop for alternating optimization.
- Holistic Approach to Proactivity, Privacy, and Self-Evolution: Unlike most existing works that focus on one or two of these aspects, Galaxy is designed from the ground up to jointly address all three.
  - Proactivity: Achieved through the Analysis Layer (Agenda, Persona) for deep user modeling and KoRa's Cognition-Action Pipeline grounded in Cognition Forest.
  - Privacy Preservation: Managed by Kernel's Privacy Gate with multi-level masking, specifically designed for cloud-based LLM inference.
  - Self-Evolution: Enabled by Kernel's meta-agent capabilities, which can inspect, adapt, and extend Spaces and even core system structures based on user needs and observed limitations, moving beyond preset evaluation standards.
- Cognition-Enhanced Generative Agent (KoRa): While KoRa builds on generative agent architectures, it uses the Cognition Forest to provide long-horizon semantic constraints, which helps mitigate issues like persona drift and improves the consistency of behaviors, especially in long-term proactive assistance.
- Framework-Level Meta-Agent (Kernel): Kernel acts as a meta-agent that operates at the framework level. It not only oversees KoRa's cognitive execution but also inspects and adapts the underlying system structures (Spaces, and the Cognition Forest itself). This gives Galaxy a deeper capacity for self-improvement than prior meta-agents that might only reflect on workflows or generate code within fixed architectural constraints.
- Multidimensional Interactions (Spaces): Galaxy extends interaction modalities beyond chat windows through Spaces, which are cognitively accessible and interactable modules. These Spaces can be customized or auto-generated, further enhancing personalization and the system's perceptual scope, a deeper integration than simple tool invocation.

In essence, Galaxy differentiates itself by introducing a unified, adaptable "cognitive blueprint" that allows the LLM agent to intelligently perceive, understand, act proactively, protect privacy, and fundamentally re-engineer itself in response to user needs and operational experience.
4. Methodology
4.1. Principles
The core principle behind Galaxy is the unification of cognitive architecture and system design into a self-reinforcing loop. Instead of treating these as separate entities, Galaxy posits that an LLM agent's understanding (cognition) should directly inform and evolve its underlying structure (system design), and in turn, enhancements in the system design should enrich and refine its cognitive capabilities. This principle is embodied in the Cognition Forest, a semantic structure that provides a comprehensive, hierarchical cognitive context for the agents (KoRa and Kernel) while also integrating the design principles and actual code for reuse and modification. This allows LLM agents to not only know what to do and how to do it, but also to understand how it is implemented, enabling deeper metacognition and self-evolution.
4.2. Core Methodology In-depth (Layer by Layer)
The Galaxy framework operates on a Perception-Analysis-Execution paradigm, supported by a central Cognition Forest and two specialized LLM agents, KoRa and Kernel. Figure 1 provides an overview of this framework.
4.2.1. Overall Framework (Figure 1)
As illustrated in Figure 1, Galaxy integrates user interactions via Spaces and Chat Window. The core components are the Cognition Forest, KoRa, and Kernel, supported by Interaction Layer, Analysis Layer, and Execution Layer.
The Interaction Layer perceives user interaction states and contextual signals. The Analysis Layer stores and organizes user data, conducting short-term and long-term user modeling. The Execution Layer generates plans, schedules tasks, and executes actions.
The image is a schematic of the Galaxy framework, showing the composition of the Cognition Forest and its relationship with the LLM agents KoRa and Kernel. The left side depicts the different modules and functions, including user information and the Execution, Analysis, and Interaction layers; the right side illustrates the mechanism by which the LLM agents analyze and act.
Figure 1: Framework of proposed Galaxy IPA.
Key modules and their roles:
- Cognition Forest: The framework's unified cognitive and metacognitive architecture, structured as multiple semantic subtrees. It provides KoRa and Kernel with comprehensive, hierarchical cognitive context.
- Spaces: Cognition-driven personalized interaction modules that capture multi-dimensional information during user interactions.
- Agenda: Models user behavior from perceived event information, generating scheduling recommendations to guide KoRa's autonomous actions.
- Persona: Performs comprehensive, long-term modeling of user characteristics to support KoRa's delegated decision-making.
- KoRa: A cognition-enhanced generative agent capable of proactive delegation without explicit instructions, or efficient execution when instructed.
- Kernel: A metacognition-empowered meta agent operating outside the three main layers. It is responsible for maintaining system stability, safeguarding privacy, and enabling Galaxy's evolution.
4.2.2. Cognition Forest
The Cognition Forest ($\mathcal{F}$) is a fundamental semantic structure proposed to unify different cognition dimensions with their underlying system designs.
Definition: Cognition Forest is a structured forest consisting of four subtrees:
$$
\mathcal{F} = \{ \mathcal{T}_{\mathrm{user}}, \mathcal{T}_{\mathrm{self}}, \mathcal{T}_{\mathrm{env}}, \mathcal{T}_{\mathrm{meta}} \}
$$
Where:
- $\mathcal{T}_{\mathrm{user}}$: Represents personalized modeling of the user, maintained by Persona.
- $\mathcal{T}_{\mathrm{self}}$: Describes Galaxy itself, its internal agents like KoRa, and their roles and capabilities.
- $\mathcal{T}_{\mathrm{env}}$: Represents the operational environment, including perceivable Space modules and system tools.
- $\mathcal{T}_{\mathrm{meta}}$: Represents the system's metacognition, such as execution pipelines.

Uniqueness: The key differentiator of Cognition Forest is its association of each cognitive element with its corresponding system design. This means LLM agents not only understand what to do (semantic understanding) and how to do it (mapped system function) but also how it is implemented (concrete implementation code). This deepens the framework's metacognition beyond traditional cognitive architectures.
Node Structure: Each node within these subtrees is represented by three dimensions:
- Semantic: The LLM's semantic understanding or natural language meaning of the concept.
- Function: The corresponding system function or callable element mapped to the semantic meaning.
- Design: The concrete implementation code or design principles for that function.

Example: For a write_text node in a Memo Space:
- Semantic: "writing new content to memo"
- Function: write_text()
- Design: The actual implementation code for write_text().

This design allows Kernel to reflect on execution failures (e.g., incorrect sequence, implementation errors) and perform deeper modifications, as it understands the design alongside the function and semantics.
4.2.3. Sensing and Interaction Protocol (Spaces)
Spaces are a protocol designed to encapsulate heterogeneous information sources into unified, cognitively accessible, and interactable modules. They serve as the system's extensible Interaction Layer.
Objective: To overcome the limitation of most IPAs where cognitive architectures are constrained by underlying system design, and extensibility at the Interaction Layer is limited, hindering personalization.
Approach: Galaxy treats each Space function as a local execution container and an independent subtree within Cognition Forest. Spaces can be user-customized or automatically generated, expanding both the system's perceptual scope and interactive capabilities.
Each Space consists of the following components:
- Perception Window: Continuously observes user actions and environmental signals. It converts raw inputs into structured TimeEvent entries and state snapshots, unifying them into a consistent, temporally grounded context for the Analysis Layer.
- Interaction Component: Can act as a standalone, personalized module providing a user interface and interaction nodes accessible to both the user and KoRa.
- Cognitive Protocol: Provides a unified development and integration standard for all Spaces. It specifies how high-level intents are translated into concrete system operations, ensuring each Space can be consistently embedded into the Cognition Forest for reasoning and task execution.

Unlike simple LLM agent tool generation, Galaxy's Spaces are deeply embedded within the system's cognition, functioning as integral "organs" rather than detachable tools.
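As an illustration of what such a protocol could look like in code, the sketch below defines a hypothetical Space interface covering the three components above; all names are assumptions for exposition, not the paper's API. The `CognitionNode` reference reuses the sketch from Section 4.2.2.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime

@dataclass
class TimeEvent:
    """A structured perception record (illustrative)."""
    time: datetime
    tool: str              # which Space/tool produced the event
    semantic_intent: str   # natural-language description of the action

class Space(ABC):
    """Hypothetical Cognitive Protocol that every Space implements."""

    @abstractmethod
    def perceive(self, raw_input: str) -> TimeEvent:
        """Perception Window: convert raw signals into TimeEvent entries."""

    @abstractmethod
    def interact(self, intent: str, **params) -> str:
        """Interaction Component: execute a high-level intent as a concrete operation."""

    @abstractmethod
    def cognition_subtree(self) -> "CognitionNode":
        """Expose this Space as a subtree embeddable in the Cognition Forest."""
```

The key design point this sketch tries to capture is the third method: a Space is not only callable, it also publishes a cognition subtree, which is what distinguishes it from a detachable tool.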
4.2.4. User Behavior and Cognitive Modeling (Agenda & Persona)
The Analysis Layer is responsible for modeling user behavior and preferences to support proactive skills. It comprises Agenda and Persona.
4.2.4.1. Agenda
Agenda models explicit schedules and implicit behavioral patterns to anticipate and interpret upcoming events.
- TimeEvent: Agenda uses a unified TimeEvent to represent two types of events:
  - Schedule: Denotes explicit user schedules (e.g., "class at 18:30 on June 18").
  - Behavior: Denotes observed operational actions (e.g., "translated documents in the chat_window in the morning").
- Schedule Draft: The Interaction Layer extracts event content and time ranges, writing them to the Schedule Draft. Uncertain or conflicting events are routed to an alignment queue for resolution.
- Behavior Patterns: All TimeEvent entries are retained for long-term behavior modeling. Each behavior is represented as a structured triple: (time, tool, semantic intent). Galaxy clusters these behaviors along the tool and semantic dimensions to identify recurring Behavior Patterns; a minimal clustering sketch is given below, after Figure 2.
- Daily Plan Generation: Based on the user's schedule, Agenda drafts an initial plan, suggesting relevant Behavior Patterns for open time slots. This proposed daily plan is shared with the user for confirmation. Once approved, a summary of next-day actions is passed to KoRa for timely assistance.

Figure 2 illustrates how observed behavior patterns inform the Agenda and Persona.
Figure 2: The user interface of the Galaxy framework and the interactions of two intelligent agents, KoRa and Kernel. It includes the user's daily preferences, plans, and behavior patterns, while also showcasing KoRa's proactive and responsive modes, along with the design elements of operational spaces.
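To illustrate the (time, tool, semantic intent) triples and how recurring Behavior Patterns might be mined from them, here is a minimal Python sketch that groups behaviors by tool and a coarse intent key. The grouping rule and threshold are assumptions; the paper does not specify its clustering algorithm.

```python
from collections import defaultdict
from datetime import datetime

# Behaviors as (time, tool, semantic_intent) triples, as defined above.
behaviors = [
    (datetime(2025, 6, 17, 9, 5), "chat_window", "translate paper abstract"),
    (datetime(2025, 6, 18, 9, 12), "chat_window", "translate paper introduction"),
    (datetime(2025, 6, 19, 9, 2), "chat_window", "translate paper abstract"),
    (datetime(2025, 6, 18, 18, 30), "calendar", "attend class"),
]

MIN_SUPPORT = 2  # assumed threshold: a pattern must recur at least twice

def mine_behavior_patterns(triples):
    """Group behaviors along the tool and semantic dimensions (a toy
    stand-in for the paper's semantic clustering) and keep recurring groups."""
    groups = defaultdict(list)
    for time, tool, intent in triples:
        # Coarse semantic key: tool plus the first word of the intent.
        key = (tool, intent.split()[0])
        groups[key].append(time)
    return {key: times for key, times in groups.items() if len(times) >= MIN_SUPPORT}

for (tool, action), times in mine_behavior_patterns(behaviors).items():
    hours = sorted(t.hour for t in times)
    print(f"Pattern: '{action}' via {tool}, around {hours[0]}:00 ({len(times)} occurrences)")
```

A recurring pattern like "translate via chat_window around 9:00" is exactly the kind of signal Agenda can slot into open time in the next daily plan.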
4.2.4.2. Persona
Persona maintains a growing User Cognition Tree ($\mathcal{T}_{\mathrm{user}}$), which is a subtree of the Cognition Forest ($\mathcal{F}$).
- User Insights: Galaxy uses LLMs to aggregate dialogues and Space interactions into user insights. Each insight contains a natural language summary and a semantic embedding. These are high-level semantic cognitions, not just statistical aggregates.
- Node Management (a minimal sketch of these rules follows this list):
  - Similar insights accumulating beyond a threshold are promoted to a long-term node.
  - Insights similar to an existing node are merged, and the node's timestamp is refreshed.
  - Nodes unused for a long period decay and are removed.
  - Stable identity information (e.g., name, phone number) is inserted into an identity branch upon first discovery.
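The promotion/merge/decay rules can be pictured with a small piece of code. The sketch below uses cosine similarity over insight embeddings; the similarity threshold, promotion count, and decay window are illustrative assumptions, not values from the paper.

```python
import math
import time

SIM_THRESHOLD = 0.85            # assumed: insights this similar are "the same"
PROMOTE_COUNT = 3               # assumed: promote to long-term after 3 similar insights
DECAY_SECONDS = 90 * 24 * 3600  # assumed: drop nodes unused for ~90 days

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class UserCognitionTree:
    def __init__(self):
        self.nodes = []      # long-term nodes: dicts with summary, embedding, last_used
        self.pending = []    # short-term insights awaiting promotion

    def add_insight(self, summary, embedding, now=None):
        now = now or time.time()
        # Merge into an existing long-term node if similar; refresh its timestamp.
        for node in self.nodes:
            if cosine(embedding, node["embedding"]) >= SIM_THRESHOLD:
                node["last_used"] = now
                return node
        # Otherwise accumulate; promote once enough similar insights pile up.
        similar = [p for p in self.pending if cosine(embedding, p[1]) >= SIM_THRESHOLD]
        if len(similar) + 1 >= PROMOTE_COUNT:
            node = {"summary": summary, "embedding": embedding, "last_used": now}
            self.nodes.append(node)
            self.pending = [p for p in self.pending if p not in similar]
            return node
        self.pending.append((summary, embedding))

    def decay(self, now=None):
        """Remove long-term nodes unused beyond the decay window."""
        now = now or time.time()
        self.nodes = [n for n in self.nodes if now - n["last_used"] < DECAY_SECONDS]
```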
4.2.5. KoRa: Intelligent Butler for User
KoRa is the cognition-enhanced generative agent responsible for direct user interaction, proactive schedule management, and real-time request handling.
Objective: To enable proactive schedule management and real-time request handling while maintaining task consistency, avoiding conflicting operations (e.g., pre-scheduled vs. manual booking), and mitigating persona drift.
Approach:
- Generative Agent Architecture: KoRa adopts a generative agent architecture (similar to Park et al., 2023) with memory stream, planning, and reflection modules.
- Structured State Stack: To handle interruptions and resume execution in responsive mode, KoRa uses a structured state stack instead of a simple memory stream. This stack records task type, source, and execution details.
- Execution Flow: KoRa follows a top-down execution flow to advance tasks from the daily plan generated by Agenda in the Analysis Layer.
- Cognition-Action Pipeline: To address personality forgetting and behavior drift, KoRa integrates a cognitive architecture grounded in Cognition Forest. This hierarchical semantic space supports intent parsing, semantic routing, and the construction of behavior chains. KoRa's operating Cognition Forest subset, $\mathcal{F}^{\mathrm{KoRa}}$, includes components relevant to its tasks:

$$
\mathcal{F}^{\mathrm{KoRa}} = \{ \mathcal{T}_{\mathrm{user}}, \mathcal{T}_{\mathrm{self}}^{\mathrm{KoRa}}, \mathcal{T}_{\mathrm{env}}^{\mathrm{KoRa}}, \mathcal{T}_{\mathrm{dialogue}} \}
$$

Where:
- $\mathcal{T}_{\mathrm{user}}$: The User Cognition Tree maintained by Persona.
- $\mathcal{T}_{\mathrm{self}}^{\mathrm{KoRa}}$: Represents KoRa's specific capabilities and role within Galaxy.
- $\mathcal{T}_{\mathrm{env}}^{\mathrm{KoRa}}$: Includes any callable elements within Spaces relevant to KoRa's tasks.
- $\mathcal{T}_{\mathrm{dialogue}}$: Collects fallback or vague-intent utterances, serving as the default entry point for open-ended interactions.

As illustrated in Figure 3, KoRa's Cognition-Action Pipeline processes an intent through three main stages (a code sketch follows at the end of this subsection):

Figure 3: Execution pipeline of KoRa. The user's intent or KoRa's plan is parsed and grounded through the Cognition Forest. KoRa extracts relevant semantic paths, performs reasoning, generates contextual content, and assembles an execution chain. If essential information is missing, execution is suspended until alignment is completed.

- Semantic Routing: KoRa first locates relevant cognitive paths (e.g., ["env", "user", "self"]) by traversing the Cognition Forest and selecting branches that semantically align with the intent.
- Forest Retrieval: For each identified path, KoRa retrieves supporting nodes from the corresponding subtree based on contextual cues, lexical similarity, or inferred relevance. These nodes provide the Semantic, Function, and Design information needed.
- Action Chain Construction: Guided by the retrieved content, KoRa assembles a structured Action Chain. This chain comprises discrete operations such as generating content, aligning intent, invoking system functions (e.g., send_email(address, content)), and composing natural language responses.

Missing Information Handling: If any required information is missing (e.g., incomplete parameters for a function, failed node retrieval), KoRa suspends the current chain. It then interacts with the user in natural language to align the missing information before resuming execution. KoRa uses cloud-based LLM inference for its operations, and Kernel ensures privacy isolation for this.
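The three-stage pipeline can be strung together in a few lines of code. The following Python sketch is purely illustrative (all function names and the routing data are hypothetical; the paper does not publish its implementation), using keyword overlap as a toy stand-in for LLM-based semantic alignment:

```python
from dataclasses import dataclass

@dataclass
class ActionStep:
    """One discrete operation in an Action Chain (illustrative)."""
    function: str
    params: dict

def semantic_routing(intent, forest):
    """Stage 1: select cognitive paths whose keywords align with the intent."""
    return [path for path, keywords in forest.items()
            if any(word in intent.lower() for word in keywords)]

def forest_retrieval(paths, nodes):
    """Stage 2: gather supporting nodes (Semantic/Function/Design) per path."""
    return [node for path in paths for node in nodes.get(path, [])]

def build_action_chain(support):
    """Stage 3: assemble an Action Chain; insert an alignment step when
    required parameters are missing (Missing Information Handling)."""
    chain = []
    for node in support:
        missing = [p for p in node.get("required", []) if p not in node.get("known", {})]
        if missing:
            chain.append(ActionStep("align_with_user", {"ask_for": missing}))
        chain.append(ActionStep(node["function"], dict(node.get("known", {}))))
    return chain

# Hypothetical routing data for the email example from Figure 3.
forest = {"user": ["email", "contact"], "env": ["email", "send"], "self": ["assist"]}
nodes = {"env": [{"function": "send_email",
                  "required": ["address", "content"],
                  "known": {"content": "Drafted email body"}}]}

intent = "send an email to my advisor"
for step in build_action_chain(forest_retrieval(semantic_routing(intent, forest), nodes)):
    print(step)
```

Running this prints an `align_with_user` step (the recipient address is missing) followed by the `send_email` step, mirroring the suspend-then-resume behavior described above.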
4.2.6. Kernel: Framework-Level Meta Agent
Kernel is the metacognition-empowered meta agent responsible for the overarching stability, privacy, and self-evolution of the Galaxy framework. It operates outside the Perception-Analysis-Execution layers.
Objective: To ensure robustness in LLM-based reasoning, addressing privacy concerns with cloud-based inference and mitigating hallucinations from lightweight local models. It provides recovery mechanisms and self-monitoring for self-evolution.
Approach: Kernel uses the MetaCognition Tree ($\mathcal{T}_{\mathrm{meta}}$) to monitor internal reasoning and catch potential execution failures. Unlike most systems, Kernel can revise reasoning flows when the cognitive architecture itself becomes a bottleneck. It is implemented as a meta agent with the ability to reason across both functional logic and architectural dependencies, enabling targeted adjustments to system configurations.
Kernel operates through three principal mechanisms:
- Oversee: Kernel continuously monitors Galaxy's execution pipelines, including LLM calls across all three layers (Interaction, Analysis, Execution) and KoRa's task behavior. Upon detecting abnormal patterns, it triggers meta-reflection and executes predefined failure-handling routines to ensure stable system operation.
- User-Adaptive System Design: Kernel identifies latent user needs based on long-term behavioral trends (from the Analysis Layer), confirms them through lightweight user alignment, and then modifies or extends relevant Spaces accordingly. It functions as a minimal, self-contained control unit with a local code interpreter and rule engine, allowing self-checks and recovery even offline. This directly leverages the Design dimension of Cognition Forest nodes.
- Contextual Privacy Management: Kernel maintains an Autonomous Avatar aligned with the User Cognition Tree ($\mathcal{T}_{\mathrm{user}}$) to represent user context. It regulates data exposure through an LLM-based Privacy Gate, as shown in Figure 4.

Figure 4: Workflow of Privacy Gate. Privacy Gate defines four levels of masking (L1-L4), where higher levels apply stricter anonymization across more attributes.

Privacy Gate Workflow: Before transmitting data to the cloud, Privacy Gate applies masking to safeguard sensitive content while preserving task-relevant information. After receiving results from the cloud LLM, Kernel selectively demasks data to restore the necessary context for downstream use. Privacy Gate defines four levels of masking (L1-L4), where higher levels apply stricter anonymization across more attributes. This contextual approach ensures that privacy protection is dynamically adjusted based on sensitivity and task requirements.
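To make the mask-then-demask round trip concrete, here is a minimal Python sketch of a tiered masking gate. The four levels, the attribute tiers, and the regex-based detection are illustrative assumptions about how L1-L4 might be ordered, not the paper's actual rules (which use an LLM-based gate).

```python
import re

# Assumed attribute tiers: higher masking levels cover strictly more attributes.
LEVEL_ATTRIBUTES = {
    1: ["phone"],
    2: ["phone", "email"],
    3: ["phone", "email", "name"],
    4: ["phone", "email", "name", "address"],
}

PATTERNS = {
    "phone": re.compile(r"\b\d{3}-\d{4}\b"),
    "email": re.compile(r"\b[\w.]+@[\w.]+\b"),
    "name": re.compile(r"\bAlice\b"),            # toy stand-in for a real NER step
    "address": re.compile(r"\b\d+ \w+ Street\b"),
}

def mask(text, level):
    """Replace sensitive spans with placeholders before cloud transmission.
    Returns the masked text plus the mapping needed for later demasking."""
    mapping = {}
    for attr in LEVEL_ATTRIBUTES[level]:
        for i, match in enumerate(PATTERNS[attr].findall(text)):
            placeholder = f"[{attr.upper()}_{i}]"
            mapping[placeholder] = match
            text = text.replace(match, placeholder)
    return text, mapping

def demask(text, mapping):
    """Selectively restore original values after the cloud LLM responds."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

masked, mapping = mask("Alice (555-1234) lives at 12 Oak Street.", level=3)
print(masked)                   # name and phone hidden at L3; address passes through
print(demask(masked, mapping))  # original context restored locally
```

The important property this sketch demonstrates is that the demasking map never leaves the local side, so the cloud model only ever sees placeholders.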
4.2.7. From Cognitive Architecture to System Design, and Back Again
This section reiterates the core philosophy of Galaxy, outlining the closed-loop mechanism of alternating optimization between Cognition Forest and system design:
- Cognition drives understanding: Galaxy interprets user needs and intentions by grounding its understanding in its cognitive architecture (i.e., the Cognition Forest).
- Cognition triggers reflection: Galaxy assesses whether its current framework capabilities adequately address user needs and identifies unmet requirements. This is where Kernel's metacognition comes into play, utilizing the Design dimension of Cognition Forest nodes.
- Reflection guides system design: Galaxy translates these unmet needs into new system design goals and autonomously improves system capabilities (e.g., generating new Spaces, modifying existing code). This modification directly impacts the Design dimension of Cognition Forest nodes.
- Design reinforces cognition: Newly introduced or modified system structures (e.g., new Space modules, refined execution pipelines) create additional cognitive pathways and sensing capabilities. These, in turn, strengthen and optimize the original cognitive architecture itself (e.g., by adding new nodes to Cognition Forest or refining existing ones).

This loop highlights the co-constructive nature of cognitive architecture and system design in Galaxy, enabling continuous self-evolution guided by user needs.
5. Experimental Setup
5.1. Datasets
To evaluate the comprehensive capabilities of the Galaxy framework, the authors employ three public benchmarks: AgentBoard, PrefEval, and PrivacyLens.
- AgentBoard (Ma et al. 2024):
  - Description: This benchmark uses six types of tasks to simulate a multi-round interactive environment for LLM agents. It aims to assess an agent's ability to handle complex interactions and achieve specific goals over multiple steps.
  - Characteristics: Focuses on the completion rate of an entire behavioral chain, simulating real-world interactive scenarios.
  - Why chosen: To validate Galaxy's general performance in complex, multi-turn interactive environments.
- PrefEval (Zhao et al. 2025):
  - Description: This benchmark specifically evaluates whether LLM agents can maintain user preferences consistently throughout long conversations. It assesses the agent's ability to remember and apply personalized preferences without explicit re-statement.
  - Characteristics: It measures preference retention accuracy in two settings: Zero-Shot (without reminding users of their preferences) and Reminder (by reminding users of their preferences). This tests the long-term memory and consistency of user modeling.
  - Why chosen: To validate Galaxy's capabilities in long-term user modeling and personalized preference retention, which is crucial for proactive assistance.
- PrivacyLens (Shao et al. 2025):
  - Description: This benchmark measures the ability of LLM agents to understand and adhere to privacy norms when performing real-world tasks. It evaluates how well agents protect sensitive user information during operations.
  - Characteristics: It comprehensively evaluates privacy protection using metrics like helpfulness, privacy leakage rate, and accuracy. This involves understanding sensitive information and applying appropriate safeguards.
  - Why chosen: To specifically validate Galaxy's privacy-preserving mechanisms, particularly the Privacy Gate managed by Kernel.
5.2. Evaluation Metrics
For each benchmark, specific metrics are used to quantify performance.
- For AgentBoard:
  - Conceptual Definition: Target Achievement Rate (TAR) measures the percentage of tasks where the LLM agent successfully completes the target goal across the entire multi-round interactive behavior chain. It assesses the agent's ability to execute complex, multi-step plans correctly.
  - Mathematical Formula: While the paper does not provide an explicit formula for Target Achievement Rate, it is generally defined as:
    $$
    \mathrm{TAR} = \frac{\text{Number of successfully completed tasks}}{\text{Total number of tasks}} \times 100\%
    $$
  - Symbol Explanation:
    - Number of successfully completed tasks: The count of tasks where the agent reached the defined target state or outcome.
    - Total number of tasks: The total number of tasks attempted by the agent.
- For PrefEval:
  - Conceptual Definition: Preference Retention Accuracy measures how accurately an LLM agent remembers and applies a user's stated preferences over multi-round conversations. This metric is crucial for personalization. It is evaluated in two modes:
    - Zero-Shot (Z): The agent is not reminded of the user's preferences in subsequent turns. It must recall them from its memory/modeling.
    - Reminder (R): The user's preferences are explicitly reminded to the agent in subsequent turns, testing its ability to consistently apply them when recalled.
  - Mathematical Formula: The paper presents results for different numbers of conversations (e.g., 10 and 300 rounds). The accuracy for a given number of rounds would be:
    $$
    \mathrm{Accuracy} = \frac{\text{Number of turns where preferences were correctly applied}}{\text{Total number of turns where preferences were relevant}} \times 100\%
    $$
  - Symbol Explanation:
    - Number of turns where preferences were correctly applied: The count of conversational turns where the agent's response or action correctly reflected the user's previously stated preferences.
    - Total number of turns where preferences were relevant: The total number of conversational turns where user preferences were applicable and should have been considered by the agent.
- For PrivacyLens:
  - Conceptual Definition: PrivacyLens uses three metrics to comprehensively evaluate privacy protection:
    - Helpfulness (Help.): Measures the quality and utility of the agent's output from the user's perspective, ensuring that privacy measures do not overly degrade task performance. A higher value indicates better user satisfaction.
    - Privacy Leakage Rate (LR / LRh): Quantifies the percentage of sensitive information that is inadvertently exposed or inferable from the agent's output. LR likely refers to a general leakage rate, while LRh might be a variant such as the leakage rate for highly sensitive information. A lower value indicates better privacy protection.
    - Accuracy (Acc.%): Measures the correctness of the agent's task execution, specifically in the context of privacy-sensitive tasks. It indicates whether the agent completed the task successfully while adhering to privacy norms.
  - Mathematical Formula:
    - Helpfulness: This is often a subjective metric, typically measured via human evaluation (e.g., Likert scale scores) or LLM-based evaluators. If using a scale (e.g., 1-5), the formula might be:
      $$
      \mathrm{Helpfulness} = \frac{\sum_{i=1}^{N} \mathrm{Score}_i}{N}
      $$
      where $N$ is the number of evaluations and $\mathrm{Score}_i$ is the helpfulness score for evaluation $i$.
    - Privacy Leakage Rate:
      $$
      \mathrm{Privacy\ Leakage\ Rate} = \frac{\text{Number of leaked sensitive items}}{\text{Total number of sensitive items present}} \times 100\%
      $$
    - Accuracy:
      $$
      \mathrm{Accuracy} = \frac{\text{Number of correctly completed privacy-sensitive tasks}}{\text{Total number of privacy-sensitive tasks}} \times 100\%
      $$
  - Symbol Explanation:
    - $\mathrm{Score}_i$: The helpfulness score given for a specific agent interaction or output.
    - Number of leaked sensitive items: The count of individual pieces of sensitive information that were exposed (e.g., name, address, phone number).
    - Total number of sensitive items present: The total count of all sensitive pieces of information that could potentially be leaked in the context.
    - Number of correctly completed privacy-sensitive tasks: The count of tasks that were executed successfully without compromising privacy.
    - Total number of privacy-sensitive tasks: The total count of tasks that involved sensitive information and required privacy consideration.
5.3. Baselines
The paper compares Galaxy against several state-of-the-art LLM agents from major providers. These baselines represent leading performance in various LLM agent capabilities. The performance of Galaxy without Kernel (Galaxy (w/o Kernel)) is also included as an ablation baseline to specifically highlight Kernel's contribution.
The compared LLM agents are:
- GPT-4o (OpenAI)
- GPT-o1-pro (OpenAI; written as "GPT-01-pro" in the paper, presumably the o1-pro reasoning model)
- Claude-Opus-4 (Anthropic)
- Claude-Sonnet-4 (Anthropic)
- Deepseek-Chat (DeepSeek)
- Deepseek-Reasoner (DeepSeek)
- Gemini-2.0-Flash (Google)
- Gemini-2.5-Flash (Google)
- Qwen-Max (Alibaba Cloud)
- Qwen3 (Alibaba Cloud)

For Galaxy's configuration:
- Local model within Kernel: Qwen2.5-14B
- Cloud-based model in KoRa: GPT-4o-mini

Experiments were run on an M3 Max platform with macOS, and average results over 100 trials are reported.
6. Results & Analysis
6.1. Core Results Analysis
The experimental results demonstrate that Galaxy significantly outperforms existing LLM agents and Galaxy (w/o Kernel) across multiple benchmarks, particularly highlighting the crucial role of the Kernel meta-agent in preference retention and privacy protection.
The following are the results from Table 1 of the original paper:
| LLM Agents | AgentBoard | | | | | | PrefEval | | | | PrivacyLens | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | ALF | SW | BA | JC | PL | TQ | Z10 | R10 | Z300 | R300 | Acc.% | LR | LRh | Help. |
| GPT-4o | 54.5 | 19.7 | 67.5 | 99.4 | 85.1 | 99.2 | 7.0 | 98.0 | 0.0 | 78.0 | 97.0 | 50.5 | 51.0 | 2.71 |
| GPT-o1-pro | 87.2 | 39.0 | 90.2 | 99.6 | 95.7 | 96.3 | 37.0 | 98.0 | 7.0 | 98.0 | 92.0 | 52.5 | 53.0 | 2.83 |
| Claude-Opus-4 | 86.2 | 38.5 | 92.5 | 99.8 | 95.7 | 99.5 | 3.0 | 98.0 | 1.0 | 87.0 | 97.5 | 38.5 | 39.0 | 2.73 |
| Claude-Sonnet-4 | 77.1 | 38.2 | 92.2 | 99.8 | 98.6 | 99.0 | 14.0 | 96.0 | 1.0 | 85.0 | 98.0 | 24.0 | 24.5 | 2.73 |
| Deepseek-Chat | 17.5 | 9.8 | 55.4 | 99.2 | 41.7 | 95.3 | 1.0 | 92.0 | 0.0 | 73.0 | 89.5 | 53.5 | 54.5 | 2.52 |
| Deepseek-Reasoner | 42.0 | 27.9 | 81.6 | 99.6 | 63.9 | 98.1 | 83.0 | 85.0 | 83.0 | 80.0 | 86.0 | 55.0 | 57.5 | 2.66 |
| Gemini-2.0-Flash | 42.1 | 13.6 | 77.5 | 90.8 | 20.4 | 99.1 | 10.0 | 98.0 | 8.0 | 91.0 | 91.0 | 52.0 | 52.5 | 2.57 |
| Gemini-2.5-Flash | 50.2 | 14.3 | 84.1 | 95.1 | 43.3 | 97.8 | 91.0 | 92.0 | 89.0 | 92.0 | 96.0 | 53.5 | 55.0 | 2.59 |
| Qwen-Max | 78.1 | 22.3 | 83.7 | 99.6 | 80.8 | 99.8 | 5.0 | 98.0 | 1.0 | 83.0 | 91.5 | 56.0 | 57.0 | 2.55 |
| Qwen3 | 71.3 | 32.7 | 85.4 | 90.6 | 83.3 | 86.2 | 7.0 | 94.0 | 0.0 | 69.0 | 94.0 | 38.0 | 39.0 | 2.58 |
| Galaxy (w/o Kernel) | 88.4 | 39.1 | 93.1 | 99.9 | 99.3 | 99.7 | 17.0 | 96.0 | 11.0 | 96.0 | 97.0 | 50.5 | 51.0 | 2.71 |
| Galaxy | 88.4 | 39.1 | 93.1 | 99.9 | 99.3 | 99.9 | 96.0 | 96.0 | 94.0 | 998.0 | 99.0 | 18.5 | 19.0 | 2.74 |
Analysis of Benchmark Results (Table 1):
- Overall Superiority: Both Galaxy and Galaxy (w/o Kernel) demonstrate strong performance, outperforming most existing LLM agents across a majority of metrics. For instance, on AgentBoard, Galaxy achieves top scores (88.4 in ALF, 39.1 in SW, 93.1 in BA, 99.9 in JC, 99.3 in PL, 99.9 in TQ), indicating robust capabilities in multi-round interactive environments.
- Impact of Kernel on PrefEval (Preference Retention):
  - Galaxy (w/o Kernel) shows limited preference retention, especially in Zero-Shot conditions (Z10 at 17.0% and Z300 at 11.0%). This implies KoRa alone, even with its generative agent architecture, struggles to consistently recall and apply user preferences over long interactions without explicit reminders or the Kernel's oversight.
  - With Kernel enabled, Galaxy's preference retention dramatically improves: Z10 jumps from 17.0% to 96.0%, and Z300 from 11.0% to 94.0%. The R300 score for Galaxy is listed as 998.0, which is likely a typo and should be interpreted as near 99.8 or 98.0 given the context, still indicating very high performance. This highlights Kernel's critical role in maintaining an evolving Cognition Forest and supporting long-term personalized planning.
- Impact of Kernel on PrivacyLens (Privacy Protection):
  - Galaxy (w/o Kernel) has a privacy leakage rate (LR) of 50.5% and LRh of 51.0%, comparable to GPT-4o, indicating a significant amount of sensitive information leakage when operating without Kernel's explicit privacy controls.
  - Galaxy (with Kernel) drastically reduces the privacy leakage rate: LR drops from 50.5% to 18.5%, and LRh drops from 51.0% to 19.0%. This confirms Kernel's effectiveness in enforcing privacy through the Privacy Gate, which masks sensitive content before cloud transmission.
  - Helpfulness (Help.) remains consistently high (2.71 without Kernel, 2.74 for Galaxy), suggesting that Kernel's privacy mechanisms do not unduly degrade the agent's utility or helpfulness to the user. Accuracy for PrivacyLens also improves from 97.0% to 99.0% with Kernel.
- Overall Contribution of Kernel: The ablation study clearly demonstrates that Kernel is indispensable for Galaxy to achieve its stated goals of privacy preservation and self-evolution (which manifests as improved preference retention and adaptive system design). Kernel's two key roles are confirmed:
  - Maintaining an evolving Cognition Forest for long-term preference retention and personalized planning.
  - Enforcing privacy through the Privacy Gate.
6.2. Ablation Studies / Parameter Analysis
6.2.1. End-to-End Evaluation: Cost Analysis
Figure 5 presents a performance analysis of Galaxy in terms of latency and success rate under different model configurations.
Figure 5: Latency and success analysis of Galaxy under different model configurations. (a) shows end-to-end latency of different model combinations across four task types: TOD (pure chat), STC (simple tool call), CTC (complex tool call), and SD (space design). (b) compares success rate under different local model sizes (1.5B-14B) when Kernel uses Qwen2.5 for intent extraction.
- Latency Analysis (Figure 5a):
  - For simpler tasks like TOD (pure chat) and STC (simple tool call), latency is primarily dominated by local model inference, suggesting that even small local LLMs contribute significantly to response time.
  - For more complex tasks such as CTC (complex tool call) and SD (space design), cloud-based inference becomes the main latency contributor.
  - Using larger and more complex models (the 14B configuration) further amplifies total latency, reaching up to 6.3s for the Space Design task. This indicates a trade-off between model complexity/capability and response speed, especially for demanding tasks.
- Success Rate Analysis (Figure 5b):
  - Despite the latency cost, larger models within Kernel (specifically Qwen2.5-14B for local inference) deliver substantially better performance.
  - When Kernel uses Qwen2.5-14B for local inference, it achieves an 81.5% one-shot intent extraction success rate. This demonstrates its ability to accurately resolve complex user goals without requiring fallback interactions or clarification from the user, highlighting the benefit of a more capable local model for Kernel's metacognition.

The following are the results from Table 2 of the original paper:
| Execution Route | Cloud API | Latency (s) |
|---|---|---|
| KoRa calls cloud API | Yes | 0.13 |
| Kernel retrieves cognition | No | 0.87 |
| Kernel calls space function | No | 0.22 |
| KoRa feeds back result | Yes | 0.12 |
| Overall | | 1.34 |
Table 2: Latency breakdown across different execution routes in Galaxy for a complex tool call task. Kernel is set to Qwen2.5-14B and KoRa to GPT-4o-mini.
Latency Breakdown (Table 2):
For a Complex Tool Call task, with Kernel using Qwen2.5-14B and KoRa using GPT-4o-mini:
- Kernel's cognition retrieval (0.87s) accounts for the largest share of the total latency (1.34s). This step is critical for selecting and grounding tool actions within the Cognition Forest.
- Kernel calling Space functions takes 0.22s.
- KoRa calling the cloud API (for GPT-4o-mini) and feeding back results each take relatively short times (0.13s and 0.12s, respectively).

This breakdown shows that Kernel's local processing and Cognition Forest traversal are significant components of the overall task execution time, underscoring its central role in orchestrating actions.
6.2.2. Case Study: Kernel's Effectiveness
A real-world case study validates Kernel's ability to maintain system stability and perform self-recovery.
- Scenario: After cloning the project and running main.py, the system encountered a ModuleNotFoundError, failing to locate the core module world_stage and preventing the cognitive architecture from starting.
- Traditional Agent Behavior: Conventional LLM agent frameworks would simply return the error stack, requiring manual troubleshooting by a human developer.
- Kernel's Action: As a self-contained minimal runtime unit, Kernel remained operational even when the main system entry failed. Leveraging its code-level understanding of the system (via the Cognition Forest's Design dimension), Kernel identified that the world_stage module should reside in the project root. It inferred the error was due to a missing PYTHONPATH environment variable. Kernel then injected the correct path, restarted execution, and successfully restored operation.
- Validation: This case demonstrates Kernel's vital role in framework-level meta-management, enabling self-checks and recovery actions that are beyond the scope of typical LLM agents.
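As a rough illustration of this kind of self-recovery (not the paper's actual code), the following Python sketch shows how a watchdog process could catch the import failure, infer the missing search path, and relaunch the entry point:

```python
import os
import subprocess
import sys

PROJECT_ROOT = os.path.dirname(os.path.abspath(__file__))

def launch_with_recovery(entry="main.py", retries=1):
    """Run the system entry point; on a ModuleNotFoundError for a module
    that lives in the project root, inject PYTHONPATH and retry."""
    env = os.environ.copy()
    for attempt in range(retries + 1):
        result = subprocess.run([sys.executable, entry], env=env,
                                capture_output=True, text=True)
        if result.returncode == 0:
            return
        # Code-level diagnosis: the failing module resides in the project root,
        # so the interpreter's search path must be missing that directory.
        if "ModuleNotFoundError" in result.stderr and attempt < retries:
            env["PYTHONPATH"] = PROJECT_ROOT + os.pathsep + env.get("PYTHONPATH", "")
            continue
        raise RuntimeError(f"Unrecoverable failure:\n{result.stderr}")

if __name__ == "__main__":
    launch_with_recovery()
```

The difference in Galaxy is that the diagnosis step is not a hard-coded rule but is reasoned out by Kernel from the Design dimension of the Cognition Forest.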
6.2.3. Ablation Study: Analysis Layer Modules
Figure 6 illustrates an ablation study on the Analysis Layer modules (Agenda and Persona) through a real-world interaction example of a Daily Report.
The image shows an example of a personalized daily reflection and planning space with KoRa, containing today's reflection, tomorrow's plan, and the overall time schedule.
Figure 6: A real-world interaction example of Daily Report for ablation study.
- Impact of Agenda:
  - Without Agenda: KoRa relies entirely on its memory-stream context (short-term memory). This results in less structured plans and increased reliance on user feedback for clarification. KoRa cannot proactively anticipate or structure daily activities effectively.
  - With Agenda: Agenda consolidates multi-source perceptual signals and infers a coherent behavioral profile, which serves as a structured input for KoRa's plan generation. This allows KoRa to create more structured daily plans and reduces the need for user clarification.
- Impact of Persona:
  - Without Persona: If a user repeatedly translates paper abstracts and introductions via KoRa over several days, and Kernel generates a dedicated literature translating Space in response, KoRa might incorrectly infer that the user has discontinued translation when the user switches to the new Space. This is because KoRa lacks a long-term, stable understanding of user habits beyond immediate interactions.
  - With Persona: Persona maintains the User Cognition Tree, providing a comprehensive, long-term model of user characteristics. When Persona is available, KoRa correctly interprets the user's continued behavior (even if it is now through a new tool) and generates the corresponding Daily Report ("Today's Roast"), demonstrating a consistent understanding of user preferences.
- Validation: Both cases underscore the importance of the Analysis Layer (Agenda and Persona) in integrating and interpreting heterogeneous information from multiple sources. These modules are essential for KoRa's proactive capabilities and for maintaining a stable, long-term understanding of user needs, preventing persona drift.
6.3. Boundaries and Errors
The paper also acknowledges current limitations and potential issues within the Galaxy framework:
- Alignment Overfitting: Alignment inputs (e.g., explicit user confirmations or corrections) are prioritized during cognitive construction. However, these inputs often reflect short-term characteristics or immediate needs. There is a risk that Galaxy might overfit to these short-term signals, failing to accurately reflect or learn long-term user habits and preferences.
- Human-Dependent Space Expansion: While the Space protocol supports automated extensibility and generation of new interaction modules, creating complex Spaces (i.e., those requiring intricate logic or novel integrations) still necessitates multiple rounds of human guidance. Fully autonomous design and implementation of highly complex Spaces remain a challenge.
7. Conclusion & Reflections
7.1. Conclusion Summary
This work introduces Galaxy, a novel Intelligent Personal Assistant (IPA) framework centered on cognition-enhanced Large Language Model (LLM) agents. The core innovation is the Cognition Forest, a semantic structure that unifies cognitive architecture with system design, establishing a self-reinforcing loop for continuous improvement. Galaxy is explicitly designed to address three major limitations in current LLM agents: the lack of proactive skills, robust privacy preservation, and genuine self-evolution.
The framework implements two cooperative agents: KoRa, a generative agent for responsive and proactive task execution, grounded in the Cognition Forest to ensure consistency; and Kernel, a meta-cognition empowered meta-agent responsible for framework-level supervision, privacy management via Privacy Gate, and driving self-evolution.
Experimental evaluations on AgentBoard, PrefEval, and PrivacyLens benchmarks demonstrated Galaxy's superior performance compared to multiple state-of-the-art LLM agents. Ablation studies highlighted the critical contributions of Kernel and the Analysis Layer modules (Agenda, Persona) to Galaxy's capabilities. A real-world case study further validated Kernel's ability to perform self-recovery and maintain system stability. The paper concludes by emphasizing the necessity of a deeply integrated and mutually reinforcing relationship between cognitive architecture and system design for the future of LLM agents.
7.2. Limitations & Future Work
The authors identified several limitations of the current Galaxy framework:
- Alignment Overfitting: Galaxy's cognitive construction prioritizes alignment inputs (explicit user feedback). However, these inputs may be short-term focused and could lead to overfitting, potentially misrepresenting long-term user habits. Future work could explore methods to balance short-term alignment with long-term behavioral patterns to create more robust user models.
- Human-Dependent Space Expansion: While the Space protocol enables automated extensibility, the creation of highly complex Spaces still requires significant human guidance and multiple rounds of interaction for full implementation. Future research could focus on enhancing the autonomy of Kernel's User-Adaptive System Design mechanism to generate and integrate complex functionalities with less human intervention, possibly through more sophisticated LLM-driven code generation and testing.
7.3. Personal Insights & Critique
- Inspiration from Unification: The central idea of unifying cognitive architecture with system design through Cognition Forest is profoundly insightful. It moves beyond the typical LLM agent paradigm of simply providing tools or memory and instead posits a system that can understand and modify its own foundational structure. This "self-aware system design" could unlock true self-evolution for AI systems, enabling them to adapt to entirely novel challenges rather than just improving performance within fixed constraints. This principle could be transferred to other complex AI systems, such as autonomous driving or robotics, where the AI could not only learn to drive better but also propose modifications to its control architecture or sensor integration based on real-world experience.
- Practicality of Kernel's Meta-Agent Role: The implementation of Kernel as a framework-level meta-agent with a local code interpreter and rule engine that can operate even when the main system fails (as shown in the case study) is a robust design choice. This ensures a high degree of resilience and self-healing, crucial for IPAs that are expected to be available and functional continuously. This approach sets a new standard for reliability in LLM agent systems.
- Robust Privacy Mechanism: The Privacy Gate with its tiered masking levels, managed by Kernel and aligned with the User Cognition Tree, provides a sophisticated and context-aware approach to privacy preservation. This is a critical step towards building trust in proactive LLM agents that necessarily handle sensitive user data. The ability to dynamically adjust masking based on context is a significant advantage over static anonymization methods.
- Addressing Persona Drift: KoRa's integration with Cognition Forest to mitigate persona drift is an important contribution. As LLM agents interact over long periods, maintaining a consistent persona is key to user experience and trust. Grounding behavior in a hierarchical semantic structure, rather than just a linear memory stream, offers a more stable foundation for persona consistency.
- Potential Issues & Areas for Improvement:
  - Complexity and Maintainability: While powerful, the Cognition Forest's Semantic, Function, and Design dimensions, coupled with its hierarchical structure and the self-reinforcing loop, introduce significant complexity. Managing and debugging such a dynamically evolving architecture could be challenging, especially in real-world deployments. The authors could elaborate on strategies for version control, rollback mechanisms, and transparent introspection of the Cognition Forest's evolution.
  - Scalability of Kernel's Self-Evolution: The User-Adaptive System Design by Kernel relies on identifying "latent user needs" and confirming them through "lightweight alignment." How robust is this alignment process for truly novel or ambiguous needs? And how scalable is the autonomous generation and integration of new Spaces beyond simple ones, given the acknowledged limitation of human-dependent Space Expansion? The transition from identifying a need to reliably generating complex, bug-free code for a new system component is a massive leap.
  - Evaluation of Proactivity: While Agenda and Persona enable proactive planning, the paper's experiments primarily focus on preference retention and privacy. A more direct and quantitative evaluation of proactive behavior effectiveness (e.g., number of successful proactive interventions, user satisfaction with proactivity, false positive rate of proactive actions) would further strengthen the claims.
  - Computational Overhead: The Cost Analysis shows that Kernel's cognition retrieval is a significant portion of latency for complex tasks. As the Cognition Forest grows with self-evolution and user personalization, this overhead could become substantial. Future work might explore more efficient indexing, retrieval, or pruning strategies for the Cognition Forest.