
AgentBuilder: Exploring Scaffolds for Prototyping User Experiences of Interface Agents

Published: 10/06/2025
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

This paper identifies key activities and capabilities for prototyping interface agents, develops the AgentBuilder tool, and validates it via in situ studies, enabling diverse contributors to design better agent user experiences beyond AI experts.

Abstract

Interface agents powered by generative AI models (referred to as "agents") can automate actions based on user commands. An important aspect of developing agents is their user experience (i.e., agent experience). There is a growing need to provide scaffolds for a broader set of individuals beyond AI engineers to prototype agent experiences, since they can contribute valuable perspectives to designing agent experiences. In this work, we explore the affordances agent prototyping systems should offer by conducting a requirements elicitation study with 12 participants with varying experience with agents. We identify key activities in agent experience prototyping and the desired capabilities of agent prototyping systems. We instantiate those capabilities in the AgentBuilder design probe for agent prototyping. We conduct an in situ agent prototyping study with 14 participants using AgentBuilder to validate the design requirements and elicit insights on how developers prototype agents and what their needs are in this process.

In-depth Reading

1. Bibliographic Information

1.1. Title

AgentBuilder: Exploring Scaffolds for Prototyping User Experiences of Interface Agents

1.2. Authors

The authors of this paper are:

  • Jenny T. Liang, Carnegie Mellon University, USA

  • Titus Barik, Apple, USA

  • Jeffrey Nichols, Apple, USA

  • Eldon Schoop, Apple, USA

  • Ruijia Cheng, Apple, USA

    The authors come from both academic (Carnegie Mellon University) and industry (Apple) backgrounds, indicating a blend of theoretical research and practical application expertise in human-computer interaction (HCI), artificial intelligence (AI), and software development.

1.3. Journal/Conference

The venue is listed only as the ACM template placeholder ("Conference acronym 'XX"), so the specific conference is not identified. The placeholder DOI (doi:10.1145/XXXXXXX.XXXXXXX) likewise indicates the paper is formatted for an ACM venue. ACM conferences are highly reputable in computer science, particularly in fields like HCI, AI, and software engineering.

1.4. Publication Year

The paper was posted to arXiv on October 6, 2025 (timestamp 2025-10-06T02:58:42Z), making it a very recent preprint from late 2025.

1.5. Abstract

This paper investigates the prototyping of user experiences (UX) for interface agents powered by generative AI models. The core problem addressed is the need for scaffolds that enable a broader range of individuals, beyond AI engineers, to prototype agent experiences, leveraging their diverse perspectives in design. The research begins with a requirements elicitation study involving 12 participants to identify key prototyping activities and desired system capabilities. These capabilities are then instantiated in AgentBuilder, a design probe for agent prototyping. Finally, an in situ agent prototyping study with 14 participants using AgentBuilder is conducted to validate the identified requirements and gather insights into developers' prototyping processes and needs.

The official source or PDF link is:

  • Original Source Link: https://arxiv.org/abs/2510.04452 (Preprint, version v1)

  • PDF Link: https://arxiv.org/pdf/2510.04452v2.pdf (Preprint, version v2)

    The paper's status is a preprint on arXiv, meaning it has not yet been formally peer-reviewed or published at a venue.

2. Executive Summary

2.1. Background & Motivation

The paper addresses the challenge of designing user experiences (UX) for interface agents powered by generative AI models, often referred to as agents. These agents can automate actions based on user commands, significantly impacting how users interact with software and the web.

The core problem is that developing these agents typically requires specialized AI engineering expertise and knowledge of APIs, limiting agent creation to a narrow group of technical experts. This creates a significant gap: individuals with valuable user experience design insights, such as UX designers, product managers, or even end-users, are often excluded from the prototyping process due to a lack of programming knowledge. This exclusion can lead to agent experiences that are not user-centric, potentially causing negative impacts or missed opportunities in design.

The paper's entry point is to explore what affordances (i.e., opportunities for action) a prototyping system should offer to empower a broader group of "developers of agent experiences" (not just AI engineers) to design and prototype agents effectively. This aims to democratize AI design and foster more collaborative, human-centered agent development.

2.2. Main Contributions / Findings

The paper makes several primary contributions:

  1. Validated Design Requirements: It articulates a set of validated design requirements for agent prototyping systems. These are structured as five prototyping activities (A1-A5) developers engage in and six desired capabilities (C1-C6) that such systems should support. These findings are derived from a requirements elicitation study.

  2. AgentBuilder Design Probe: The paper introduces AgentBuilder, a design probe that instantiates these identified design requirements. AgentBuilder is a graphical no-code tool designed to allow individuals with varying technical backgrounds to prototype agent experiences.

  3. Insights on Agent Prototyping Process: Through an in situ user study using AgentBuilder, the paper provides insights into how developers, including non-experts, approach agent prototyping and what their specific needs are during this process. This study further validates the initial design requirements and uncovers additional nuances.

  4. Design Recommendations: Based on the study findings, the paper proposes a set of design recommendations for future agent prototyping systems.

    In essence, the paper provides a foundational understanding of what is needed to enable more inclusive and effective agent experience prototyping, offering both a conceptual framework and a practical demonstration with AgentBuilder.

3. Prerequisite Knowledge & Related Work

3.1. Foundational Concepts

To fully understand this paper, a beginner should be familiar with the following concepts:

  • Interface Agents: These are software entities that act on behalf of a user to perform tasks or provide assistance within a user interface. Historically, they have been rule-based or script-driven. In this paper, the focus is on interface agents powered by generative AI models (often called AI agents or simply agents). They can understand natural language commands and execute complex actions across various applications (e.g., web browsers, desktop apps).
  • Generative AI Models (e.g., LLMs): Generative Artificial Intelligence (AI) refers to AI systems that can create new content, such as text, images, or code, rather than just classifying or analyzing existing data. Large Language Models (LLMs) are a prominent type of generative AI that are trained on vast amounts of text data and can understand, generate, and process human language. They are the "brains" behind the interface agents discussed in this paper, enabling them to interpret user instructions semantically and decide on appropriate actions.
  • User Experience (UX): User experience (UX) refers to a person's emotions and attitudes about using a particular product, system, or service. It covers practical, experiential, affective, meaningful, and valuable aspects of human-computer interaction. In this paper, agent experience is a specific kind of UX focusing on how users interact with and perceive AI agents.
  • Prototyping: Prototyping is the process of creating preliminary versions of a product or system to test concepts, gather feedback, and iterate on design before final development. It's crucial in UX design to visualize and refine interactions. For AI agents, prototyping involves simulating how the agent would behave and interact with users and its environment.
  • Scaffolds: In the context of software development and education, scaffolds are temporary structures or tools that provide support and guidance for a task, making it easier for learners or less experienced individuals to achieve a goal. In this paper, scaffolds refer to features or design elements within a prototyping system (like AgentBuilder) that simplify the complex process of agent development for non-experts.
  • No-code/Low-code Tools: These are software development platforms that allow users to create applications with little to no coding. No-code platforms typically use visual interfaces with drag-and-drop features, while low-code platforms may require minimal coding for specific functionalities. AgentBuilder is described as a graphical no-code tool, aiming to make agent prototyping accessible.
  • Prompt Engineering: This is the process of designing and refining prompts (inputs) for generative AI models (especially LLMs) to guide them in generating desired outputs. It's an iterative and exploratory process, as the wording and structure of a prompt can significantly influence an LLM's behavior. For agents, prompt engineering is crucial for defining their capabilities, scope, and interaction logic. (A brief illustrative sketch follows this list.)
  • Design Probe: A design probe is a research tool or prototype used in HCI to explore design possibilities, elicit user feedback, and uncover user needs in a real-world or simulated context. It's not necessarily a fully functional product but a vehicle for investigation. AgentBuilder serves as a design probe to study agent prototyping.
  • In Situ Study: An in situ study is a research method where observations or experiments are conducted in a natural or real-world setting, rather than in a controlled laboratory environment. This helps capture authentic behaviors and contexts. The in situ agent prototyping study used AgentBuilder in a setting that mimics actual development.
  • Wizard-of-Oz Protocol: (Though not explicitly central to AgentBuilder, it's mentioned in related work for dialogue prototyping). This is a research technique where human operators simulate the responses of an intelligent system to users, allowing researchers to test user interactions and system designs without fully building the AI. This helps to explore desired UX before investing in complex AI development.
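
To make the prompt engineering concept above concrete, here is a minimal sketch, in TypeScript, of how a system prompt for a web interface agent might be assembled from a goal, constraints, and interaction rules. Everything in it (the AgentPromptParts type, buildSystemPrompt, and the example strings) is an illustrative assumption, not material from the paper.

```typescript
// Hypothetical example of prompt engineering for a web interface agent.
// None of these strings come from the paper; they only illustrate the idea
// that an agent's scope, constraints, and interaction rules live in a prompt.

interface AgentPromptParts {
  goal: string;               // high-level task the agent should accomplish
  constraints: string[];      // limits on the agent's task space
  interactionRules: string[]; // when the agent should pause and ask the user
}

function buildSystemPrompt(parts: AgentPromptParts): string {
  return [
    `You are a web interface agent. Goal: ${parts.goal}`,
    "Constraints:",
    ...parts.constraints.map((c) => `- ${c}`),
    "Interaction rules:",
    ...parts.interactionRules.map((r) => `- ${r}`),
  ].join("\n");
}

// Example usage with illustrative values.
const systemPrompt = buildSystemPrompt({
  goal: "Order items from a coffee shop website on the user's behalf.",
  constraints: ["Never submit payment without explicit user confirmation."],
  interactionRules: ["If the request is ambiguous, ask one clarifying question."],
});
console.log(systemPrompt);
```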

3.2. Previous Works

The paper contextualizes its work within three main areas of related research:

  1. Interface Agents:

    • Early Interface Agents: The concept of interface agents dates back decades, with early examples focusing on specific applications like calendar management or email filtering, often leveraging application APIs. Key challenges identified then included providing users with sufficient understanding of agent actions, deciding when agents should act autonomously versus seeking user approval, and how agents should integrate into the user interface (e.g., modality, visibility).
    • LLM-Powered Agents: With the rise of (M)LLMs (Multi-modal Large Language Models), interface agents have rapidly evolved. They can now semantically understand user input and complete complex actions across diverse environments like spreadsheets (e.g., Excel), operating systems, and especially the web (termed web agents).
    • Focus of Prior Web Agent Research: Much existing research on web agents focuses on developing new generative AI architectures, benchmarks, and evaluation environments to improve agent performance.
    • Gap Addressed: The paper notes a gap: few works consider the design of collaborative human-agent workflows. It mentions prior work like CowPilot, which emphasizes user supervision by pausing before agent actions, highlighting that the UX aspects of these powerful new agents remain underexplored.
  2. Designing for User Experience in Generative AI:

    • UX design for generative AI presents unique challenges due to the unpredictability and complexity of these systems. Designers must imagine and understand AI capabilities that might generate varied text, images, or other modalities.
    • Existing Generative AI UX Tools: Researchers have developed techniques and tools to help prototype generative AI UX. Examples include FauxPilot for LLM-powered UI prototyping, which uses Wizard-of-Oz for dialogue prototyping, and PromptInfuser, an existing tool for UX designers working with generative AI.
    • Connection to Paper: This paper aims to combine these UX prototyping techniques with prompt development in the context of agents.
  3. Developing Prompts for Generative AI Experiences:

    • Prompts are natural language instructions embedded in AI applications. Prompt development is recognized as an iterative and exploratory process, where developers refine prompts through many interactions with the AI to decide on optimal content.
    • Challenges in Prompt Development: This task is a major challenge for developers. Tools exist to help prototype, refine, and evaluate prompts for LLMs, often using node-based approaches to build prompt pipelines.
    • Agent Prompt Development: While agent development involves prompt development, it has its own unique challenges, which are less studied. Prior work on multi-agent workflows (e.g., Zhu et al., Hao et al.) exists, but this paper aims to extend that by exploring how non-experts design prompts for agent experiences and how no-code scaffolds can support these needs.

3.3. Technological Evolution

The evolution of interface agents can be seen as:

  1. Early, Rule-Based Agents (1990s-early 2000s): These agents were typically programmed with explicit rules and scripts, often for specific, narrow tasks like email filtering or calendar management. They relied on application APIs and had limited understanding beyond their predefined functions. UX design focused on clear communication of actions and user control.

  2. Increased Sophistication (2000s-2010s): Agents became more integrated into operating systems and could perform more complex tasks, sometimes incorporating basic machine learning for user preferences. However, their capabilities were still largely constrained by explicit programming.

  3. LLM-Powered Generative AI Agents (2020s-Present): The advent of large language models (LLMs) and generative AI has revolutionized agents. These modern agents can understand natural language input semantically, reason about tasks, and perform multi-step actions across diverse digital environments (web, desktop). This shift brings immense power but also introduces unpredictability, complexity, and ethical considerations for UX design. This paper operates squarely within this latest era.

    This paper's work fits into the current technological timeline by addressing the critical need for accessible UX prototyping tools specifically for these new, powerful LLM-powered agents, bridging the gap between AI engineering and human-centered design.

3.4. Differentiation Analysis

Compared to main methods in related work, the core differences and innovations of this paper's approach are:

  • Focus on UX Prototyping for LLM Agents: While prior work on web agents largely focuses on AI performance and benchmarking, this paper explicitly prioritizes the user experience (or agent experience) aspect. It moves beyond just making agents perform better to making them designable and understandable for users.
  • Empowering Non-AI Experts: A significant innovation is the explicit goal of providing scaffolds for a broader set of individuals (UX designers, product managers, end-users) to prototype agents, not just AI engineers. This contrasts with traditional agent development, which is typically highly technical and API-driven.
  • No-Code Approach for Agents: While no-code tools exist for general generative AI UX (like PromptInfuser), AgentBuilder specifically applies this paradigm to interface agents and their multi-step workflows and environmental interactions. It aims to make the complex logic of agentic workflows visually representable and editable without code.
  • Integration of Workflow and Prompt Engineering: The paper's approach, instantiated in AgentBuilder, highlights the bidirectional relationship between a graphical workflow (defining agent actions and interactions) and natural language prompts (guiding the LLM's behavior). This integrated approach is an innovation for agent prototyping, providing structured scaffolds for prompt development that go beyond simple text editing.
  • Comprehensive Prototyping Lifecycle Support: AgentBuilder not only allows for design but also execution, monitoring, and debugging of agent prototypes within a realistic environment (web browser extension). This full-lifecycle support within a no-code framework is a key differentiator, enabling iterative design and testing by non-experts.

4. Methodology

4.1. Principles

The core idea of the methodology is to understand and facilitate the prototyping of user experiences for interface agents. This is achieved through a multi-stage human-centered design approach:

  1. Requirements Elicitation: Understand the needs and activities of developers (both experts and non-experts) when prototyping agents. This involves directly asking users about their challenges and desired system capabilities.

  2. Design Probe Instantiation: Based on the elicited requirements, create a tangible design probe (a prototype system) that embodies these capabilities. This probe serves as a tool for further investigation, not necessarily a final product.

  3. In Situ Validation and Elicitation: Conduct a user study with the design probe in a realistic setting. This allows for validation of the initial requirements, observation of actual prototyping behaviors, and elicitation of deeper insights and tooling needs that might not have emerged from initial interviews alone.

    The theoretical basis behind this approach is that human-centered design and participatory design principles are crucial for AI systems, especially interface agents where UX is paramount. By involving a diverse group of stakeholders (including non-AI experts) early in the design process, the resulting systems can be more usable, understandable, and beneficial. The use of a design probe allows for concrete interaction and feedback, moving beyond abstract discussions to practical engagement.

4.2. Core Methodology In-depth (Layer by Layer)

The paper employs a mixed-methods approach involving two main studies and the development of a design probe.

4.2.1. Requirements Elicitation Study

This study aimed to understand what activities developers engage in during agent prototyping (RQ1) and what capabilities a system should afford to support this (RQ2).

4.2.1.1. Methodology (Requirements Elicitation)

  • Participants: 12 participants (N=12) from a large technology company were recruited. They had varying levels of experience with agents, including:
    • Relevant Domain Expertise (N=6): 2 machine learning engineers, 1 software engineer, 2 quality assurance engineers, and 1 product designer.
    • Current End-Users (N=5): Individuals who use generative AI tools.
    • No Familiarity (N=1): An individual with no prior exposure to agents.
  • Procedure:
    • Experts: Interviewed about their agent development experience, focusing on challenges and desired tools.
    • Non-Experts: First provided a background on agents through a slide deck showing examples of agent experience patterns (e.g., web agents interacting with web pages). Then, they were asked to describe their ideal agent UX for various scenarios.
    • Sessions were conducted remotely via video conferencing, lasting 60-90 minutes, recorded and transcribed. Participants were compensated.
  • Analysis: Transcripts were analyzed using thematic analysis (specifically constant comparative method and provisional coding). Codes were reviewed for code saturation (achieved after 10 interviews). Activities were organized into higher-order phases using theoretical coding. The codes, activities, and phases were validated by another researcher.

4.2.1.2. Results (Requirements Elicitation)

The study identified two main phases in the agent prototyping process: Design the Agent and Inspect the Agent. Within these phases, five Activities (labeled A) and six Desired Capabilities (labeled C) for supporting systems were identified.

Phase 1: Design the Agent This phase involves conceptualizing and configuring the agent's behavior and interface.

  • A1. Design the scope and boundaries of the agent: Developers define what tasks the agent can perform and its operational limits. This includes selecting tools (e.g., "access right to [the] calendar") and constraining the agent's actions (e.g., in messaging apps, limiting access to hidden chats).
    • C1. Use no-code interfaces: GUI-based prototyping tools can lower the barrier for non-experts to define agent scope and behavior, similar to no-code tools for LLM applications.
    • C2. Help constrain the agent's task space and knowledge of the user: Prototyping tools should allow developers to easily define the tasks and knowledge (e.g., user preferences) the agent has access to.
  • A2. Design the agent's information display: Developers decide how the agent presents information to the user (e.g., its activities, UI elements). This involves considering what information to show and how it's shown to maintain user trust and understanding.
    • C3. Define the UI of the agent in the chat and the environment: Users should be able to specify the agent's visual representation, what information it displays (e.g., its actions, reasoning), and how it solicits user input (e.g., "How should I ask them...?").
  • A3. Design the interactions between the agent and user: This involves defining how the user and agent communicate during a task, including collaborative workflows and error handling (e.g., asking for input if an issue arises).
    • C4. Provide components that enable the agent to invoke different user interactions: Prototyping systems should offer predefined interaction components (e.g., asking for confirmation, clarifying input) that developers can incorporate into the agent's workflow, specifying when and how they are invoked.

Phase 2: Inspect the Agent This phase focuses on testing, monitoring, and debugging the agent's runtime behavior.

  • A4. Run the agent prototype: Developers need an environment to execute their agent prototype to observe its behavior and validate the agent experience. They want to see if the agent reaches its goal or if errors occur. They also need global controls (e.g., pause, resume, cancel) for agent execution.
    • C5. Provide an environment to run and control the agent: A system should offer an interactive environment to execute prototypes and allow users to steer or intervene in the agent's actions, similar to global controls found in generative AI systems.
  • A5. Understand the agent's runtime behavior: During execution, developers need to understand why the agent is behaving in a certain way. This includes inspecting visual representations (e.g., screenshots, UI element highlights) and textual representations (e.g., HTML accessibility tree, LLM prompts and outputs).
    • C6. Help the user debug the agent's runtime behavior: The system should provide tools to inspect the agent's internal state, reasoning, and tool calls, helping developers identify and fix issues (a known challenge in prompt prototyping).

4.2.2. Design Probe: AgentBuilder

AgentBuilder is a design probe developed to instantiate the Desired Capabilities identified in the requirements elicitation study. It consists of two main interfaces: a Prototype interface (for design) and an Execution interface (for running and inspecting).

4.2.2.1. Scaffolds for Designing the Agent (Prototype Interface)

AgentBuilder provides no-code scaffolds to help users design agent experiences.

  • Workflow Tab (Graphical Editor): This is a visual no-code interface (Figure 2, 3) where users design agent actions and behaviors using nodes and edges. (A rough sketch of one possible node-and-edge representation follows this list.)

    • Nodes: Represent actions the agent can take (e.g., Start, End, UI Actions) or user interactions (e.g., Interact, Plan, Message, Confirmation). These directly support C4 by providing predefined interaction components.
    • Edges: Define conditions under which actions are taken, helping to constrain the agent's task space (C2).
    • Inspector: A panel to customize selected nodes (e.g., adding a "Risk" option to a Confirmation node).
    • Node Types:
      • Start: Beginning of the workflow.
      • End: End of the workflow.
      • UI Actions: Agent performs UI actions (click, type, visit webpage). Users can configure what information about these actions is displayed (C3).
      • Plan: Agent shows a high-level list of steps to the user (Figure 2-E). This is an interaction component (C4).
      • Message: Agent displays text to communicate information (Figure 2-F). An interaction component (C4).
      • Interact: Agent presents an open-ended inquiry to the user (Figure 2-A, 2-F). An interaction component (C4).
      • Confirmation: Agent presents a yes/no inquiry (Figure 2-C). An interaction component (C4).
    • Defining UI Actions and Information Display: Developers can edit how the agent's UI actions are observed by the user and what information about the actions is displayed (C3). This includes UI action previews directly on the webpage (Figure 4-1).
    • Specifying User Input: The Interact node allows users to specify how the agent solicits input (e.g., open text field, list of options) (C3). Changes are shown in a live preview.
  • Prompt Interface: This interface (Figure 3) allows developers to write structured natural language prompts, a no-code notation for generative AI prototyping. It facilitates bidirectional edits between the graphical workflow and the prompts.

    • Workflow Prompt: Synthesized from the workflow graph. Developers can manually edit it, and changes can be propagated back to the workflow graph. This provides a textual representation of the agent's high-level goal and detailed behavior.
    • Agent Capabilities Prompt (Figure 3-B): Defines guidelines for the agent's general capabilities.
    • User Information Prompt (Figure 3-C): Describes user-specific information the agent should know.
    • These prompts allow constraining the agent's task space and knowledge (C2) through natural language.
    • A Preview tab allows users to see the assembled system prompt.
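
To make the workflow structure above more concrete, the sketch below models nodes and edges in TypeScript. The node kinds mirror those listed above (Start, End, UI Actions, Plan, Message, Interact, Confirmation), but the type definitions, configuration fields, and example coffee-ordering flow are illustrative assumptions rather than AgentBuilder's actual schema.

```typescript
// Hypothetical model of an AgentBuilder-style workflow graph.
// Node kinds mirror the paper's list; all field names are illustrative.

type NodeKind =
  | "Start"
  | "End"
  | "UIActions"
  | "Plan"
  | "Message"
  | "Interact"
  | "Confirmation";

interface WorkflowNode {
  id: string;
  kind: NodeKind;
  // Inspector-style options, e.g. what the node displays to the user (cf. C3).
  config?: Record<string, string | boolean>;
}

interface WorkflowEdge {
  from: string;
  to: string;
  // Natural-language condition constraining when this transition is taken (cf. C2).
  condition?: string;
}

interface Workflow {
  nodes: WorkflowNode[];
  edges: WorkflowEdge[];
}

// A minimal coffee-ordering flow: clarify the request, act on the page,
// confirm before checkout, then end.
const exampleWorkflow: Workflow = {
  nodes: [
    { id: "start", kind: "Start" },
    { id: "ask", kind: "Interact", config: { inputStyle: "list of options" } },
    { id: "browse", kind: "UIActions", config: { showActionPreviews: true } },
    { id: "confirm", kind: "Confirmation", config: { risk: true } },
    { id: "end", kind: "End" },
  ],
  edges: [
    { from: "start", to: "ask", condition: "the user's request is ambiguous" },
    { from: "start", to: "browse", condition: "the request names a specific item" },
    { from: "ask", to: "browse" },
    { from: "browse", to: "confirm", condition: "the cart is ready for checkout" },
    { from: "confirm", to: "end" },
  ],
};

console.log(`${exampleWorkflow.nodes.length} nodes, ${exampleWorkflow.edges.length} edges`);
```

Encoding edge conditions as natural language is one plausible way to realize C2's constraining of the task space; the probe itself stores workflows as JSON and translates them to and from prompts, as described in the implementation section below.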

4.2.2.2. Scaffolds for Inspecting the Agent (Execution Interface)

AgentBuilder provides tools for running and debugging agent prototypes.

  • Agent Execution Interface: This is an extension to the user's web browser (CowPilot Chrome extension) (Figure 4) where the agent performs UI actions and chats with the user.
    • Running the Agent: Users can initiate the agent by providing a task (Figure 4-A). The agent's actions are determined by the workflow defined in the Prototype interface.
    • Intervening in Agent Actions: AgentBuilder offers local agent controls (C5); a speculative sketch of how such controls could gate execution appears after this list:
      • Pause button: Stops agent execution.
      • Play button: Continues execution from the current webpage state.
      • Cancel button: Ends the agent's run.
  • Debugging Mode: Activated by a DEBUG button, this mode (Figure 5) exposes additional agent information during runtime (C6).
    • Tool calls and agent reasoning corresponding to each message in the chat (Figure 5-2).
    • Accessibility tree and text inputs of the webpage (Figure 5-4).
    • A slider (Figure 5-3) allows users to navigate through the agent's previous actions.
    • This helps users understand why the agent made certain decisions, addressing C6.
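
As a speculative illustration of how the Pause/Play/Cancel controls (C5) and the per-step debug trace (C6) could fit into an agent's execution loop, here is a TypeScript sketch. The planNextAction and applyUIAction functions are hypothetical stand-ins for the LLM call and the browser automation, not real AgentBuilder or CowPilot APIs.

```typescript
// Speculative execution loop showing how Pause/Play/Cancel (C5) could gate
// each agent step and how a per-step trace could feed DEBUG mode (C6).

type ControlState = "running" | "paused" | "cancelled";

interface AgentStep {
  reasoning: string; // shown alongside each chat message in DEBUG mode
  toolCall: string;  // e.g. 'click("Add to cart")'
  done: boolean;
}

async function planNextAction(task: string, history: AgentStep[]): Promise<AgentStep> {
  // Stub: a real implementation would prompt the LLM with the workflow prompt,
  // the chat history, and the page's accessibility tree.
  return {
    reasoning: `Working on "${task}" (step ${history.length + 1})`,
    toolCall: "noop()",
    done: history.length >= 2,
  };
}

async function applyUIAction(toolCall: string): Promise<void> {
  // Stub: a real implementation would perform the click/type/navigation in the
  // browser and could first show a UI action preview on the page.
}

async function runAgent(task: string, getControl: () => ControlState): Promise<AgentStep[]> {
  const trace: AgentStep[] = []; // navigable afterwards via a debug slider
  while (true) {
    const control = getControl();
    if (control === "cancelled") break;                        // Cancel ends the run
    if (control === "paused") {                                // Pause halts execution...
      await new Promise((resolve) => setTimeout(resolve, 200)); // ...until Play resumes it
      continue;
    }
    const step = await planNextAction(task, trace);
    trace.push(step);                                          // recorded for DEBUG mode
    if (step.done) break;
    await applyUIAction(step.toolCall);
  }
  return trace;
}

// Example: run to completion with the controls left in the "running" state.
runAgent("Order a tall Caffè Misto", () => "running").then((trace) =>
  console.log(`${trace.length} steps recorded`),
);
```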

4.2.2.3. Implementation

  • AgentBuilder is implemented as two TypeScript React applications.
  • The Prototype interface is a web application.
  • The Execution interface is a modified version of the CowPilot Chrome browser extension (which was originally for LLM-powered UI automation).
  • Modifications to CowPilot included adding natural language descriptions for UI actions, making UI action previews configurable, and integrating debugging agent outputs. The Interaction Components were implemented as tool prompts.
  • Workflows are stored as JSON files, enabling bidirectional translation between the graphical workflow and the natural language prompt.
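
Since workflows are stored as JSON and translated into a natural-language workflow prompt, the sketch below shows one plausible, assumed shape for such a file and a toy synthesis function covering one direction of that translation. The actual file format and prompt wording are not given in this analysis, so every field name here is illustrative.

```typescript
// Hypothetical JSON shape for a stored workflow, plus a toy function that
// synthesizes a natural-language "workflow prompt" from it.

interface StoredNode {
  id: string;
  kind: string;        // e.g. "Interact", "UIActions", "Confirmation"
  description: string; // natural-language description of the step
}

interface StoredEdge {
  from: string;
  to: string;
  condition?: string;  // optional natural-language condition
}

interface StoredWorkflow {
  goal: string;
  nodes: StoredNode[];
  edges: StoredEdge[];
}

const workflowJson: StoredWorkflow = {
  goal: "Order a drink from the coffee shop website.",
  nodes: [
    { id: "ask", kind: "Interact", description: "Ask which drink the user wants" },
    { id: "add", kind: "UIActions", description: "Search for the drink and add it to the cart" },
    { id: "confirm", kind: "Confirmation", description: "Confirm the order before checkout" },
  ],
  edges: [
    { from: "ask", to: "add" },
    { from: "add", to: "confirm", condition: "the cart contains the requested item" },
  ],
};

function synthesizeWorkflowPrompt(wf: StoredWorkflow): string {
  const steps = wf.nodes.map((n, i) => `${i + 1}. ${n.description} (${n.kind})`);
  const conditions = wf.edges
    .filter((e) => e.condition)
    .map((e) => `Only move from "${e.from}" to "${e.to}" when ${e.condition}.`);
  return [`Goal: ${wf.goal}`, "Steps:", ...steps, ...conditions].join("\n");
}

console.log(synthesizeWorkflowPrompt(workflowJson));
```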

4.2.3. In Situ Agent Prototyping Study

This study aimed to understand how developers prototype agents (RQ3) and what tooling needs they have (RQ4) using AgentBuilder.

4.2.3.1. Methodology (In Situ Study)

  • Participants: 14 participants (N=14) with diverse backgrounds (software engineers, designers, product managers, ML engineers, etc.) and varying GenAI usage, prompt engineering experience, and agent familiarity (Table 1).
  • Procedure:
    • Participants completed an agent prototyping exercise using AgentBuilder and were encouraged to think aloud.
    • Sessions were conducted remotely, lasting approximately 90 minutes, recorded and transcribed.
    • Task: Design an agent user experience to automate ordering coffee/pastries from the Starbucks website, alleviating UX limitations of a baseline agent. Participants had to design a system prompt for the agent, define its actions, and ensure a positive user experience, considering various user inputs (e.g., "Order me a coffee. I'm not sure what I want.").
    • The AGENT CAPABILITIES PROMPT and USER INFORMATION PROMPT were pre-populated for the exercise.
    • Participants remotely controlled the researcher's screen to interact with AgentBuilder.
  • Analysis: Transcripts and recordings were analyzed using thematic analysis. Prototyping activities from the requirements elicitation study were used for post-coding. Provisional coding and open coding were used to identify tooling needs (RQ4), using the Desired Capabilities as a framework.

4.2.3.2. Results (In Situ Study)

AgentBuilder enabled all participants, regardless of prior experience, to prototype agents. The study validated the Activities and Capabilities and provided deeper insights.

  • Four-Step Prototyping Process (RQ3):

    1. Step 1: Designing an initial prototype of the agent: Participants defined a high-level goal, selected interaction components (from the Workflow or by writing in prompts), and configured information display. They anticipated what information to show and how.
    2. Step 2: Running and monitoring the agent: Participants ran the prototype (A4) to test specific user queries and observe how the agent handled different scenarios (e.g., clarifying ambiguous requests). They monitored agent messages, UI actions, and sometimes used DEBUG Mode (A5) to see agent reasoning live.
    3. Step 3: Debugging the agent: Participants focused on defects when the agent encountered issues (A5). They used the DEBUG Mode to understand tool calls and agent reasoning. Crucially, UI action previews on the webpage helped identify where the agent was clicking or interacting (C6).
    4. Step 4: Iterating on the agent: After identifying issues, participants made changes to the workflow or prompts to improve agent behavior (e.g., adding clarification questions, modifying constraints). They simplified workflows or updated prompts with more specific instructions.
  • Tooling Needs (RQ4) related to Capabilities:

    • C1. Use no-code interfaces: All participants used AgentBuilder's no-code interfaces. Some preferred the Workflow graph, others relied more on direct natural language prompt modification. The workflow notation helped structure thinking (P14: "I now see the idea of the flow"). However, some felt limited by the fixed options, desiring more understanding of what could be modified (P8: "They both feel dumbed down... I only know these are the only options available.").
    • C2. Help constrain the agent's task space and knowledge of the user: Even with pre-populated prompts, participants actively thought about the agent's scope and user knowledge. Observing agent behavior helped them realize potential issues (e.g., automatically using a user's password) and reflect on sensitive user information.
    • C3. Define the UI of the agent in the chat and the environment: Most participants modified the information display of the agent. This was guided by both end-user expectations and their own developer needs for post-hoc understanding of agent actions. They desired mechanisms to understand the agent's past, present, and anticipated behavior more comprehensively.
    • C4. Provide components that enable the agent to invoke different user interactions: Participants found predefined interaction components helpful for structuring their design ("helped me come up with the phrases... in a much more concise and clear way"). They were able to use these finite options effectively. However, they desired mechanisms to reliably specify, preview, and test these interaction components across different scenarios.
    • C5. Provide an environment to run and control the agent: All participants ran their agents. Most used Pause and Play features. A common challenge was losing track of when the agent was performing actions or awaiting input, especially when the webpage was loading, or the agent was working autonomously. They desired interfaces that support tracking detailed runtime information and errors from multiple sources in a digestible format.
    • C6. Help user debug the agent's runtime behavior: A majority used DEBUG Mode. UI action previews were very helpful for identifying what the agent was doing visually. However, participants found agent reasoning outputs sometimes cryptic, needing more support to make sense of and adjust how the agent makes decisions.

4.2.4. Implementation (Recap)

AgentBuilder is built using TypeScript React applications. The Prototype interface is a web app, and the Execution interface is a modified CowPilot Chrome extension. Key modifications for AgentBuilder included adding natural language descriptions for UI actions, configurable UI action previews, debugging outputs, and tool prompts for interaction components. Workflows are stored as JSON and support bidirectional edits between the graphical workflow and the natural language prompt.

5. Experimental Setup

5.1. Datasets

This paper primarily uses qualitative data derived from user studies rather than traditional machine learning datasets. The "data" consists of participant interviews, observations of their prototyping process, and their think-aloud protocols.

  • Requirements Elicitation Study Participants:

    • Source: 12 participants from a large technology company.
    • Characteristics: Varied experience with agents (from AI engineers to non-familiar individuals). This diversity was chosen to capture a broad range of perspectives and needs.
    • Domain: Technology professionals involved in or affected by AI agent development and usage.
  • In Situ Agent Prototyping Study Participants:

    • Source: 14 participants (distinct from the first study) from a large technology company.

    • Characteristics: Diverse job roles (Software Engineer, Designer, ML Engineer, Product Manager, etc.), varying GenAI Usage (weekly to multiple times daily), Prompt Engineering experience (no experience to extensive), and Agent Familiarity (never used to seen demos to used an agent). The table below provides details:

      The following are the results from [Table 1] of the original paper:

      PID | Job Role | GenAI Usage | Prompt Engineering | Agent Familiarity
      P1 | Software Engineer | More than once daily | No experience | Never used or seen demos of an agent
      P2 | Software Engineer | More than once daily | Some experience | Never used or seen demos of an agent
      P3 | Designer | Once daily | A little experience | Never used or seen demos of an agent
      P4 | Machine Learning Engineer | Once daily | Extensive experience | Used an agent as an end-user
      P5 | GenAI Software Engineer | Once daily | Extensive experience | Seen demos of an agent
      P6 | Product Manager | More than once daily | Some experience | Seen demos of an agent
      P7 | Web Software Engineer | More than once daily | Extensive experience | Seen demos of an agent
      P8 | Human Factors Engineer | More than once daily | No experience | Never used or seen demos of an agent
      P9 | Technical Project Manager | Once daily | Some experience | Seen demos of an agent
      P10 | Engineering Program Manager | More than once daily | A little experience | Seen demos of an agent
      P11 | Modem Systems Test Engineer | Weekly | A little experience | Seen demos of an agent
      P12 | Machine Learning Engineer | More than once daily | Extensive experience | Seen demos of an agent
      P13 | Acoustics Experience Engineer | Once daily | Some experience | Used an agent as an end-user
      P14 | Product Manager | More than once daily | Substantial experience | Seen demos of an agent
  • Task: For the in situ study, participants were given a task to design an agent user experience for ordering coffee/pastries from the Starbucks website. Example user inputs included:

    • "Order me a coffee. I'm not sure what I want."
    • "Order me a tall Caffè Misto."
    • "Order me a tall iced chai latte and a butter croissant."
    • "Get me a croissant." (to order a chocolate croissant)
    • "I'm not sure what I want to order. Give me a couple of ideas based on what I've ordered in the past." The agent was assumed to start from the Starbucks homepage and had access to user background information (pickup location, password, previous orders).
  • Why these datasets were chosen: The diverse participant groups were chosen to ensure that the elicited requirements and observed behaviors were representative of the broad set of individuals the paper aims to empower. The Starbucks ordering task was a concrete, real-world scenario that allowed for various levels of complexity and interaction, making it suitable for agent prototyping and UX exploration.
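
For illustration only, the sketch below shows what the user background information mentioned in the task (pickup location, password, previous orders) might look like as a data structure. The values are invented; they are not the study's actual pre-populated USER INFORMATION PROMPT.

```typescript
// Purely hypothetical user background for the Starbucks ordering task.
// Only the categories (pickup location, password, previous orders) come from
// the task description above; the field names and values are invented.

interface UserBackground {
  pickupLocation: string;
  accountPassword: string;  // sensitive; participants reflected on whether the
                            // agent should ever use such data automatically
  previousOrders: string[]; // supports "give me ideas based on past orders"
}

const userBackground: UserBackground = {
  pickupLocation: "Example Street store",
  accountPassword: "<redacted>",
  previousOrders: ["Tall Caffè Misto", "Iced chai latte", "Butter croissant"],
};

console.log(`Known past orders: ${userBackground.previousOrders.join(", ")}`);
```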

5.2. Evaluation Metrics

The paper does not use quantitative evaluation metrics in the traditional sense, as it is a qualitative study focused on design requirements, user behavior, and tooling needs. The "evaluation" is primarily conducted through:

  • Validation of Design Requirements: The in situ study with AgentBuilder served to validate whether the Activities and Desired Capabilities identified in the requirements elicitation study were indeed relevant and useful in a practical prototyping scenario.
  • Elicitation of Insights: The think-aloud protocols and observations during the in situ study were analyzed to uncover deeper insights into the agent prototyping process, challenges faced by developers, and specific tooling needs.
  • Assessment of Affordances: The effectiveness of AgentBuilder as a design probe was assessed by its ability to allow diverse participants to successfully create agent prototypes and by the quality of the feedback and insights it generated regarding the design requirements.

5.3. Baselines

The paper does not compare AgentBuilder against specific baseline prototyping models or algorithms. Instead, the implicit baseline is the current state of agent development, which typically requires:

  • AI engineering expertise: Deep knowledge of generative AI models, APIs, and programming languages.

  • Specialized tools: Often code-based IDEs or prompt engineering platforms that are not necessarily designed for UX-centric prototyping by non-programmers.

    The paper aims to show that AgentBuilder provides a more accessible and user-centered alternative to these traditional, expert-centric approaches, thereby broadening participation in agent design.

6. Results & Analysis

6.1. Core Results Analysis

The in situ agent prototyping study with AgentBuilder confirmed the validity of the previously identified design requirements (five prototyping activities and six capabilities) and provided rich insights into the agent prototyping process. The most significant finding is that AgentBuilder successfully enabled participants with diverse backgrounds and varying levels of expertise to prototype agent experiences, broadening participation in AI design.

The study validated the four-step prototyping process:

  1. Designing an initial prototype: Participants started by defining the agent's goal and selecting interaction components or writing prompts. This highlighted C1 (no-code interfaces) as beneficial for structuring their ideas.

  2. Running and monitoring the agent: Participants actively ran their prototypes to observe agent behavior and validate the UX. This reinforced the need for C5 (environment to run and control) but also exposed challenges in tracking agent activity across different UI elements.

  3. Debugging the agent: When defects occurred, participants utilized AgentBuilder's DEBUG Mode, particularly UI action previews, which directly supported C6 (debugging runtime behavior). However, the cryptic nature of agent reasoning indicated areas for improvement.

  4. Iterating on the agent: Based on debugging and observations, participants refined their workflows and prompts, demonstrating the iterative nature of agent design.

    Key advantages of AgentBuilder and the no-code approach observed:

  • Broadened Participation: AgentBuilder's no-code graphical interfaces (C1) allowed individuals without AI engineering or programming expertise to engage in agent prototyping. This was particularly effective for product managers and designers, enabling them to reason about agent flows and interactions.

  • Structured Prompting: The workflow notation in AgentBuilder acted as a scaffold for prompt engineering. Participants could translate graphical workflow components into natural language instructions, aiding in the iterative and exploratory nature of prompt development. This is a significant innovation compared to unstructured prompt engineering.

  • Understanding Agent Scope and Knowledge (C2): By having to explicitly define agent capabilities and user information in prompts, participants gained a better understanding of how LLM agents perceive and use information. Observing agent behavior in the in situ study revealed sensitivities around user data (e.g., passwords), prompting reflection on ethical design.

  • Customizable Information Display (C3): Participants actively customized what information the agent displayed (e.g., UI action previews, agent reasoning). This customization was driven by both end-user expectations and the developers' own needs for understanding agent behavior.

  • Facilitated Interaction Design (C4): The predefined interaction components (e.g., Plan, Interact, Confirm) provided a manageable set of options that helped participants structure agent-user dialogues concisely.

  • Interactive Debugging (C6): The DEBUG Mode with UI action previews was highly valued for visualizing the agent's actions on the webpage. This made it easier to pinpoint where an agent was failing, which is crucial for agent development.

    However, the study also revealed several challenges and areas for improvement:

  • Lack of Agency/Clarity in No-Code: While no-code was empowering, some participants felt a lack of agency or understanding of what else could be modified beyond the provided scaffolds (P8's quote about "dumbed down" interfaces).

  • Understanding Agent's Mental Model (C6): While DEBUG Mode helped, the agent reasoning outputs were often cryptic, making it difficult for developers to truly understand why the LLM made certain decisions and how to effectively adjust its behavior.

  • Tracking Agent's Runtime State (C5): Participants often lost track of the agent's current state, especially during autonomous actions or webpage loading. They desired more digestible and integrated runtime information across the chat and web interface.

  • Reliable Testing of Interactions (C4): Participants expressed a need for better ways to reliably specify, preview, and test how interaction components would behave in different scenarios.

6.2. Data Presentation (Tables)

The main paper did not contain additional tables beyond the participant table already transcribed above.

6.3. Ablation Studies / Parameter Analysis

The paper, being a qualitative HCI study focused on design requirements and user experience, does not include traditional ablation studies or parameter analysis as would be found in a quantitative machine learning paper. The AgentBuilder itself is a design probe and not a model whose components are being quantitatively evaluated for performance. The validation of its components' effectiveness is implicitly done through the qualitative user feedback and observation during the in situ study.

7. Conclusion & Reflections

7.1. Conclusion Summary

This work thoroughly explored scaffolds for prototyping user experiences of interface agents. It began by identifying key design requirements through a requirements elicitation study with diverse participants, articulating them as five prototyping activities and six desired capabilities for agent prototyping systems. These requirements were then instantiated in AgentBuilder, a no-code design probe that facilitates the design, development, and execution of agent prototypes. An in situ user study using AgentBuilder validated these design requirements and provided crucial insights into how developers prototype agents and their specific needs, particularly regarding debugging, runtime visibility, and interaction component design. The study successfully demonstrated that no-code interfaces can significantly broaden participation in agent design.

7.2. Limitations & Future Work

The authors acknowledge several limitations:

  • Generalizability of Studies: Both studies were conducted with participants from a single, large technology company. This limits the generalizability of the findings to a broader population of agent developers and development contexts. Future work could involve more diverse participant pools.

  • Sample Size: The sample sizes (N=12 and N=14) are typical for qualitative HCI studies but might not capture the full spectrum of behaviors or needs across a very large population.

  • Scope of Agents: The design probe (AgentBuilder) and the study task (Starbucks web ordering) focused on web agents performing UI actions. The findings might not fully apply to agents operating in other environments or performing different types of tasks (e.g., desktop automation, conversational-only agents).

  • Fidelity of Design Probe: AgentBuilder is a design probe, meaning it's a tool for research and exploration, not a fully-fledged product. While effective for eliciting insights, its capabilities might not be as extensive or robust as a production-ready agent development platform.

    The authors suggest future work based on their findings, particularly focusing on:

  • Developing mechanisms that help developers make sense of the agent's actual implementation rather than just its external behavior. This includes improving debugging tools and making agent reasoning more transparent and actionable.

  • Exploring ways to make interaction components more customizable while retaining the simplicity of no-code.

  • Designing better runtime monitoring interfaces that integrate information from various sources (chat, webpage, internal state) into a digestible format to help developers track agent activity effectively.

7.3. Personal Insights & Critique

This paper makes a significant contribution by foregrounding the user experience of interface agents and by democratizing their prototyping. The explicit focus on scaffolds for non-AI experts is crucial for fostering human-centered AI design.

Inspirations:

  • Democratization of AI: The no-code approach of AgentBuilder is highly inspiring. It demonstrates a clear path to involve UX designers, product managers, and even end-users in the AI development lifecycle, which is vital for creating AI systems that are truly useful, ethical, and aligned with human values. This aligns with the broader movement towards democratizing AI.
  • Integration of Design and Prompt Engineering: The bidirectional link between the graphical workflow and natural language prompts is an elegant solution. It provides structure to prompt engineering, which can often be an unstructured and opaque process. This could be a powerful paradigm for other generative AI applications.
  • Focus on Debuggability for UX: The emphasis on runtime monitoring and debugging not just for AI performance but for UX understanding is a critical insight. Understanding why an agent behaves a certain way (even if it's "correct" from a technical standpoint) is essential for UX designers to refine the agent experience.

Potential Issues/Areas for Improvement:

  • Scalability for Complex Agents: While effective for the Starbucks ordering task, the no-code workflow might become unwieldy for extremely complex agents with many states, conditions, and integrations. Future research could explore hierarchical workflow design or modularity to manage this complexity.

  • Transparency of Agent Reasoning: The critique that agent reasoning was cryptic is a common challenge with LLMs. While AgentBuilder exposes it, making it interpretable and actionable for UX designers (who may not have deep ML knowledge) remains a significant open problem for XAI (Explainable AI). This paper highlights the need for XAI to be integrated into prototyping tools.

  • Verification and Testing of Interaction Components: Participants desired reliable ways to specify, preview, and test interaction components. This points to a need for more robust testing frameworks within no-code environments that can simulate various user inputs and edge cases to ensure desired agent behaviors.

  • Role of LLM Choice: The paper doesn't deeply delve into how the choice of underlying LLM might impact the prototyping process or the agent's behavior. Different LLMs might respond differently to prompts, affecting debuggability and requiring LLM-specific scaffolding.

    Overall, this paper provides a robust foundation and a compelling design probe for advancing the field of human-AI interaction, particularly in making interface agent design more accessible and human-centered. Its insights are highly transferable to any domain involving LLM-powered agents where user experience is a primary concern.
