AgentBuilder: Exploring Scaffolds for Prototyping User Experiences of Interface Agents
TL;DR Summary
This paper identifies the key activities and capabilities involved in prototyping interface agents, develops the AgentBuilder design probe to support them, and validates it through an in situ study, enabling contributors beyond AI experts to design agent user experiences.
Abstract
Interface agents powered by generative AI models (referred to as "agents") can automate actions based on user commands. An important aspect of developing agents is their user experience (i.e., agent experience). There is a growing need to provide scaffolds for a broader set of individuals beyond AI engineers to prototype agent experiences, since they can contribute valuable perspectives to designing agent experiences. In this work, we explore the affordances agent prototyping systems should offer by conducting a requirements elicitation study with 12 participants with varying experience with agents. We identify key activities in agent experience prototyping and the desired capabilities of agent prototyping systems. We instantiate those capabilities in the AgentBuilder design probe for agent prototyping. We conduct an in situ agent prototyping study with 14 participants using AgentBuilder to validate the design requirements and elicit insights on how developers prototype agents and what their needs are in this process.
In-depth Reading
1. Bibliographic Information
1.1. Title
AgentBuilder: Exploring Scaffolds for Prototyping User Experiences of Interface Agents
1.2. Authors
The authors of this paper are:
- JENNY T. LIANG, Carnegie Mellon University, USA
- TITUS BARIK, Apple, USA
- JEFFREY NICHOLS, Apple, USA
- ELDON SCHOOP, Apple, USA
- RUIJIA CHENG, Apple, USA
The authors come from both academic (Carnegie Mellon University) and industry (Apple) backgrounds, indicating a blend of theoretical research and practical application expertise in human-computer interaction (HCI), artificial intelligence (AI), and software development.
1.3. Journal/Conference
The paper lists its venue using the ACM template placeholder ("Conference acronym 'XX"), so the specific conference is not identified. The placeholder DOI (10.1145/XXXXXXX.XXXXXXX) likewise indicates an ACM publication format. ACM conferences are highly reputable venues in computer science, particularly in fields like HCI, AI, and software engineering.
1.4. Publication Year
The paper was posted to arXiv on October 6, 2025, indicating it is a very recent or forthcoming publication in late 2025.
1.5. Abstract
This paper investigates the prototyping of user experiences (UX) for interface agents powered by generative AI models. The core problem addressed is the need for scaffolds that enable a broader range of individuals, beyond AI engineers, to prototype agent experiences, leveraging their diverse perspectives in design. The research begins with a requirements elicitation study involving 12 participants to identify key prototyping activities and desired system capabilities. These capabilities are then instantiated in AgentBuilder, a design probe for agent prototyping. Finally, an in situ agent prototyping study with 14 participants using AgentBuilder is conducted to validate the identified requirements and gather insights into developers' prototyping processes and needs.
1.6. Original Source Link
The official source and PDF links are:
- Original Source Link: https://arxiv.org/abs/2510.04452 (preprint, version v1)
- PDF Link: https://arxiv.org/pdf/2510.04452v2.pdf (preprint, version v2)

The publication status is "preprint" on arXiv, meaning it has been submitted for peer review or is awaiting formal publication.
2. Executive Summary
2.1. Background & Motivation
The paper addresses the challenge of designing user experiences (UX) for interface agents powered by generative AI models, often referred to as agents. These agents can automate actions based on user commands, significantly impacting how users interact with software and the web.
The core problem is that developing these agents typically requires specialized AI engineering expertise and knowledge of APIs, limiting agent creation to a narrow group of technical experts. This creates a significant gap: individuals with valuable user experience design insights, such as UX designers, product managers, or even end-users, are often excluded from the prototyping process due to a lack of programming knowledge. This exclusion can lead to agent experiences that are not user-centric, potentially causing negative impacts or missed opportunities in design.
The paper's entry point is to explore what affordances (i.e., opportunities for action) a prototyping system should offer to empower a broader group of "developers of agent experiences" (not just AI engineers) to design and prototype agents effectively. This aims to democratize AI design and foster more collaborative, human-centered agent development.
2.2. Main Contributions / Findings
The paper makes several primary contributions:
- Validated Design Requirements: It articulates a set of validated design requirements for agent prototyping systems. These are structured as five prototyping activities (A1-A5) developers engage in and six desired capabilities (C1-C6) that such systems should support. These findings are derived from a requirements elicitation study.
- AgentBuilder Design Probe: The paper introduces AgentBuilder, a design probe that instantiates these identified design requirements. AgentBuilder is a graphical no-code tool designed to allow individuals with varying technical backgrounds to prototype agent experiences.
- Insights on Agent Prototyping Process: Through an in situ user study using AgentBuilder, the paper provides insights into how developers, including non-experts, approach agent prototyping and what their specific needs are during this process. This study further validates the initial design requirements and uncovers additional nuances.
- Design Recommendations: Based on the study findings, the paper proposes a set of design recommendations for future agent prototyping systems.

In essence, the paper provides a foundational understanding of what is needed to enable more inclusive and effective agent experience prototyping, offering both a conceptual framework and a practical demonstration with AgentBuilder.
3. Prerequisite Knowledge & Related Work
3.1. Foundational Concepts
To fully understand this paper, a beginner should be familiar with the following concepts:
- Interface Agents: These are software entities that act on behalf of a user to perform tasks or provide assistance within a user interface. Historically, they have been rule-based or script-driven. In this paper, the focus is on interface agents powered by generative AI models (often called AI agents or simply agents). They can understand natural language commands and execute complex actions across various applications (e.g., web browsers, desktop apps).
- Generative AI Models (e.g., LLMs): Generative artificial intelligence (AI) refers to AI systems that can create new content, such as text, images, or code, rather than just classifying or analyzing existing data. Large language models (LLMs) are a prominent type of generative AI, trained on vast amounts of text data, that can understand, generate, and process human language. They are the "brains" behind the interface agents discussed in this paper, enabling them to interpret user instructions semantically and decide on appropriate actions.
- User Experience (UX): User experience (UX) refers to a person's emotions and attitudes about using a particular product, system, or service. It covers practical, experiential, affective, meaningful, and valuable aspects of human-computer interaction. In this paper, agent experience is a specific kind of UX focusing on how users interact with and perceive AI agents.
- Prototyping: Prototyping is the process of creating preliminary versions of a product or system to test concepts, gather feedback, and iterate on design before final development. It is crucial in UX design for visualizing and refining interactions. For AI agents, prototyping involves simulating how the agent would behave and interact with users and its environment.
- Scaffolds: In software development and education, scaffolds are temporary structures or tools that provide support and guidance for a task, making it easier for learners or less experienced individuals to achieve a goal. In this paper, scaffolds refer to features or design elements within a prototyping system (like AgentBuilder) that simplify the complex process of agent development for non-experts.
- No-code/Low-code Tools: These are software development platforms that allow users to create applications with little to no coding. No-code platforms typically use visual interfaces with drag-and-drop features, while low-code platforms may require minimal coding for specific functionalities. AgentBuilder is described as a graphical no-code tool, aiming to make agent prototyping accessible.
- Prompt Engineering: This is the process of designing and refining prompts (inputs) for generative AI models (especially LLMs) to guide them toward desired outputs. It is an iterative and exploratory process, as the wording and structure of a prompt can significantly influence an LLM's behavior. For agents, prompt engineering is crucial for defining their capabilities, scope, and interaction logic.
- Design Probe: A design probe is a research tool or prototype used in HCI to explore design possibilities, elicit user feedback, and uncover user needs in a real-world or simulated context. It is not necessarily a fully functional product but a vehicle for investigation. AgentBuilder serves as a design probe to study agent prototyping.
- In Situ Study: An in situ study is a research method where observations or experiments are conducted in a natural or real-world setting, rather than in a controlled laboratory environment. This helps capture authentic behaviors and contexts. The in situ agent prototyping study used AgentBuilder in a setting that mimics actual development.
- Wizard-of-Oz Protocol: Though not explicitly central to AgentBuilder, this is mentioned in related work on dialogue prototyping. It is a research technique where human operators simulate the responses of an intelligent system to users, allowing researchers to test user interactions and system designs without fully building the AI. This helps explore the desired UX before investing in complex AI development.
3.2. Previous Works
The paper contextualizes its work within three main areas of related research:
- Interface Agents:
  - Early Interface Agents: The concept of interface agents dates back decades, with early examples focusing on specific applications like calendar management or email filtering, often leveraging application APIs. Key challenges identified then included providing users with sufficient understanding of agent actions, deciding when agents should act autonomously versus seeking user approval, and how agents should integrate into the user interface (e.g., modality, visibility).
  - LLM-Powered Agents: With the rise of (M)LLMs (multi-modal large language models), interface agents have rapidly evolved. They can now semantically understand user input and complete complex actions across diverse environments like spreadsheets (e.g., Excel), operating systems, and especially the web (termed web agents).
  - Focus of Prior Web Agent Research: Much existing research on web agents focuses on developing new generative AI architectures, benchmarks, and evaluation environments to improve agent performance.
  - Gap Addressed: The paper notes a gap: "Few works, however, consider the design of collaborative human-agent workflows." It mentions prior work like Co-pilot, which emphasizes user supervision by pausing before agent actions, highlighting that the UX aspects of these powerful new agents are underexplored.
- Designing for User Experience in Generative AI:
  - UX design for generative AI presents unique challenges due to the unpredictability and complexity of these systems. Designers must imagine and understand AI capabilities that might generate varied text, images, or other modalities.
  - Existing Generative AI UX Tools: Researchers have developed techniques and tools to help prototype generative AI UX. Examples include FauxPilot for LLM-powered UI prototyping, which uses Wizard-of-Oz for dialogue prototyping, and PromptInUser, an existing tool for UX designers working with generative AI.
  - Connection to Paper: This paper aims to combine these UX prototyping techniques with prompt development in the context of agents.
- Developing Prompts for Generative AI Experiences:
  - Prompts are natural language instructions embedded in AI applications. Prompt development is recognized as an iterative and exploratory process, where developers refine prompts through many interactions with the AI to decide on optimal content.
  - Challenges in Prompt Development: This task is a major challenge for developers. Tools exist to help prototype, refine, and evaluate prompts for LLMs, often using node-based approaches to build prompt pipelines.
  - Agent Prompt Development: While agent development involves prompt development, it has its own unique challenges, which are less studied. Prior work on multi-agent workflows (e.g., Zhu et al., Hao et al.) exists, but this paper aims to extend it by exploring how non-experts design prompts for agent experiences and how no-code scaffolds can support these needs.
3.3. Technological Evolution
The evolution of interface agents can be seen as:
- Early, Rule-Based Agents (1990s-early 2000s): These agents were typically programmed with explicit rules and scripts, often for specific, narrow tasks like email filtering or calendar management. They relied on application APIs and had limited understanding beyond their predefined functions. UX design focused on clear communication of actions and user control.
- Increased Sophistication (2000s-2010s): Agents became more integrated into operating systems and could perform more complex tasks, sometimes incorporating basic machine learning for user preferences. However, their capabilities were still largely constrained by explicit programming.
- LLM-Powered Generative AI Agents (2020s-Present): The advent of large language models (LLMs) and generative AI has revolutionized agents. These modern agents can understand natural language input semantically, reason about tasks, and perform multi-step actions across diverse digital environments (web, desktop). This shift brings immense power but also introduces unpredictability, complexity, and ethical considerations for UX design. This paper operates squarely within this latest era.

This paper fits into the current technological timeline by addressing the critical need for accessible UX prototyping tools specifically for these new, powerful LLM-powered agents, bridging the gap between AI engineering and human-centered design.
3.4. Differentiation Analysis
Compared to main methods in related work, the core differences and innovations of this paper's approach are:
- Focus on UX Prototyping for LLM Agents: While prior work on web agents largely focuses on AI performance and benchmarking, this paper explicitly prioritizes the user experience (or agent experience) aspect. It moves beyond just making agents perform better to making them designable and understandable for users.
- Empowering Non-AI Experts: A significant innovation is the explicit goal of providing scaffolds for a broader set of individuals (UX designers, product managers, end-users) to prototype agents, not just AI engineers. This contrasts with traditional agent development, which is typically highly technical and API-driven.
- No-Code Approach for Agents: While no-code tools exist for general generative AI UX (like PromptInUser), AgentBuilder specifically applies this paradigm to interface agents and their multi-step workflows and environmental interactions. It aims to make the complex logic of agentic workflows visually representable and editable without code.
- Integration of Workflow and Prompt Engineering: The paper's approach, instantiated in AgentBuilder, highlights the bidirectional relationship between a graphical workflow (defining agent actions and interactions) and natural language prompts (guiding the LLM's behavior). This integrated approach is an innovation for agent prototyping, providing structured scaffolds for prompt development that go beyond simple text editing.
- Comprehensive Prototyping Lifecycle Support: AgentBuilder not only allows for design but also execution, monitoring, and debugging of agent prototypes within a realistic environment (a web browser extension). This full-lifecycle support within a no-code framework is a key differentiator, enabling iterative design and testing by non-experts.
4. Methodology
4.1. Principles
The core idea of the methodology is to understand and facilitate the prototyping of user experiences for interface agents. This is achieved through a multi-stage human-centered design approach:
- Requirements Elicitation: Understand the needs and activities of developers (both experts and non-experts) when prototyping agents. This involves directly asking users about their challenges and desired system capabilities.
- Design Probe Instantiation: Based on the elicited requirements, create a tangible design probe (a prototype system) that embodies these capabilities. This probe serves as a tool for further investigation, not necessarily a final product.
- In Situ Validation and Elicitation: Conduct a user study with the design probe in a realistic setting. This allows for validation of the initial requirements, observation of actual prototyping behaviors, and elicitation of deeper insights and tooling needs that might not have emerged from initial interviews alone.

The theoretical basis behind this approach is that human-centered design and participatory design principles are crucial for AI systems, especially interface agents where UX is paramount. By involving a diverse group of stakeholders (including non-AI experts) early in the design process, the resulting systems can be more usable, understandable, and beneficial. The use of a design probe allows for concrete interaction and feedback, moving beyond abstract discussions to practical engagement.
4.2. Core Methodology In-depth (Layer by Layer)
The paper employs a mixed-methods approach involving two main studies and the development of a design probe.
4.2.1. Requirements Elicitation Study
This study aimed to understand what activities developers engage in during agent prototyping (RQ1) and what capabilities a system should afford to support this (RQ2).
4.2.1.1. Methodology (Requirements Elicitation)
- Participants: 12 participants from a large technology company were recruited. They had varying levels of experience with agents, including:
  - Relevant Domain Expertise: 2 machine learning engineers, 1 software engineer, 2 quality assurance engineers, and 1 product designer.
  - Current End-Users: Individuals who use generative AI tools.
  - No Familiarity: An individual with no prior exposure to agents.
- Procedure:
  - Experts: Interviewed about their agent development experience, focusing on challenges and desired tools.
  - Non-Experts: First provided a background on agents through a slide deck showing examples of agent experience patterns (e.g., web agents interacting with web pages). Then, they were asked to describe their ideal agent UX for various scenarios.
  - Sessions were conducted remotely via video conferencing, lasting 60-90 minutes, recorded and transcribed. Participants were compensated.
- Analysis: Transcripts were analyzed using thematic analysis (specifically the constant comparative method and provisional coding). Codes were reviewed for code saturation (achieved after 10 interviews). Activities were organized into higher-order phases using theoretical coding. The codes, activities, and phases were validated by another researcher.
4.2.1.2. Results (Requirements Elicitation)
The study identified two main phases in the agent prototyping process: Design the Agent and Inspect the Agent. Within these phases, five Activities (labeled A) and six Desired Capabilities (labeled C) for supporting systems were identified.

Phase 1: Design the Agent. This phase involves conceptualizing and configuring the agent's behavior and interface.
- A1. Design the scope and boundaries of the agent: Developers define what tasks the agent can perform and its operational limits. This includes selecting tools (e.g., "access right to [the] calendar") and constraining the agent's actions (e.g., in messaging apps, limiting access to hidden chats).
  - C1. Use no-code interfaces: GUI-based prototyping tools can lower the barrier for non-experts to define agent scope and behavior, similar to no-code tools for LLM applications.
  - C2. Help constrain the agent's task space and knowledge of the user: Prototyping tools should allow developers to easily define the tasks and knowledge (e.g., user preferences) the agent has access to.
- A2. Design the agent's information display: Developers decide how the agent presents information to the user (e.g., its activities, UI elements). This involves considering what information to show and how it is shown to maintain user trust and understanding.
  - C3. Define the UI of the agent in the chat and the environment: Users should be able to specify the agent's visual representation, what information it displays (e.g., its actions, reasoning), and how it solicits user input (e.g., "How should I ask them...?").
- A3. Design the interactions between the agent and user: This involves defining how the user and agent communicate during a task, including collaborative workflows and error handling (e.g., asking for input if an issue arises).
  - C4. Provide components that enable the agent to invoke different user interactions: Prototyping systems should offer predefined interaction components (e.g., asking for confirmation, clarifying input) that developers can incorporate into the agent's workflow, specifying when and how they are invoked.

Phase 2: Inspect the Agent. This phase focuses on testing, monitoring, and debugging the agent's runtime behavior.
- A4. Run the agent prototype: Developers need an environment to execute their agent prototype to observe its behavior and validate the agent experience. They want to see if the agent reaches its goal or if errors occur. They also need global controls (e.g., pause, resume, cancel) for agent execution.
  - C5. Provide an environment to run and control the agent: A system should offer an interactive environment to execute prototypes and allow users to steer or intervene in the agent's actions, similar to global controls found in generative AI systems.
- A5. Understand the agent's runtime behavior: During execution, developers need to understand why the agent is behaving in a certain way. This includes inspecting visual representations (e.g., screenshots, UI element highlights) and textual representations (e.g., the HTML accessibility tree, LLM prompts and outputs).
  - C6. Help the user debug the agent's runtime behavior: The system should provide tools to inspect the agent's internal state, reasoning, and tool calls, helping developers identify and fix issues (a known challenge in prompt prototyping).
4.2.2. Design Probe: AgentBuilder
AgentBuilder is a design probe developed to instantiate the Desired Capabilities identified in the requirements elicitation study. It consists of two main interfaces: a Prototype interface (for design) and an Execution interface (for running and inspecting).
4.2.2.1. Scaffolds for Designing the Agent (Prototype Interface)
AgentBuilder provides no-code scaffolds to help users design agent experiences.
- Workflow Tab (Graphical Editor): This is a visual no-code interface (Figures 2, 3) where users design agent actions and behaviors using nodes and edges (a hypothetical sketch of such a workflow appears at the end of this subsection).
  - Nodes: Represent actions the agent can take (e.g., Start, End, UI Actions) or user interactions (e.g., Interact, Plan, Message, Confirmation). These directly support C4 by providing predefined interaction components.
  - Edges: Define conditions under which actions are taken, helping to constrain the agent's task space (C2).
  - Inspector: A panel to customize selected nodes (e.g., adding a "Risk" option to a Confirmation node).
  - Node Types:
    - Start: Beginning of the workflow.
    - End: End of the workflow.
    - UI Actions: The agent performs UI actions (click, type, visit webpage). Users can configure what information about these actions is displayed (C3).
    - Plan: The agent shows a high-level list of steps to the user (Figure 2-E). This is an interaction component (C4).
    - Message: The agent displays text to communicate information (Figure 2-F). An interaction component (C4).
    - Interact: The agent presents an open-ended inquiry to the user (Figures 2-A, 2-F). An interaction component (C4).
    - Confirmation: The agent presents a yes/no inquiry (Figure 2-C). An interaction component (C4).
  - Defining UI Actions and Information Display: Developers can edit how the agent's UI actions are observed by the user and what information about the actions is displayed (C3). This includes UI action previews directly on the webpage (Figure 4-1).
  - Specifying User Input: The Interact node allows users to specify how the agent solicits input (e.g., open text field, list of options) (C3). Changes are shown in a live preview.
- Prompt Interface: This interface (Figure 3) allows developers to write structured natural language prompts, a no-code notation for generative AI prototyping. It facilitates bidirectional edits between the graphical workflow and the prompts.
  - Workflow Prompt: Synthesized from the workflow graph. Developers can manually edit it, and changes can be propagated back to the workflow graph. This provides a textual representation of the agent's high-level goal and detailed behavior.
  - Agent Capabilities Prompt (Figure 3-B): Defines guidelines for the agent's general capabilities.
  - User Information Prompt (Figure 3-C): Describes user-specific information the agent should know.
  - These prompts allow constraining the agent's task space and knowledge (C2) through natural language.
  - A Preview tab allows users to see the assembled system prompt.
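The paper describes the workflow as nodes and edges but does not publish AgentBuilder's underlying data format. As a rough illustration only, the following TypeScript sketch shows one hypothetical way such a node-and-edge workflow could be represented, using the node types listed above; every type and field name here is an assumption, not AgentBuilder's actual schema.

```typescript
// Hypothetical sketch of a node-and-edge agent workflow, loosely modeled on the
// node types AgentBuilder exposes (Start, End, UI Actions, Plan, Message,
// Interact, Confirmation). Field names are illustrative only.

type NodeKind =
  | "start" | "end" | "uiActions" | "plan" | "message" | "interact" | "confirmation";

interface WorkflowNode {
  id: string;
  kind: NodeKind;
  // Free-text instruction describing what this step should do.
  description?: string;
  // For uiActions: whether previews of clicks/typing are shown on the webpage (C3).
  showActionPreview?: boolean;
}

interface WorkflowEdge {
  from: string;
  to: string;
  // Natural-language condition under which this transition is taken (C2).
  condition?: string;
}

interface Workflow {
  goal: string;
  nodes: WorkflowNode[];
  edges: WorkflowEdge[];
}

// Example: a coffee-ordering agent that asks for clarification when the request is vague.
const coffeeOrderWorkflow: Workflow = {
  goal: "Order a drink from the coffee shop website on the user's behalf",
  nodes: [
    { id: "start", kind: "start" },
    { id: "clarify", kind: "interact", description: "Ask which drink and size the user wants" },
    { id: "plan", kind: "plan", description: "Show the ordering steps before acting" },
    { id: "order", kind: "uiActions", description: "Add the drink to the cart", showActionPreview: true },
    { id: "confirm", kind: "confirmation", description: "Confirm before placing the order (risky step)" },
    { id: "done", kind: "end" },
  ],
  edges: [
    { from: "start", to: "clarify", condition: "the request is ambiguous" },
    { from: "start", to: "plan", condition: "the request names a specific drink" },
    { from: "clarify", to: "plan" },
    { from: "plan", to: "order" },
    { from: "order", to: "confirm", condition: "the cart is ready for checkout" },
    { from: "confirm", to: "done" },
  ],
};
```

In AgentBuilder itself this structure is edited graphically rather than as code, and is stored as JSON (see the Implementation subsection below).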
4.2.2.2. Scaffolds for Inspecting the Agent (Execution Interface)
AgentBuilder provides tools for running and debugging agent prototypes.
- Agent Execution Interface: This is an extension to the user's web browser (the CowPilot Chrome extension) (Figure 4) where the agent performs UI actions and chats with the user.
  - Running the Agent: Users can initiate the agent by providing a task (Figure 4-A). The agent's actions are determined by the workflow defined in the Prototype interface.
  - Intervening in Agent Actions: AgentBuilder offers local agent controls (C5):
    - Pause button: Stops agent execution.
    - Play button: Continues execution from the current webpage state.
    - Cancel button: Ends the agent's run.
- Debugging Mode: Activated by a DEBUG button, this mode (Figure 5) exposes additional agent information during runtime (C6):
  - Tool calls and agent reasoning corresponding to each message in the chat (Figure 5-2).
  - The accessibility tree and text inputs of the webpage (Figure 5-4).
  - A slider (Figure 5-3) that allows users to navigate through the agent's previous actions.
  - This helps users understand why the agent made certain decisions, addressing C6 (a hypothetical run-loop sketch follows this list).
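To make the Pause/Play/Cancel controls and the DEBUG-mode trace concrete, here is a minimal, hypothetical TypeScript sketch of a run loop that an execution harness might use. AgentBuilder's real Execution interface is a modified CowPilot Chrome extension; none of the names below (AgentRun, DebugStep, nextStep) come from it.

```typescript
// Hypothetical sketch of an agent run loop with controls (C5) and a debug trace (C6).
// The LLM call and UI-action executor are stand-ins passed in as `nextStep`.

type RunState = "running" | "paused" | "cancelled" | "finished";

interface DebugStep {
  toolCall: string;     // e.g., 'click("Add to cart")'
  reasoning: string;    // model-reported rationale for the step
  pageSnapshot: string; // e.g., serialized accessibility tree of the webpage
}

class AgentRun {
  private state: RunState = "running";
  readonly trace: DebugStep[] = []; // inspected in DEBUG mode, navigable step by step

  pause() { if (this.state === "running") this.state = "paused"; }
  play() { if (this.state === "paused") this.state = "running"; }
  cancel() { this.state = "cancelled"; }

  async run(
    task: string,
    nextStep: (task: string, trace: DebugStep[]) => Promise<DebugStep | null>
  ): Promise<RunState> {
    while (this.state !== "cancelled" && this.state !== "finished") {
      if (this.state === "paused") {
        // Wait until Play is pressed, then re-check the state.
        await new Promise((resolve) => setTimeout(resolve, 200));
        continue;
      }
      const step = await nextStep(task, this.trace); // ask the model for the next action
      if (step === null) { this.state = "finished"; break; }
      this.trace.push(step); // record tool call + reasoning for debugging
    }
    return this.state;
  }
}
```

In this sketch, a UI's Pause, Play, and Cancel buttons would simply call pause(), play(), and cancel() on the active run, while a DEBUG view would render the accumulated trace and let a slider index into previous steps.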
4.2.2.3. Implementation
AgentBuilder is implemented as two TypeScript React applications.
- The Prototype interface is a web application.
- The Execution interface is a modified version of the CowPilot Chrome browser extension (which was originally built for LLM-powered UI automation).
- Modifications to CowPilot included adding natural language descriptions for UI actions, making UI action previews configurable, and integrating debugging agent outputs. The Interaction Components were implemented as tool prompts.
- Workflows are stored as JSON files, enabling bidirectional translation between the graphical workflow and the natural language prompt (a hypothetical sketch of this translation follows below).
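The workflow-to-prompt direction of this bidirectional translation can be sketched roughly as follows. This is a hypothetical illustration that reuses the Workflow types and the coffeeOrderWorkflow example from the earlier sketch; it is not AgentBuilder's actual synthesis logic. It also shows how the Workflow Prompt might be assembled with the Agent Capabilities and User Information prompts into the single system prompt shown in the Preview tab.

```typescript
// Hypothetical sketch: synthesize the Workflow Prompt from the graph and assemble
// the full system prompt. Reuses the Workflow types sketched earlier in this summary.

function synthesizeWorkflowPrompt(wf: Workflow): string {
  const lines: string[] = [`Your goal: ${wf.goal}`, "Follow these steps:"];
  for (const node of wf.nodes) {
    if (node.kind === "start" || node.kind === "end") continue;
    // Surface edge conditions as natural-language "when" clauses.
    const incoming = wf.edges.filter((e) => e.to === node.id && e.condition);
    const when = incoming.map((e) => ` (when ${e.condition})`).join("");
    lines.push(`- [${node.kind}] ${node.description ?? ""}${when}`);
  }
  return lines.join("\n");
}

function assembleSystemPrompt(
  wf: Workflow,
  agentCapabilitiesPrompt: string, // guidelines for the agent's general capabilities
  userInformationPrompt: string    // user-specific information the agent should know
): string {
  return [
    "## Agent capabilities",
    agentCapabilitiesPrompt,
    "## User information",
    userInformationPrompt,
    "## Workflow",
    synthesizeWorkflowPrompt(wf),
  ].join("\n\n");
}

// Example usage with the workflow sketched earlier:
const systemPrompt = assembleSystemPrompt(
  coffeeOrderWorkflow,
  "You may browse the coffee shop website and add items to the cart. Never submit payment without confirmation.",
  "The user's usual pickup location is the downtown store."
);
console.log(systemPrompt);
```

The reverse direction (editing the prompt and propagating changes back into the graph) would require parsing the edited text back into nodes and edges, which is why storing the workflow as structured JSON is a sensible design choice.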
4.2.3. In Situ Agent Prototyping Study
This study aimed to understand how developers prototype agents (RQ3) and what tooling needs they have (RQ4) using AgentBuilder.
4.2.3.1. Methodology (In Situ Study)
- Participants: 14 participants with diverse backgrounds (software engineers, designers, product managers, ML engineers, etc.) and varying GenAI usage, prompt engineering experience, and agent familiarity (Table 1).
- Procedure:
  - Participants completed an agent prototyping exercise using AgentBuilder and were encouraged to think aloud.
  - Sessions were conducted remotely, lasting approximately 90 minutes, recorded and transcribed.
  - Task: Design an agent user experience to automate ordering coffee/pastries from the Starbucks website, alleviating UX limitations of a baseline agent. Participants had to design a system prompt for the agent, define its actions, and ensure a positive user experience, considering various user inputs (e.g., "Order me a coffee. I'm not sure what I want.").
  - The AGENT CAPABILITIES PROMPT and USER INFORMATION PROMPT were pre-populated for the exercise.
  - Participants remotely controlled the researcher's screen to interact with AgentBuilder.
- Analysis: Transcripts and recordings were analyzed using thematic analysis. Prototyping activities from the requirements elicitation study were used for post-coding. Provisional coding and open coding were used to identify tooling needs (RQ4), using the Desired Capabilities as a framework.
4.2.3.2. Results (In Situ Study)
AgentBuilder enabled all participants, regardless of prior experience, to prototype agents. The study validated the Activities and Capabilities and provided deeper insights.
- Four-Step Prototyping Process (RQ3):
  - Step 1: Designing an initial prototype of the agent: Participants defined a high-level goal, selected interaction components (from the Workflow or by writing in prompts), and configured the information display. They anticipated what information to show and how.
  - Step 2: Running and monitoring the agent: Participants ran the prototype (A4) to test specific user queries and observe how the agent handled different scenarios (e.g., clarifying ambiguous requests). They monitored agent messages and UI actions, and sometimes used DEBUG Mode (A5) to see agent reasoning live.
  - Step 3: Debugging the agent: Participants focused on defects when the agent encountered issues (A5). They used the DEBUG Mode to understand tool calls and agent reasoning. Crucially, UI action previews on the webpage helped identify where the agent was clicking or interacting (C6).
  - Step 4: Iterating on the agent: After identifying issues, participants made changes to the workflow or prompts to improve agent behavior (e.g., adding clarification questions, modifying constraints). They simplified workflows or updated prompts with more specific instructions.
- Tooling Needs (RQ4) related to the Capabilities:
  - C1. Use no-code interfaces: All participants used AgentBuilder's no-code interfaces. Some preferred the Workflow graph, others relied more on direct natural language prompt modification. The workflow notation helped structure thinking (P14: "I now see the idea of the flow"). However, some felt limited by the fixed options, desiring more understanding of what could be modified (P8: "They both feel dumbed down... I only know these are the only options available.").
  - C2. Help constrain the agent's task space and knowledge of the user: Even with pre-populated prompts, participants actively thought about the agent's scope and user knowledge. Observing agent behavior helped them realize potential issues (e.g., automatically using a user's password) and reflect on sensitive user information.
  - C3. Define the UI of the agent in the chat and the environment: Most participants modified the information display of the agent. This was guided by both end-user expectations and their own developer needs for post-hoc understanding of agent actions. They desired mechanisms to understand the agent's past, present, and anticipated behavior more comprehensively.
  - C4. Provide components that enable the agent to invoke different user interactions: Participants found predefined interaction components helpful for structuring their design ("helped me come up with the phrases... in a much more concise and clear way"). They were able to use these finite options effectively. However, they desired mechanisms to reliably specify, preview, and test these interaction components across different scenarios.
  - C5. Provide an environment to run and control the agent: All participants ran their agents. Most used the Pause and Play features. A common challenge was losing track of when the agent was performing actions or awaiting input, especially when the webpage was loading or the agent was working autonomously. They desired interfaces that support tracking detailed runtime information and errors from multiple sources in a digestible format.
  - C6. Help the user debug the agent's runtime behavior: A majority used DEBUG Mode. UI action previews were very helpful for identifying what the agent was doing visually. However, participants found agent reasoning outputs sometimes cryptic, needing more support to make sense of and adjust how the agent makes decisions.
4.2.4. Implementation (Recap)
AgentBuilder is built using TypeScript React applications. The Prototype interface is a web app, and the Execution interface is a modified CowPilot Chrome extension. Key modifications for AgentBuilder included adding natural language descriptions for UI actions, configurable UI action previews, debugging outputs, and tool prompts for interaction components. Workflows are stored as JSON and support bidirectional edits between the graphical workflow and the natural language prompt.
5. Experimental Setup
5.1. Datasets
This paper primarily uses qualitative data derived from user studies rather than traditional machine learning datasets. The "data" consists of participant interviews, observations of their prototyping process, and their think-aloud protocols.
- Requirements Elicitation Study Participants:
  - Source: 12 participants from a large technology company.
  - Characteristics: Varied experience with agents (from AI engineers to non-familiar individuals). This diversity was chosen to capture a broad range of perspectives and needs.
  - Domain: Technology professionals involved in or affected by AI agent development and usage.
- In Situ Agent Prototyping Study Participants:
  - Source: 14 participants (distinct from the first study) from a large technology company.
  - Characteristics: Diverse job roles (Software Engineer, Designer, ML Engineer, Product Manager, etc.), varying GenAI usage (weekly to multiple times daily), prompt engineering experience (no experience to extensive), and agent familiarity (never used, seen demos of, or used an agent). The following are the results from [Table 1] of the original paper:
| PID | Job Role | GenAI Usage | Prompt Engineering | Agent Familiarity |
|-----|----------|-------------|--------------------|-------------------|
| P1 | Software Engineer | More than once daily | No experience | Never used or seen demos of an agent |
| P2 | Software Engineer | More than once daily | Some experience | Never used or seen demos of an agent |
| P3 | Designer | Once daily | A little experience | Never used or seen demos of an agent |
| P4 | Machine Learning Engineer | Once daily | Extensive experience | Used an agent as an end-user |
| P5 | GenAI Software Engineer | Once daily | Extensive experience | Seen demos of an agent |
| P6 | Product Manager | More than once daily | Some experience | Seen demos of an agent |
| P7 | Web Software Engineer | More than once daily | Extensive experience | Seen demos of an agent |
| P8 | Human Factors Engineer | More than once daily | No experience | Never used or seen demos of an agent |
| P9 | Technical Project Manager | Once daily | Some experience | Seen demos of an agent |
| P10 | Engineering Program Manager | More than once daily | A little experience | Seen demos of an agent |
| P11 | Modem Systems Test Engineer | Weekly | A little experience | Seen demos of an agent |
| P12 | Machine Learning Engineer | More than once daily | Extensive experience | Seen demos of an agent |
| P13 | Acoustics Experience Engineer | Once daily | Some experience | Used an agent as an end-user |
| P14 | Product Manager | More than once daily | Substantial experience | Seen demos of an agent |
- Task: For the in situ study, participants were given a task to design an agent user experience for ordering coffee/pastries from the Starbucks website. Example user inputs included:
  - "Order me a coffee. I'm not sure what I want."
  - "Order me a tall Caffè Misto."
  - "Order me a tall iced chai latte and a butter croissant."
  - "Get me a croissant." (to order a chocolate croissant)
  - "I'm not sure what I want to order. Give me a couple of ideas based on what I've ordered in the past."

  The agent was assumed to start from the Starbucks homepage and had access to user background information (pickup location, password, previous orders).
- Why these datasets were chosen: The diverse participant groups were chosen to ensure that the elicited requirements and observed behaviors were representative of the broad set of individuals the paper aims to empower. The Starbucks ordering task was a concrete, real-world scenario that allowed for various levels of complexity and interaction, making it suitable for agent prototyping and UX exploration.
5.2. Evaluation Metrics
The paper does not use quantitative evaluation metrics in the traditional sense, as it is a qualitative study focused on design requirements, user behavior, and tooling needs. The "evaluation" is primarily conducted through:
- Validation of Design Requirements: The in situ study with AgentBuilder served to validate whether the Activities and Desired Capabilities identified in the requirements elicitation study were indeed relevant and useful in a practical prototyping scenario.
- Elicitation of Insights: The think-aloud protocols and observations during the in situ study were analyzed to uncover deeper insights into the agent prototyping process, challenges faced by developers, and specific tooling needs.
- Assessment of Affordances: The effectiveness of AgentBuilder as a design probe was assessed by its ability to allow diverse participants to successfully create agent prototypes and by the quality of the feedback and insights it generated regarding the design requirements.
5.3. Baselines
The paper does not compare AgentBuilder against specific baseline prototyping models or algorithms. Instead, the implicit baseline is the current state of agent development, which typically requires:
- AI engineering expertise: Deep knowledge of generative AI models, APIs, and programming languages.
- Specialized tools: Often code-based IDEs or prompt engineering platforms that are not necessarily designed for UX-centric prototyping by non-programmers.

The paper aims to show that AgentBuilder provides a more accessible and user-centered alternative to these traditional, expert-centric approaches, thereby broadening participation in agent design.
6. Results & Analysis
6.1. Core Results Analysis
The in situ agent prototyping study with AgentBuilder confirmed the validity of the previously identified design requirements (five prototyping activities and six capabilities) and provided rich insights into the agent prototyping process. The most significant finding is that AgentBuilder successfully enabled participants with diverse backgrounds and varying levels of expertise to prototype agent experiences, broadening participation in AI design.
The study validated the four-step prototyping process:
- Designing an initial prototype: Participants started by defining the agent's goal and selecting interaction components or writing prompts. This highlighted C1 (no-code interfaces) as beneficial for structuring their ideas.
- Running and monitoring the agent: Participants actively ran their prototypes to observe agent behavior and validate the UX. This reinforced the need for C5 (environment to run and control) but also exposed challenges in tracking agent activity across different UI elements.
- Debugging the agent: When defects occurred, participants utilized AgentBuilder's DEBUG Mode, particularly UI action previews, which directly supported C6 (debugging runtime behavior). However, the cryptic nature of agent reasoning indicated areas for improvement.
- Iterating on the agent: Based on debugging and observations, participants refined their workflows and prompts, demonstrating the iterative nature of agent design.

Key advantages of AgentBuilder and the no-code approach observed:
- Broadened Participation: AgentBuilder's no-code graphical interfaces (C1) allowed individuals without AI engineering or programming expertise to engage in agent prototyping. This was particularly effective for product managers and designers, enabling them to reason about agent flows and interactions.
- Structured Prompting: The workflow notation in AgentBuilder acted as a scaffold for prompt engineering. Participants could translate graphical workflow components into natural language instructions, aiding the iterative and exploratory nature of prompt development. This is a significant innovation compared to unstructured prompt engineering.
- Understanding Agent Scope and Knowledge (C2): By having to explicitly define agent capabilities and user information in prompts, participants gained a better understanding of how LLM agents perceive and use information. Observing agent behavior in the in situ study revealed sensitivities around user data (e.g., passwords), prompting reflection on ethical design.
- Customizable Information Display (C3): Participants actively customized what information the agent displayed (e.g., UI action previews, agent reasoning). This customization was driven by both end-user expectations and the developers' own needs for understanding agent behavior.
- Facilitated Interaction Design (C4): The predefined interaction components (e.g., Plan, Interact, Confirm) provided a manageable set of options that helped participants structure agent-user dialogues concisely.
- Interactive Debugging (C6): The DEBUG Mode with UI action previews was highly valued for visualizing the agent's actions on the webpage. This made it easier to pinpoint where an agent was failing, which is crucial for agent development.

However, the study also revealed several challenges and areas for improvement:
- Lack of Agency/Clarity in No-Code: While no-code was empowering, some participants felt a lack of agency or understanding of what else could be modified beyond the provided scaffolds (P8's quote about "dumbed down" interfaces).
- Understanding the Agent's Mental Model (C6): While DEBUG Mode helped, the agent reasoning outputs were often cryptic, making it difficult for developers to truly understand why the LLM made certain decisions and how to effectively adjust its behavior.
- Tracking the Agent's Runtime State (C5): Participants often lost track of the agent's current state, especially during autonomous actions or webpage loading. They desired more digestible and integrated runtime information across the chat and web interface.
- Reliable Testing of Interactions (C4): Participants expressed a need for better ways to reliably specify, preview, and test how interaction components would behave in different scenarios.
6.2. Data Presentation (Tables)
The main paper did not contain additional tables beyond the participant table already transcribed above.
6.3. Ablation Studies / Parameter Analysis
The paper, being a qualitative HCI study focused on design requirements and user experience, does not include traditional ablation studies or parameter analysis as would be found in a quantitative machine learning paper. The AgentBuilder itself is a design probe and not a model whose components are being quantitatively evaluated for performance. The validation of its components' effectiveness is implicitly done through the qualitative user feedback and observation during the in situ study.
7. Conclusion & Reflections
7.1. Conclusion Summary
This work thoroughly explored scaffolds for prototyping user experiences of interface agents. It began by identifying key design requirements through a requirements elicitation study with diverse participants, articulating them as five prototyping activities and six desired capabilities for agent prototyping systems. These requirements were then instantiated in AgentBuilder, a no-code design probe that facilitates the design, development, and execution of agent prototypes. An in situ user study using AgentBuilder validated these design requirements and provided crucial insights into how developers prototype agents and their specific needs, particularly regarding debugging, runtime visibility, and interaction component design. The study successfully demonstrated that no-code interfaces can significantly broaden participation in agent design.
7.2. Limitations & Future Work
The authors acknowledge several limitations:
- Generalizability of Studies: Both studies were conducted with participants from a single, large technology company. This limits the generalizability of the findings to a broader population of agent developers and development contexts. Future work could involve more diverse participant pools.
- Sample Size: The sample sizes (12 and 14) are typical for qualitative HCI studies but might not capture the full spectrum of behaviors or needs across a very large population.
- Scope of Agents: The design probe (AgentBuilder) and the study task (Starbucks web ordering) focused on web agents performing UI actions. The findings might not fully apply to agents operating in other environments or performing different types of tasks (e.g., desktop automation, conversational-only agents).
- Fidelity of Design Probe: AgentBuilder is a design probe, meaning it is a tool for research and exploration, not a fully-fledged product. While effective for eliciting insights, its capabilities might not be as extensive or robust as a production-ready agent development platform.

The authors suggest future work based on their findings, particularly focusing on:
- Developing mechanisms that help developers make sense of the agent's actual implementation rather than just its external behavior. This includes improving debugging tools and making agent reasoning more transparent and actionable.
- Exploring ways to make interaction components more customizable while retaining the simplicity of no-code.
- Designing better runtime monitoring interfaces that integrate information from various sources (chat, webpage, internal state) into a digestible format to help developers track agent activity effectively.
7.3. Personal Insights & Critique
This paper makes a significant contribution by foregrounding the user experience of interface agents and by democratizing their prototyping. The explicit focus on scaffolds for non-AI experts is crucial for fostering human-centered AI design.
Inspirations:
- Democratization of AI: The no-code approach of AgentBuilder is highly inspiring. It demonstrates a clear path to involve UX designers, product managers, and even end-users in the AI development lifecycle, which is vital for creating AI systems that are truly useful, ethical, and aligned with human values. This aligns with the broader movement towards democratizing AI.
- Integration of Design and Prompt Engineering: The bidirectional link between the graphical workflow and natural language prompts is an elegant solution. It provides structure to prompt engineering, which can often be an unstructured and opaque process. This could be a powerful paradigm for other generative AI applications.
- Focus on Debuggability for UX: The emphasis on runtime monitoring and debugging not just for AI performance but for UX understanding is a critical insight. Understanding why an agent behaves a certain way (even if it is "correct" from a technical standpoint) is essential for UX designers to refine the agent experience.
Potential Issues/Areas for Improvement:
- Scalability for Complex Agents: While effective for the Starbucks ordering task, the no-code workflow might become unwieldy for extremely complex agents with many states, conditions, and integrations. Future research could explore hierarchical workflow design or modularity to manage this complexity.
- Transparency of Agent Reasoning: The critique that agent reasoning was cryptic is a common challenge with LLMs. While AgentBuilder exposes it, making it interpretable and actionable for UX designers (who may not have deep ML knowledge) remains a significant open problem for XAI (Explainable AI). This paper highlights the need for XAI to be integrated into prototyping tools.
- Verification and Testing of Interaction Components: Participants desired reliable ways to specify, preview, and test interaction components. This points to a need for more robust testing frameworks within no-code environments that can simulate various user inputs and edge cases to ensure desired agent behaviors.
- Role of LLM Choice: The paper does not deeply delve into how the choice of underlying LLM might impact the prototyping process or the agent's behavior. Different LLMs might respond differently to prompts, affecting debuggability and requiring LLM-specific scaffolding.

Overall, this paper provides a robust foundation and a compelling design probe for advancing the field of human-AI interaction, particularly in making interface agent design more accessible and human-centered. Its insights are highly transferable to any domain involving LLM-powered agents where user experience is a primary concern.