Agent UX & Transparency

An agent that does brilliant work but leaves the user staring at a spinner for thirty seconds is an agent that gets turned off. Users do not trust black boxes. They trust systems that show their work, communicate progress, and let them intervene when things go sideways. The user experience of an AI agent is not a cosmetic layer — it is a core design constraint that affects architecture, latency budgets, and even which orchestration patterns you can use.

We cover the engineering behind agent UX: how to stream partial results so the interface feels alive, how to progressively disclose detail without overwhelming the user, how to surface the agent's reasoning in a way that builds confidence, and how to design the feedback loops that turn a powerful-but-opaque system into one people actually trust.

Streaming Partial Results

The single biggest UX improvement for any agent is streaming. A user who sees tokens appearing word-by-word perceives a two-second response as fast. The same user staring at a blank screen for two seconds perceives it as broken. Streaming gives the user early signal about whether the agent understood their intent, letting them interrupt and redirect before the full response completes.

Token-Level Streaming

Most model APIs support server-sent events (SSE) that deliver tokens as they are generated. Your orchestrator should forward these immediately rather than buffering the full response:

from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass
class StreamChunk:
    """A single piece of streamed agent output."""
    content: str | None = None
    tool_call_start: str | None = None  # Tool name if a call begins
    tool_call_result: str | None = None
    reasoning: str | None = None  # Chain-of-thought fragment
    status: str | None = None  # "thinking", "executing", "responding"


async def stream_agent_response(
    model_client,
    messages: list[dict],
    on_chunk: Callable[[StreamChunk], Awaitable[None]],
) -> str:
    """Stream model output token-by-token, forwarding each chunk to the UI."""
    full_response = []

    async for token in model_client.stream_completion(messages):
        full_response.append(token)
        await on_chunk(StreamChunk(content=token, status="responding"))

    return "".join(full_response)

The key architectural decision is where streaming happens in your stack. In a typical agent loop, the model generates text, then the orchestrator detects a tool call, executes it, and feeds the result back. Each of these phases should emit its own status update as it starts, so the user sees progress well before the final text begins to stream.

Tool Execution Status

When the agent calls a tool, the user sees... nothing. Unless you tell them. Emitting status events during tool execution eliminates the "dead air" that erodes trust:

async def execute_tool_with_status(
    tool_name: str,
    args: dict,
    on_chunk: Callable[[StreamChunk], Awaitable[None]],
    tool_registry: dict,
) -> dict:
    """Execute a tool while streaming status updates to the UI."""
    await on_chunk(StreamChunk(
        status="executing",
        tool_call_start=tool_name,
    ))

    # Execute the actual tool
    tool_fn = tool_registry[tool_name]
    result = await tool_fn(**args)

    await on_chunk(StreamChunk(
        status="responding",
        tool_call_result=f"{tool_name} completed",
    ))

    return result

In practice, this means your UI can show messages like "Searching documents...", "Running code...", or "Querying database..." — each one a signal that the agent is working, not stuck.
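
One way to produce those messages is a small lookup from tool name to user-facing label, with a generic fallback so that an unregistered tool still emits something. The tool names and labels below are hypothetical placeholders.

```python
# Hypothetical tool names mapped to the text the UI shows while they run.
STATUS_LABELS = {
    "search_documents": "Searching documents...",
    "run_code": "Running code...",
    "query_database": "Querying database...",
}


def status_label(tool_name: str) -> str:
    """Return a human-readable progress message for a tool call."""
    # Fall back to a generic message rather than showing nothing.
    return STATUS_LABELS.get(tool_name, f"Running {tool_name}...")
```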

Partial Result Assembly

Some tasks produce intermediate artifacts the user cares about. A research agent that finds ten sources before synthesizing should show those sources as they arrive, not after the final summary. This requires your orchestrator to distinguish between intermediate results (worth showing) and internal state (not worth showing):

┌─────────────────────────────────────────────────────┐
│                  User Interface                     │
│                                                     │
│  ┌───────────────────────────────────────────────┐  │
│  │  Status: "Searching for relevant papers..."   │  │
│  │                                               │  │
│  │  Found so far:                                │  │
│  │  • [Source 1] "Attention Is All You Need"     │  │
│  │  • [Source 2] "ReAct: Synergizing..."         │  │
│  │  • [Source 3] loading...                      │  │
│  │                                               │  │
│  │  ░░░░░░░░░░░░ Synthesizing...                 │  │
│  └───────────────────────────────────────────────┘  │
│                                                     │
└─────────────────────────────────────────────────────┘
         │
         │ SSE stream
         ▼
┌─────────────────────────────────────────────────────┐
│               Agent Orchestrator                    │
│                                                     │
│  Phase 1: Search (emit each result as found)        │
│  Phase 2: Filter (internal — no UI update)          │
│  Phase 3: Synthesize (stream tokens)                │
│                                                     │
└─────────────────────────────────────────────────────┘

The design decision is: which intermediate states are meaningful to the user? A good heuristic is to surface anything the user might want to act on — click a link, verify a source, redirect the search — and hide pure bookkeeping.
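
That heuristic can be sketched as an allow-list of event kinds the user can act on. The event kinds and the `AgentEvent` shape here are assumptions for illustration; substitute whatever your orchestrator actually emits.

```python
from dataclasses import dataclass

# Assumed event kinds that a user could act on (click, verify, redirect).
USER_ACTIONABLE = {"source_found", "draft_ready", "clarification_needed"}


@dataclass
class AgentEvent:
    """One intermediate event produced by the orchestrator."""
    kind: str
    payload: dict


def should_surface(event: AgentEvent) -> bool:
    """Surface only events the user might want to act on."""
    return event.kind in USER_ACTIONABLE


def visible_events(events: list[AgentEvent]) -> list[AgentEvent]:
    """Filter out pure bookkeeping before anything reaches the UI."""
    return [event for event in events if should_surface(event)]
```

An allow-list fails closed: a new internal event type stays hidden until someone decides it is worth showing.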

Progressive Disclosure

Agents generate a lot of information: reasoning traces, tool calls, intermediate results, confidence signals, citations. Dumping all of this into the UI produces cognitive overload. Progressive disclosure means showing a clean, actionable surface by default and letting the user drill into detail on demand.

Layered Information Architecture

Design your agent output in three layers:

Layer 1: The Answer. The final response, such as a summary, a code block, or an action confirmation. This is what most users need most of the time.

Layer 2: The Evidence. Citations, tool results, key data points that support the answer. Shown inline or in a collapsible section. Useful for verification without requiring the user to understand the full reasoning chain.

Layer 3: The Trace. The full chain-of-thought, every tool call with arguments and responses, timing data, model selection decisions. Available for debugging and audit, but never forced on the user.

from dataclasses import dataclass


@dataclass
class AgentResponse:
    """Structured agent output supporting progressive disclosure."""
    # Layer 1: The answer
    answer: str
    confidence: float  # 0.0 to 1.0

    # Layer 2: The evidence
    sources: list[dict]  # {"title": ..., "url": ..., "snippet": ...}
    tool_results_summary: list[str]  # Human-readable summaries

    # Layer 3: The trace
    reasoning_trace: list[str]  # Chain-of-thought steps
    tool_calls: list[dict]  # Full tool call records with args/results
    timing: dict  # {"total_ms": ..., "model_ms": ..., "tool_ms": ...}
    model_used: str  # Which model handled this request


def render_for_ui(response: AgentResponse, detail_level: str = "answer") -> dict:
    """Render the response at the requested detail level."""
    output = {"answer": response.answer}

    if detail_level in ("evidence", "trace"):
        output["sources"] = response.sources
        output["tool_summaries"] = response.tool_results_summary

    if detail_level == "trace":
        output["reasoning"] = response.reasoning_trace
        output["raw_tool_calls"] = response.tool_calls
        output["timing"] = response.timing

    return output

Adaptive Detail Based on Context

Not every interaction warrants the same disclosure level. A quick factual answer needs less explanation than a decision that affects production data. Your agent can signal how much explanation to show by default:

  • Low-stakes, high-confidence: show the answer only. Example: "The function is defined on line 42."
  • Medium-stakes or medium-confidence: show the answer plus evidence. Example: "Based on three matching documents, the policy allows..."
  • High-stakes or low-confidence: show the answer, evidence, and proactively surface uncertainty. Example: "I found conflicting information — here are both interpretations."

Respect the user's attention budget by putting the most important signal first.
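
The three cases above can be sketched as a small decision function. It returns a detail level plus a flag for proactively surfacing uncertainty. The stakes labels and the 0.9 and 0.7 cutoffs are placeholder assumptions, not recommended values.

```python
def default_disclosure(stakes: str, confidence: float) -> tuple[str, bool]:
    """Pick a default (detail_level, surface_uncertainty) pair.

    stakes is assumed to be "low", "medium", or "high"; the numeric
    cutoffs are illustrative and should be tuned per domain.
    """
    if stakes not in ("low", "medium", "high"):
        raise ValueError(f"unknown stakes level: {stakes!r}")

    # High stakes or low confidence: show evidence and flag uncertainty.
    if stakes == "high" or confidence < 0.7:
        return "evidence", True
    # Medium stakes or medium confidence: show the answer plus evidence.
    if stakes == "medium" or confidence < 0.9:
        return "evidence", False
    # Low stakes and high confidence: the answer alone is enough.
    return "answer", False
```

The checks run from most to least cautious, so an input that matches several cases always gets the more conservative treatment.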

Explaining Reasoning

Users trust agents more when they can see why the agent did what it did. But "show your work" is harder than it sounds. Raw chain-of-thought is verbose, repetitive, and often confusing to non-experts. The engineering challenge is to present reasoning in a way that is honest, concise, and actionable.

Structured Reasoning Summaries

Rather than dumping the raw internal monologue, extract a structured summary of the key decision points:

from dataclasses import dataclass


@dataclass
class ReasoningStep:
    """A single decision point in the agent's reasoning."""
    action: str  # What the agent did
    rationale: str  # Why it did it (1-2 sentences)
    alternatives_considered: list[str]  # What it chose not to do
    confidence: float


def summarize_reasoning(
    raw_trace: list[str],
    model_client,
) -> list[ReasoningStep]:
    """Distill a verbose reasoning trace into key decision points.

    Uses a lightweight model call to extract structure from
    the raw chain-of-thought.
    """
    prompt = f"""Extract the key decision points from this reasoning trace.
For each decision, identify:
- What action was taken
- Why (1-2 sentences)
- What alternatives were considered
- Confidence level (0.0-1.0)

Trace:
{chr(10).join(raw_trace)}

Return as JSON array."""

    result = model_client.complete(prompt)
    return parse_reasoning_steps(result)

This gives users a clear "audit trail" without requiring them to parse hundreds of tokens of internal deliberation.
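
The `parse_reasoning_steps` helper is not defined above. A defensive sketch might look like the following, assuming the model returns a JSON array of objects whose keys match the `ReasoningStep` field names. That key naming is an assumption, since the prompt does not pin it down, and model output should never be trusted to be well-formed.

```python
import json
from dataclasses import dataclass


@dataclass
class ReasoningStep:
    """Mirror of the ReasoningStep dataclass defined above."""
    action: str
    rationale: str
    alternatives_considered: list[str]
    confidence: float


def parse_reasoning_steps(raw: str) -> list[ReasoningStep]:
    """Parse model output into steps, tolerating malformed input."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []  # Not JSON at all: show no summary rather than crash.
    if not isinstance(items, list):
        return []

    steps = []
    for item in items:
        if not isinstance(item, dict):
            continue
        try:
            confidence = float(item.get("confidence", 0.0))
        except (TypeError, ValueError):
            confidence = 0.0
        alternatives = item.get("alternatives_considered")
        steps.append(ReasoningStep(
            action=str(item.get("action", "")),
            rationale=str(item.get("rationale", "")),
            alternatives_considered=(
                [str(a) for a in alternatives]
                if isinstance(alternatives, list) else []
            ),
            # Clamp to the 0.0-1.0 range the prompt asks for.
            confidence=min(1.0, max(0.0, confidence)),
        ))
    return steps
```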

Citations and Attribution

When an agent's answer depends on retrieved information, always link back to the source. This serves two purposes: it lets the user verify claims, and it limits hallucination damage by making unsupported claims visually obvious (they have no citation).

from dataclasses import dataclass


@dataclass
class CitedClaim:
    """A claim in the agent's response linked to its source."""
    text: str
    source_id: str
    source_title: str
    source_url: str | None
    relevance_score: float


def build_cited_response(
    question: str,
    retrieved_docs: list[dict],
    model_client,
) -> tuple[str, list[CitedClaim]]:
    """Generate a response where each major claim links to a source.

    The model is instructed to use inline citation markers [1], [2], etc.
    Post-processing maps these to actual source metadata.
    """
    # Number the sources from 1 so the markers match what the model sees.
    sources_text = "\n".join(
        f"[{i+1}] {doc['title']}: {doc['snippet']}"
        for i, doc in enumerate(retrieved_docs)
    )

    prompt = f"""Answer the user's question using the sources below.
Cite each claim with [N] markers. If a claim cannot be supported
by any source, prefix it with [unsupported].

Sources:
{sources_text}

Question: {question}"""

    response = model_client.complete(prompt)
    # extract_citations maps [N] markers in the response back to source metadata.
    claims = extract_citations(response, retrieved_docs)
    return response, claims

The [unsupported] prefix is a powerful transparency mechanism. It makes the agent's confidence boundaries visible in the output itself, rather than hiding uncertainty behind fluent prose.

Confidence Communication

Agents should not present every output with equal conviction. When the agent is uncertain, the UI should reflect that — through language, visual cues, or explicit confidence scores:

def format_with_confidence(answer: str, confidence: float) -> dict:
    """Attach appropriate hedging and visual signals to an answer."""
    if confidence >= 0.9:
        return {
            "text": answer,
            "indicator": "high-confidence",
            "hedge": None,
        }
    elif confidence >= 0.7:
        return {
            "text": answer,
            "indicator": "medium-confidence",
            "hedge": "Based on available information:",
        }
    else:
        return {
            "text": answer,
            "indicator": "low-confidence",
            "hedge": "I'm not fully certain, but here's my best interpretation:",
        }

The threshold values here are illustrative. Calibrate them to your domain — a medical agent needs different thresholds than a code-search agent.

Designing Trust

Trust is an emergent property of consistent, predictable behavior over time. But there are concrete architectural decisions that accelerate or undermine it.

Predictable Behavior Boundaries

Users trust agents that do what they say and say what they do. This means:

  • Before taking an action with side effects, describe what you are about to do and ask for confirmation (this intersects with human-in-the-loop patterns we discussed earlier).
  • After taking an action, report exactly what happened — including partial failures.
  • Never silently retry. If something failed and you are trying again, say so.

from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass
class ActionProposal:
    """A proposed action the agent wants to take, pending user approval."""
    description: str  # Human-readable explanation
    impact: str  # "read-only", "write", "destructive"
    reversible: bool
    details: dict  # Full parameters for inspection


@dataclass
class ActionReport:
    """Post-execution report of what actually happened."""
    action: str
    success: bool
    changes_made: list[str]  # Specific changes for audit
    errors: list[str] | None = None
    rollback_available: bool = False


async def propose_and_execute(
    proposal: ActionProposal,
    executor: Callable[[dict], Awaitable[object]],
    ui_callback: Callable[..., Awaitable[object]],
) -> ActionReport:
    """Propose an action, wait for approval, execute, and report."""
    # Show the user what we want to do
    approved = await ui_callback(
        "confirm_action",
        proposal,
    )

    if not approved:
        return ActionReport(
            action=proposal.description,
            success=False,
            changes_made=[],
            errors=["User declined"],
        )

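    # Execute and report. A raised exception is still an outcome the user
    # must hear about, so convert it into a failed report instead of
    # letting it escape without one.
    try:
        result = await executor(proposal.details)
    except Exception as exc:
        report = ActionReport(
            action=proposal.description,
            success=False,
            changes_made=[],
            errors=[f"{type(exc).__name__}: {exc}"],
        )
    else:
        # Assumes the executor returns an object exposing these attributes.
        report = ActionReport(
            action=proposal.description,
            success=result.success,
            changes_made=result.changes,
            errors=result.errors,
            rollback_available=result.can_rollback,
        )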

    await ui_callback("action_complete", report)
    return report
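
The "never silently retry" rule from the list above can be sketched as a retry wrapper that announces every failed attempt. The `retry_with_announcement` name, the message wording, and the fixed delay are illustrative assumptions. A production version would likely add backoff and restrict which exceptions are retried.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def retry_with_announcement(
    operation: Callable[[], Awaitable[object]],
    notify: Callable[[str], Awaitable[None]],
    max_attempts: int = 3,
    delay_s: float = 0.0,
) -> object:
    """Retry an async operation, telling the user about each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except Exception as exc:
            reason = type(exc).__name__
            if attempt == max_attempts:
                await notify(
                    f"Attempt {attempt} of {max_attempts} failed ({reason}); giving up."
                )
                raise  # Let the caller turn this into a user-facing error.
            await notify(
                f"Attempt {attempt} of {max_attempts} failed ({reason}); retrying."
            )
            await asyncio.sleep(delay_s)
```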

Error Transparency

When an agent fails, the worst thing it can do is pretend it succeeded. The second worst thing is to show a generic "Something went wrong" message. Good error UX means:

  • Distinguishing between what failed (the tool timed out) and what this means for the user (your file was not saved).
  • Offering concrete next steps: retry, use a different approach, or escalate to a human.
  • Preserving partial progress visibly: if the agent completed 7 of 10 steps before failing, show those 7 results.

from dataclasses import dataclass


@dataclass
class TransparentError:
    """An error message designed for user trust, not just debugging."""
    what_happened: str  # Technical: "Database query timed out after 30s"
    what_it_means: str  # User impact: "Your report could not be generated"
    partial_results: list | None  # What we got before the failure
    suggested_actions: list[str]  # What the user can do next
    retry_possible: bool
    details_for_debugging: str | None  # Collapsible technical detail


def handle_tool_failure(
    tool_name: str,
    error: Exception,
    partial: list | None = None,
) -> TransparentError:
    """Convert a raw tool failure into a user-facing transparency object."""
    return TransparentError(
        what_happened=f"{tool_name} failed: {type(error).__name__}",
        what_it_means=explain_user_impact(tool_name, error),
        partial_results=partial,
        suggested_actions=suggest_recovery(tool_name, error),
        retry_possible=is_retryable(error),
        details_for_debugging=str(error),
    )
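
The three helpers that `handle_tool_failure` calls are left undefined above. A minimal sketch of what they might look like follows. The tool names, impact messages, and the choice of which exception types count as retryable are all assumptions to adapt to your own tools.

```python
# Hypothetical mapping from tool name to the user-visible consequence.
USER_IMPACT = {
    "save_file": "Your file was not saved.",
    "generate_report": "Your report could not be generated.",
}


def is_retryable(error: Exception) -> bool:
    """Treat timeouts and connection problems as transient."""
    return isinstance(error, (TimeoutError, ConnectionError))


def explain_user_impact(tool_name: str, error: Exception) -> str:
    """Translate a tool failure into what it means for the user."""
    return USER_IMPACT.get(
        tool_name, f"The step that uses {tool_name} did not complete."
    )


def suggest_recovery(tool_name: str, error: Exception) -> list[str]:
    """Offer concrete next steps, always ending with human escalation."""
    if is_retryable(error):
        return ["Retry now", "Try again later", "Escalate to a human"]
    return ["Try a different approach", "Escalate to a human"]
```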

The Undo Contract

Trust increases dramatically when actions are reversible. If your agent can undo what it just did, say so. If it cannot, say that too — before acting, not after. This pairs with the action proposal pattern above:

  • Reversible actions (editing a draft, creating a branch): execute with low friction, offer undo afterward.
  • Irreversible actions (sending an email, deploying to production): require explicit confirmation, show a preview first.

The UX cost of confirmation dialogs is real. You will be tempted to skip them for "obvious" actions. Resist that temptation for any action with external side effects. Users build trust fastest when they feel in control, even if it costs them an extra click.
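
The contract can be sketched as a policy function over the `impact` and `reversible` fields of an `ActionProposal`. The returned policy names are illustrative labels. How "reversible" is determined for a given tool is left as an assumption.

```python
def confirmation_policy(impact: str, reversible: bool) -> str:
    """Decide how much friction an action needs before it runs.

    impact uses the ActionProposal values: "read-only", "write",
    or "destructive".
    """
    if impact not in ("read-only", "write", "destructive"):
        raise ValueError(f"unknown impact: {impact!r}")

    if impact == "read-only":
        return "auto"  # Nothing changes, so nothing to confirm or undo.
    if impact == "write" and reversible:
        return "execute_then_offer_undo"  # Low friction, undo afterward.
    # Destructive or irreversible: preview first, then explicit consent.
    return "confirm_with_preview"
```

A destructive action is routed to confirmation even if it is marked reversible, on the assumption that a rollback is never as reliable as not making the change.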

Latency Budgets and UX Trade-offs

Every transparency feature has a latency cost. Summarizing reasoning takes an extra model call. Generating citations requires cross-referencing sources. Streaming requires maintaining open connections. The architecture must make these trade-offs explicit.

┌──────────────────────────────────────────────────────────┐
│                   Latency Budget                         │
│                                                          │
│  Total acceptable latency: 3000ms (first meaningful      │
│  content must appear within this window)                 │
│                                                          │
│  ┌────────────┬────────────┬────────────┬─────────────┐  │
│  │  Model     │  Tool      │  Reasoning │  Citation   │  │
│  │  TTFT      │  Execution │  Summary   │  Generation │  │
│  │  ~200ms    │  ~500ms    │  ~800ms    │  ~400ms     │  │
│  │            │            │  (async)   │  (async)    │  │
│  └────────────┴────────────┴────────────┴─────────────┘  │
│                                                          │
│  Strategy: Stream tokens immediately. Generate           │
│  reasoning summary and citations asynchronously.         │
│  Append them to the response after the main answer.      │
│                                                          │
└──────────────────────────────────────────────────────────┘

The key insight is that transparency features do not need to block the primary response. Stream the answer first, then append the evidence and reasoning as they become available. This gives users the responsiveness they expect while still delivering the transparency they need — just slightly delayed.
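
The arithmetic behind the diagram is worth making explicit. The sketch below uses the diagram's illustrative estimates and assumes a single tool call runs before the first token. Only the blocking stages count against the budget, while the deferred stages determine how long the transparency data trails the answer.

```python
BUDGET_MS = 3000

# Illustrative per-stage estimates taken from the diagram above.
ESTIMATES_MS = {
    "model_ttft": 200,
    "tool_execution": 500,
    "reasoning_summary": 800,
    "citation_generation": 400,
}

BLOCKING = ("tool_execution", "model_ttft")
DEFERRED = ("reasoning_summary", "citation_generation")


def time_to_first_content(estimates: dict[str, int]) -> int:
    """Blocking stages run back to back, so their costs add up."""
    return sum(estimates[stage] for stage in BLOCKING)


def deferred_tail(estimates: dict[str, int], concurrent: bool = True) -> int:
    """Extra wait after the answer before transparency data arrives."""
    costs = [estimates[stage] for stage in DEFERRED]
    # Concurrent jobs cost as much as the slowest; sequential jobs add up.
    return max(costs) if concurrent else sum(costs)
```

With these numbers the first content appears after 700 ms, well inside the 3000 ms budget. Running the two deferred jobs concurrently trims the trailing wait from 1200 ms to 800 ms.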

import asyncio
from collections.abc import Awaitable, Callable


async def stream_with_deferred_transparency(
    model_client,
    messages: list[dict],
    retrieved_docs: list[dict],
    on_chunk: Callable[[StreamChunk], Awaitable[None]],
):
    """Stream the answer immediately, then append transparency data."""
    # Phase 1: Stream the answer (low latency)
    answer_tokens = []
    async for token in model_client.stream_completion(messages):
        answer_tokens.append(token)
        await on_chunk(StreamChunk(content=token))

    full_answer = "".join(answer_tokens)

    # Phase 2: Generate transparency data. The answer is already on screen,
    # so this work no longer delays the user. The two helper coroutines are
    # assumed to be defined elsewhere.
    await on_chunk(StreamChunk(status="generating_citations"))

    # Run both jobs concurrently; gather returns results in argument order.
    citations, reasoning = await asyncio.gather(
        generate_citations(full_answer, retrieved_docs, model_client),
        summarize_reasoning_trace(messages, model_client),
    )

    # Phase 3: Append transparency data
    await on_chunk(StreamChunk(
        content=format_citations_block(citations),
        reasoning=format_reasoning_summary(reasoning),
    ))

Conclusion

Agent UX is an architectural concern. Streaming partial results eliminates dead air and lets users course-correct early. Progressive disclosure respects attention budgets by layering information from answer to evidence to full trace. Reasoning explanations and citations make the agent's logic auditable without overwhelming the user. Trust emerges from predictable behavior: proposing actions before taking them, reporting honestly on outcomes, and making failure as transparent as success.

The trade-off at the center of all these patterns is latency versus transparency. Every explanation costs time. The engineering solution is to decouple them: stream the answer first, compute the transparency artifacts asynchronously, and append them. Users get responsiveness and auditability — they just arrive in that order.