Agent-to-Agent Communication

Published: 24 May 2026

When we discussed multi-agent systems, we treated inter-agent communication as a solved problem — agents exchange messages, pass structured handoffs, and coordinate through direct calls or shared pools. That works when all the agents live in the same codebase, share the same framework, and run under one team's control. But the moment you need agents from different organizations to collaborate — a travel-planning agent talking to an airline's booking agent, or an enterprise orchestrator delegating to a vendor's compliance agent — you hit the interoperability wall. Different frameworks, different runtimes, different auth schemes, different message formats. Without a shared protocol, every integration is bespoke.

This is the problem that agent-to-agent (A2A) communication protocols solve. Where MCP standardizes how agents talk to tools, A2A standardizes how agents talk to each other — as peers. The distinction matters: a tool is stateless and performs a predefined function. An agent is stateful, autonomous, and reasons about how to accomplish a goal. Wrapping an agent as a tool strips away its autonomy. A proper agent-to-agent protocol preserves it.

We will walk through why agents are not tools, how agent discovery works, the anatomy of the protocol, the task lifecycle, streaming and async patterns, how A2A and MCP fit together, and the trade-offs you face when adopting an inter-agent protocol. By the end, you should have a clear enough picture to evaluate whether your system needs a formal agent communication layer and how to build one if it does.

Why Agents Are Not Tools #

The temptation is obvious: if you already have MCP for tool communication, why not expose remote agents as MCP tools? Give them a schema, call them like functions, get a result back. It works for simple cases. But it breaks down in three ways.

Agents are stateful. A tool call is a single request/response exchange: you send arguments, you get a result. An agent might need multiple turns to complete a task. It might ask clarifying questions. It might report progress incrementally. It might need hours to finish. The tool-call model does not accommodate any of this.

Agents negotiate. When you ask a tool to search a database, it does not push back. When you ask an agent to book a flight, it might respond: "I found three options — which do you prefer?" or "That route is unavailable; should I try alternate dates?" This back-and-forth interaction is fundamental to how agents work, and it does not fit the request/response shape of a tool call.

Agents are opaque. A tool exposes its parameters and return type. You know exactly what it does. An agent's internal reasoning, tools, and memory are its own business. You should not need to know how the booking agent finds flights — only that it can. A proper inter-agent protocol treats each agent as a black box: you send a task, you get results, and the internal implementation is invisible.

Tool Communication (MCP)                   Agent Communication (A2A)
┌──────────┐    call      ┌──────┐   ┌───────────┐   task      ┌──────────┐
│  Agent   │────────────▶ │ Tool │   │  Client   │───────────▶ │  Remote  │
│          │◀──────────── │      │   │  Agent    │◀─────────── │  Agent   │
│          │   result     └──────┘   │           │  status,    │          │
└──────────┘                         │           │  artifacts  │          │
                                     │           │◀─────────── │          │
 Stateless, single turn,             │           │  questions  │          │
 transparent schema.                 └───────────┘             └──────────┘

                                    Stateful, multi-turn, opaque execution.

This does not mean tools and agents are always cleanly separate. Simple agents that always return an immediate answer can be wrapped as tools. But the moment the interaction involves multiple turns, long-running work, or negotiation, you need a protocol designed for agent-to-agent communication.

Agent Discovery #

Before agents can collaborate, they need to find each other and understand what each one can do. This is the discovery problem — analogous to service discovery in microservices, but with richer metadata because agents have skills, not just endpoints.

The Agent Card #

The Agent Card is a JSON document that acts as a digital business card. It describes everything a client agent needs to know to decide whether to interact with a remote agent and how to do so:

Identity — name, description, and provider information.
Endpoint — the URL where the agent accepts requests.
Skills — the specific tasks the agent can perform, with descriptions, input/output types, and examples.
Capabilities — supported protocol features like streaming and push notifications.
Authentication — what credentials the client needs to present.

{
  "name": "flight-booking-agent",
  "description": "Books flights, searches routes, and manages reservations.",
  "url": "https://flights.example.com/a2a",
  "version": "1.0.0",
  "capabilities": {
    "streaming": true,
    "pushNotifications": false
  },
  "authentication": {
    "schemes": ["Bearer"]
  },
  "skills": [
    {
      "id": "search-flights",
      "name": "Search Flights",
      "description": "Search for available flights between two cities on given dates.",
      "inputModes": ["text/plain", "application/json"],
      "outputModes": ["application/json"],
      "examples": [
        "Find flights from New York to London on March 15"
      ]
    },
    {
      "id": "book-flight",
      "name": "Book Flight",
      "description": "Book a specific flight and return a confirmation.",
      "inputModes": ["application/json"],
      "outputModes": ["application/json"]
    }
  ]
}

Skills are the key differentiator from a tool schema. A tool schema describes parameters — the exact shape of input and output. An agent skill describes capabilities — what the agent can do, in what modalities, with examples. The client does not need to know the agent's internal tool set or prompt structure. It just needs to know: can this agent search flights? Does it accept text or structured JSON? What authentication does it require?
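
As a rough illustration of how a client might use this metadata, the sketch below filters a card's skills by the input mode they accept. The helper name find_skills_accepting and the trimmed card are assumptions made for this example; they are not part of any protocol library.

```python
def find_skills_accepting(card: dict, mode: str) -> list[str]:
    """Return the ids of skills whose inputModes include the given media type."""
    return [
        skill["id"]
        for skill in card.get("skills", [])
        if mode in skill.get("inputModes", [])
    ]


# A trimmed copy of the flight-booking card shown above.
card = {
    "name": "flight-booking-agent",
    "skills": [
        {"id": "search-flights", "inputModes": ["text/plain", "application/json"]},
        {"id": "book-flight", "inputModes": ["application/json"]},
    ],
}

text_skills = find_skills_accepting(card, "text/plain")
json_skills = find_skills_accepting(card, "application/json")
```

A client that can only send plain text would see that search-flights is usable but book-flight needs structured JSON, without knowing anything about how either skill is implemented.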

Discovery Strategies #

How does a client agent find Agent Cards in the first place? Three strategies cover most scenarios.

Well-known URI. The agent hosts its card at a standardized path: /.well-known/agent-card.json. A client that knows the domain can fetch the card with a single HTTP GET. This follows the same pattern as /.well-known/openid-configuration in OpenID Connect discovery and works well for public agents or agents within a known domain.

import asyncio
import httpx

async def discover_agent(domain: str) -> dict:
    """Fetch an agent's card from the well-known endpoint."""
    url = f"https://{domain}/.well-known/agent-card.json"
    async with httpx.AsyncClient() as client:
        response = await client.get(url)
        response.raise_for_status()
        return response.json()

async def main():
    card = await discover_agent("flights.example.com")
    print(f"Agent: {card['name']}, Skills: {len(card['skills'])}")

asyncio.run(main())

Curated registries. An intermediary service maintains a catalog of Agent Cards. Clients query the registry by skill, domain, or capability — "find me an agent that can book hotels in Europe." This is the enterprise pattern: a central directory that controls which agents are discoverable and by whom. Think of it as a service registry for agents.

Direct configuration. The client knows the agent's endpoint and card at build time — hardcoded in configuration, stored in environment variables, or managed through a deployment manifest. This is the simplest approach and works for tightly coupled systems where agents are under the same team's control.

class AgentDirectory:
    """Resolve agent capabilities from multiple discovery sources."""

    def __init__(self):
        self.cache = {}
        self.registries = []
        self.static_agents = {}

    async def find_agent_for_skill(self, skill_query: str) -> dict | None:
        """Find an agent that matches a skill description."""
        # Check static config first
        for name, card in self.static_agents.items():
            if self._matches_skill(card, skill_query):
                return card

        # Check registries
        for registry in self.registries:
            results = await registry.search(skill=skill_query)
            if results:
                return results[0]

        return None

    def _matches_skill(self, card: dict, query: str) -> bool:
        """Check if any skill in the card matches the query."""
        query_lower = query.lower()
        for skill in card.get("skills", []):
            if query_lower in skill["description"].lower():
                return True
        return False

In practice, most systems start with direct configuration — you know which agents you want to talk to — and add registry-based discovery as the agent ecosystem grows. Well-known URIs work best for cross-organizational discovery where agents are public-facing.
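
The AgentDirectory above calls registry.search(skill=...) without showing a registry. One possible shape for such a registry, kept in memory for illustration, is sketched below. The class name InMemoryRegistry and its word-matching rule are assumptions; a production registry would more likely be a remote service with access control.

```python
import asyncio


class InMemoryRegistry:
    """Hypothetical registry exposing the search(skill=...) call used above."""

    def __init__(self, cards: list[dict]):
        self.cards = cards

    async def search(self, skill: str) -> list[dict]:
        """Return cards with a skill whose description contains every query word."""
        words = skill.lower().split()
        matches = []
        for card in self.cards:
            for entry in card.get("skills", []):
                description = entry.get("description", "").lower()
                if all(word in description for word in words):
                    matches.append(card)
                    break  # one matching skill is enough for this card
        return matches


registry = InMemoryRegistry([
    {"name": "hotel-agent", "skills": [{"description": "Book hotels in Europe."}]},
    {"name": "flight-agent", "skills": [{"description": "Search for available flights."}]},
])

found = asyncio.run(registry.search(skill="book hotels"))
```

Substring matching is crude. It is used here only so the example stays self-contained.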

The Protocol Anatomy #

With discovery handled, agents need a way to exchange messages, delegate tasks, and receive results. The protocol defines the wire format, the interaction methods, and the state machine that governs task lifecycles.

Transport and Format #

The protocol uses HTTP as the transport and JSON-RPC 2.0 as the message format. This is a deliberate choice: HTTP is universally supported, firewalls understand it, and every language has mature HTTP libraries. JSON-RPC provides structured request/response semantics with method names, parameters, and error codes — without the overhead of a full REST API.

A message from a client agent to a remote agent looks like this:

{
  "jsonrpc": "2.0",
  "id": "req-001",
  "method": "SendMessage",
  "params": {
    "message": {
      "role": "user",
      "parts": [
        {
          "text": "Find flights from New York to London on March 15"
        }
      ],
      "messageId": "msg-001"
    }
  }
}

The message contains parts — flexible content containers that can hold text, binary data, URLs, or structured JSON. This multimodal design means agents can exchange not just text but images, documents, structured data, and file references in a single message.
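
To make the idea of mixed parts concrete, here is a minimal sketch that assembles a message from three kinds of part. The field names data, url, and mediaType are assumptions that mirror the part kinds listed above; check the protocol version you target for the exact wire names.

```python
def text_part(text: str) -> dict:
    """Plain-text content, as in the request shown above."""
    return {"text": text}


def data_part(payload: dict) -> dict:
    """Structured JSON content (the 'data' field name is an assumption)."""
    return {"data": payload}


def url_part(url: str, media_type: str) -> dict:
    """A reference to a file hosted elsewhere (field names are assumptions)."""
    return {"url": url, "mediaType": media_type}


def build_message(message_id: str, parts: list[dict]) -> dict:
    """Wrap parts in a client-side message envelope."""
    if not parts:
        raise ValueError("a message needs at least one part")
    return {"role": "user", "messageId": message_id, "parts": parts}


message = build_message("msg-002", [
    text_part("Book this itinerary and attach the receipt."),
    data_part({"flight": "XY123", "date": "2026-03-15"}),
    url_part("https://example.com/passport.pdf", "application/pdf"),
])
```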

Messages, Tasks, and Artifacts #

Three core objects structure every interaction:

Messages are individual turns of communication. Each message has a role (user for the client, agent for the remote agent), a unique ID, and one or more parts. Messages are the conversational layer — they carry instructions, questions, clarifications, and status updates.

Tasks are stateful units of work. When a remote agent receives a message and determines it needs to perform substantial work, it creates a task with a unique ID and a lifecycle (submitted, working, input-required, completed, failed, canceled, rejected). Tasks enable tracking of long-running operations — the client can poll for status, receive streaming updates, or get push notifications when the task completes.

Artifacts are the deliverables. When a task produces a concrete output — a booked reservation, a generated report, a processed image — it wraps the output in an artifact with its own ID, name, and content parts. Artifacts are distinct from messages: a message says "here is what I found," an artifact is what was found.

┌─────────────────────────────────────────────────────────────┐
│                    Interaction Flow                         │
│                                                             │
│  Client                                        Remote Agent │
│    │                                                │       │
│    │──── Message (task request) ───────────────────▶│       │
│    │                                                │       │
│    │◀─── Task (id, status: submitted) ──────────────│       │
│    │                                                │       │
│    │◀─── TaskStatusUpdate (working) ────────────────│       │
│    │                                                │       │
│    │◀─── TaskStatusUpdate (working, progress) ──────│       │
│    │                                                │       │
│    │◀─── TaskArtifactUpdate (partial result) ───────│       │
│    │                                                │       │
│    │◀─── TaskStatusUpdate (completed) ──────────────│       │
│    │                                                │       │
│    │  Final artifacts available on task             │       │
└─────────────────────────────────────────────────────────────┘
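
One way to keep the three objects apart in code is to model tasks and artifacts as small data classes. This is a sketch only: the field names follow the JSON used in this article, and the class layout is an assumption rather than a prescribed schema.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Artifact:
    artifactId: str
    name: str
    parts: list[dict]


@dataclass
class Task:
    id: str
    contextId: str
    status: dict
    artifacts: list[Artifact] = field(default_factory=list)


task = Task(id="task-1", contextId="ctx-1", status={"state": "working"})

# The status message is conversational; the artifact is the deliverable.
task.status = {
    "state": "completed",
    "message": {"role": "agent", "parts": [{"text": "Found 2 flights."}]},
}
task.artifacts.append(
    Artifact("art-1", "flight-options", [{"data": {"flights": ["XY123", "XY456"]}}])
)

wire_task = asdict(task)  # nested plain dicts, ready for json.dumps
```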

Context and Continuity #

A contextId groups related tasks into a logical session. When a client sends a first message, the remote agent returns a contextId along with its response. Subsequent messages that include the same contextId are treated as part of the same conversation — the remote agent can use its internal memory to maintain continuity.

This enables multi-turn workflows:

Client sends: "Find flights from New York to London on March 15."
Agent returns a task with flight options.
Client sends (same contextId): "Book the second option."
Agent creates a new task, referencing the previous one, and completes the booking.

Each follow-up creates a new task within the same context. Completed tasks are immutable — they cannot be restarted or modified. This makes the interaction history a clean, append-only log: each task is a distinct unit of work with clear inputs, outputs, and status.

class A2AClient:
    """Client for interacting with a remote A2A agent."""

    def __init__(self, agent_card: dict, auth_token: str):
        self.endpoint = agent_card["url"]
        self.auth_token = auth_token
        self.context_id = None
        self.request_counter = 0

    async def send_message(
        self,
        text: str,
        reference_task_ids: list[str] | None = None,
        notification_config: dict | None = None,
    ) -> dict:
        """Send a message to the remote agent."""
        self.request_counter += 1

        message = {
            "role": "user",
            "messageId": f"msg-{self.request_counter}",
            "parts": [{"text": text}],
        }

        if self.context_id:
            message["contextId"] = self.context_id
        if reference_task_ids:
            message["referenceTaskIds"] = reference_task_ids

        params = {"message": message}
        if notification_config:
            params["notificationConfig"] = notification_config

        payload = {
            "jsonrpc": "2.0",
            "id": f"req-{self.request_counter}",
            "method": "SendMessage",
            "params": params,
        }

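        async with httpx.AsyncClient() as client:
            response = await client.post(
                self.endpoint,
                json=payload,
                headers={"Authorization": f"Bearer {self.auth_token}"},
            )
            # Surface HTTP-level failures (401, 429, 5xx) instead of parsing them
            response.raise_for_status()
            body = response.json()

        # JSON-RPC reports protocol-level failures in an "error" member
        if "error" in body:
            raise RuntimeError(f"A2A error: {body['error']}")
        result = body["result"]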

        # Capture context for subsequent messages
        if "task" in result and "contextId" in result["task"]:
            self.context_id = result["task"]["contextId"]

        return result
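
The append-only property described above can also be enforced on the client side. The sketch below keeps a per-context history and refuses to modify a task once it is in a terminal state. ContextLog is a hypothetical helper, not part of the protocol.

```python
TERMINAL = {"completed", "failed", "canceled", "rejected"}


class ContextLog:
    """Append-only record of tasks grouped by contextId."""

    def __init__(self):
        self._tasks: dict[str, list[dict]] = {}

    def record(self, task: dict) -> None:
        """Add a new task, or update one that is still in progress."""
        history = self._tasks.setdefault(task["contextId"], [])
        for existing in history:
            if existing["id"] == task["id"]:
                if existing["status"]["state"] in TERMINAL:
                    raise ValueError(f"task {task['id']} is immutable")
                existing.update(task)
                return
        history.append(dict(task))

    def task_ids(self, context_id: str) -> list[str]:
        return [t["id"] for t in self._tasks.get(context_id, [])]


log = ContextLog()
log.record({"id": "task-1", "contextId": "ctx-1", "status": {"state": "completed"}})
log.record({"id": "task-2", "contextId": "ctx-1", "status": {"state": "working"}})
log.record({"id": "task-2", "contextId": "ctx-1", "status": {"state": "completed"}})
```

The search and the booking from the example flow end up as two separate entries under one contextId, and neither can be rewritten after it completes.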

The Task Lifecycle #

Tasks are the heart of A2A. They give both sides — client and remote agent — a shared understanding of where work stands. The lifecycle is a state machine with clear transitions.

                    ┌───────────┐
                    │ Submitted │
                    └─────┬─────┘
                          │
                          ▼
                    ┌───────────┐
            ┌───────│  Working  │─────┐
            │       └─────┬─────┘     │
            │             │           │
            ▼             ▼           ▼
     ┌────────────┐ ┌───────────┐ ┌──────────┐
     │  Input     │ │ Completed │ │  Failed  │
     │  Required  │ └───────────┘ └──────────┘
     └──────┬─────┘
            │ (client responds)
            │
            ▼
    ┌───────────┐
    │  Working  │ (resumes processing)
    └───────────┘

   Terminal states: Completed, Failed, Canceled, Rejected

The key states:

submitted — the task has been received but work has not started.
working — the agent is actively processing the task.
input-required — the agent needs more information from the client before it can continue. This is the "negotiation" state — the agent asks a clarifying question and waits.
completed — the task is done and artifacts are available.
failed — the task could not be completed. The status includes an error description.
canceled — the client requested cancellation.
rejected — the agent refused the task (outside its capabilities, policy violation, etc.).

Once a task reaches a terminal state (completed, failed, canceled, rejected), it is immutable. No more messages can be sent on that task. Any follow-up work starts a new task in the same context. This immutability simplifies both client and server implementations — there is no ambiguity about whether a task can be resumed, and every task maps cleanly to a unit of work with a definitive outcome.
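
The lifecycle can be written down as a transition table and checked in code. The table below is inferred from the diagram and the state list above. The edges into canceled and rejected are assumptions, since the diagram does not draw them.

```python
TERMINAL_STATES = {"completed", "failed", "canceled", "rejected"}

# Allowed next states for each non-terminal state (partly assumed, see above).
ALLOWED = {
    "submitted": {"working", "canceled", "rejected"},
    "working": {"input-required", "completed", "failed", "canceled"},
    "input-required": {"working", "canceled"},
}


def transition(current: str, new: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if current in TERMINAL_STATES:
        raise ValueError(f"task is in terminal state {current!r}")
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"cannot move from {current!r} to {new!r}")
    return new


state = "submitted"
for next_state in ["working", "input-required", "working", "completed"]:
    state = transition(state, next_state)
```

Calling transition("completed", "working") raises, which is the immutability rule expressed as a guard.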

Implementing Task Handling #

On the server side, handling the task lifecycle means mapping incoming messages to internal agent logic and reporting status updates as the work progresses:

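import asyncio
import uuid


class NeedsInputError(Exception):
    """Raised by the agent when it needs a clarifying answer from the client."""

    def __init__(self, question: str):
        super().__init__(question)
        self.question = question


class A2ATaskHandler:
    """Handle A2A tasks on the server side."""

    def __init__(self, agent):
        self.agent = agent
        self.tasks = {}
        # Hold references to background work so it is not garbage-collected
        self.background = set()

    def generate_context_id(self) -> str:
        return f"ctx-{uuid.uuid4()}"

    def generate_task_id(self) -> str:
        return f"task-{uuid.uuid4()}"

    def generate_artifact_id(self) -> str:
        return f"artifact-{uuid.uuid4()}"

    async def handle_message(self, message: dict) -> dict:
        """Process an incoming message and return a task or message response."""
        context_id = message.get("contextId") or self.generate_context_id()

        # Determine if this needs a task or can be answered immediately
        requires_work = await self.agent.assess_complexity(message)

        if not requires_work:
            # Simple response: return a message, no task needed
            response = await self.agent.generate_response(message, context_id)
            return {"message": response}

        # Create a task for substantial work
        task_id = self.generate_task_id()
        task = {
            "id": task_id,
            "contextId": context_id,
            "status": {"state": "submitted"},
            "artifacts": [],
        }
        self.tasks[task_id] = task

        # Start processing asynchronously
        worker = asyncio.create_task(
            self.process_task(task_id, message, context_id)
        )
        self.background.add(worker)
        worker.add_done_callback(self.background.discard)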

        return {"task": task}

    async def process_task(
        self, task_id: str, message: dict, context_id: str
    ):
        """Process a task through the agent's logic."""
        self.update_status(task_id, "working")

        try:
            result = await self.agent.execute(message, context_id)

            # Produce artifacts from the result
            artifact = {
                "artifactId": self.generate_artifact_id(),
                "name": result.get("name", "result"),
                "parts": result["parts"],
            }
            self.tasks[task_id]["artifacts"].append(artifact)
            self.update_status(task_id, "completed")

        except NeedsInputError as e:
            self.update_status(task_id, "input-required", message=e.question)

        except Exception as e:
            self.update_status(task_id, "failed", message=str(e))

    def update_status(self, task_id: str, state: str, message: str = None):
        status = {"state": state}
        if message:
            status["message"] = {"role": "agent", "parts": [{"text": message}]}
        self.tasks[task_id]["status"] = status

Streaming and Async Patterns #

Not every task completes in a single request/response cycle. The protocol supports three interaction patterns for different latency and connectivity requirements.

Request/Response with Polling #

The simplest pattern: the client sends a message, gets back a task in "submitted" or "working" state, and polls periodically for updates.

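import asyncio
import time


async def poll_until_complete(
    client: A2AClient, task_id: str, interval: float = 2.0, timeout: float = 300
) -> dict:
    """Poll a task until it reaches a terminal state.

    Assumes the client exposes a get_task(task_id) coroutine that fetches
    the current task object; A2AClient above would need that method added.
    """
    # Use a monotonic clock so wall-clock adjustments cannot skew the deadline
    deadline = time.monotonic() + timeout

    while time.monotonic() < deadline: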
        task = await client.get_task(task_id)
        state = task["status"]["state"]

        if state in ("completed", "failed", "canceled", "rejected"):
            return task

        if state == "input-required":
            return task  # Client needs to respond

        await asyncio.sleep(interval)

    raise TimeoutError(f"Task {task_id} did not complete within {timeout}s")

Polling is simple but wasteful — you burn network calls checking for updates that have not happened. It works for prototyping and for tasks that complete quickly (under a minute).
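
A common way to reduce the waste is to lengthen the interval between polls. The generator below is a sketch of exponential backoff with optional jitter. The function name and default values are illustrative assumptions, not values taken from the protocol.

```python
import itertools
import random


def backoff_delays(initial=1.0, factor=2.0, cap=30.0, jitter=0.1, rng=None):
    """Yield sleep intervals that grow geometrically up to a cap."""
    rng = rng or random.Random()
    delay = initial
    while True:
        spread = delay * jitter
        # Jitter spreads out clients that started polling at the same moment
        yield delay + rng.uniform(-spread, spread)
        delay = min(delay * factor, cap)


# With jitter disabled the schedule is easy to inspect.
schedule = list(itertools.islice(backoff_delays(jitter=0.0), 6))
```

In the polling loop above, the fixed asyncio.sleep(interval) could draw its argument from next(delays) instead, where delays = backoff_delays().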

Server-Sent Events (Streaming) #

For real-time updates, the client opens a streaming connection and receives events as the task progresses. This uses Server-Sent Events (SSE) — a standard HTTP mechanism where the server pushes events over a long-lived connection.

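import json

import httpx


async def stream_task(client: A2AClient, message: str):
    """Send a message and stream task updates in real time."""
    # Disable httpx's default timeouts: stream events can be far apart
    async with httpx.AsyncClient(timeout=None) as http: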
        async with http.stream(
            "POST",
            client.endpoint,
            json={
                "jsonrpc": "2.0",
                "id": "req-stream-1",
                "method": "SendStreamingMessage",
                "params": {
                    "message": {
                        "role": "user",
                        "parts": [{"text": message}],
                        "messageId": "msg-stream-1",
                    }
                },
            },
            headers={"Authorization": f"Bearer {client.auth_token}"},
        ) as response:
            async for line in response.aiter_lines():
                if line.startswith("data: "):
                    event = json.loads(line[6:])
                    event_type = event.get("type")

                    if event_type == "TaskStatusUpdateEvent":
                        print(f"Status: {event['status']['state']}")

                    elif event_type == "TaskArtifactUpdateEvent":
                        print(f"Artifact: {event['artifact']['name']}")

                    if event.get("final", False):
                        return event

Streaming is the right pattern for interactive use cases — a user watching progress, a coordinator agent monitoring sub-tasks in real time. The connection stays open as long as the task runs, and the client receives incremental updates as they happen.
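
The loop above treats every data: line as a complete JSON document. SSE also allows an event to span several data: lines, terminated by a blank line, and allows comment lines that start with a colon. A slightly more tolerant parser, written as a sketch with the hypothetical name iter_sse_json, could look like this:

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Group SSE lines into events and decode each event's data as JSON."""
    buffer: list[str] = []
    for line in lines:
        if line == "":
            # A blank line ends the current event
            if buffer:
                yield json.loads("\n".join(buffer))
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        name, _, value = line.partition(":")
        if name == "data":
            # A single leading space after the colon is not part of the value
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An unterminated trailing event is dropped, as the SSE format prescribes


sample = [
    ": keep-alive",
    'data: {"type": "TaskStatusUpdateEvent",',
    'data:  "status": {"state": "working"}}',
    "",
    'data: {"type": "TaskStatusUpdateEvent", "final": true}',
    "",
]
events = list(iter_sse_json(sample))
```

The same generator can be fed from response.aiter_lines() by collecting lines in an async loop. It is kept synchronous here so it runs without a server.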

Push Notifications #

For very long-running tasks or disconnected clients, the server can send updates to a webhook provided by the client. The client does not need to maintain a connection: it registers a callback URL and receives HTTP POST requests when the task status changes.

async def submit_with_webhook(
    client: A2AClient, message: str, webhook_url: str
) -> str:
    """Submit a task and receive updates via webhook."""
    result = await client.send_message(
        text=message,
        notification_config={
            "url": webhook_url,
            "events": ["status_changed", "artifact_ready"],
        },
    )
    return result["task"]["id"]


# On the webhook receiver side:
async def handle_webhook(request):
    """Process push notification from the remote agent."""
    event = await request.json()

    if event["status"]["state"] == "completed":
        artifacts = event.get("artifacts", [])
        await process_completed_task(event["taskId"], artifacts)

    elif event["status"]["state"] == "input-required":
        question = event["status"]["message"]["parts"][0]["text"]
        await route_to_human(event["taskId"], question)

Push notifications work for batch processing, background workflows, and scenarios where the client agent is itself a service that processes tasks asynchronously. The trade-off: you need a publicly reachable endpoint to receive the callbacks, plus the security implications of accepting inbound webhooks.
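
One widely used way to reduce the risk of accepting inbound webhooks is to have the sender sign each request body with a shared secret and have the receiver verify the signature before trusting the payload. The sketch below shows that check with HMAC-SHA256. This scheme is an assumption for illustration: the source does not say which mechanism the protocol uses, and how the signature travels (for example, in a request header) is left open.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected.encode(), signature.encode())


secret = b"shared-webhook-secret"  # illustrative value only
body = b'{"taskId": "task-1", "status": {"state": "completed"}}'
signature = sign_payload(secret, body)

accepted = verify_signature(secret, body, signature)
tampered = verify_signature(secret, body + b" ", signature)
```

The receiver should verify against the raw bytes it received, before parsing the JSON, because re-serializing the payload can change the bytes and invalidate the signature.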

A2A and MCP Together #

A2A and MCP occupy different layers of the agent stack. MCP connects an agent to its tools — databases, APIs, file systems, code executors. A2A connects agents to each other. They complement rather than compete, and a well-designed system uses both.

┌─────────────────────────────────────────────────────┐
│                     Agent Stack                     │
│                                                     │
│  ┌────────────────┐    A2A     ┌────────────────┐   │
│  │  Client Agent  │◀──────────▶│  Remote Agent  │   │
│  │                │            │                │   │
│  │  ┌──────────┐  │            │  ┌──────────┐  │   │
│  │  │  Model   │  │            │  │  Model   │  │   │
│  │  └──────────┘  │            │  └──────────┘  │   │
│  │       │        │            │       │        │   │
│  │  ┌────┴─────┐  │            │  ┌────┴─────┐  │   │
│  │  │MCP Client│  │            │  │MCP Client│  │   │
│  │  └────┬─────┘  │            │  └────┬─────┘  │   │
│  └───────┼────────┘            └───────┼────────┘   │
│          │  MCP                        │  MCP       │
│          ▼                             ▼            │
│  ┌──────────────┐              ┌──────────────┐     │
│  │  Tool Server │              │  Tool Server │     │
│  │  (DB, APIs)  │              │  (Search,    │     │
│  │              │              │   docs)      │     │
│  └──────────────┘              └──────────────┘     │
└─────────────────────────────────────────────────────┘

A concrete example: a customer support orchestrator uses MCP to connect to its own tools (customer database, order system, knowledge base). When it encounters a billing dispute that requires a specialized analysis, it delegates to a remote billing-analysis agent via A2A. That billing agent uses its own MCP-connected tools (financial records, fraud detection APIs). The orchestrator does not see or care about the billing agent's tools — it just sends a task and gets results back.

This separation has a security benefit: the billing agent's tools and data never cross the protocol boundary. The orchestrator cannot access the fraud detection API directly. Each agent's tool set is an internal implementation detail, invisible to its A2A peers. This opaque execution property means agents can collaborate without sharing proprietary logic, sensitive data sources, or internal tool configurations.

When A2A Wrapping Makes Sense #

Not every inter-agent interaction needs A2A. If all your agents are in the same process, share the same framework, and are deployed together, direct function calls are simpler and faster. A2A adds value when:

Agents are developed by different teams or organizations.
Agents run on different infrastructure or in different security domains.
Agents need to be discovered dynamically rather than hardcoded.
Agents have long-running tasks that require status tracking.
You want to swap implementations without changing the client (e.g., replacing a vendor's agent with your own).

For internal agents within a single application, the multi-agent patterns we covered — direct messaging, shared pools, hierarchical routing — are sufficient. A2A is for the boundary where "inside the system" ends and "external agent" begins.

Building an A2A Server #

Exposing your agent as an A2A-compliant server requires three pieces: serving the Agent Card for discovery, handling the SendMessage method, and managing the task lifecycle.

from starlette.applications import Starlette
from starlette.responses import JSONResponse
from starlette.routing import Route

AGENT_CARD = {
    "name": "research-assistant",
    "description": "Searches academic papers and summarizes findings.",
    "url": "https://research.example.com/a2a",
    "version": "1.0.0",
    "capabilities": {"streaming": True, "pushNotifications": False},
    "authentication": {"schemes": ["Bearer"]},
    "skills": [
        {
            "id": "search-papers",
            "name": "Search Papers",
            "description": "Search for academic papers on a topic and return summaries.",
            "inputModes": ["text/plain"],
            "outputModes": ["application/json", "text/plain"],
        }
    ],
}

async def agent_card_endpoint(request):
    """Serve the Agent Card for discovery."""
    return JSONResponse(AGENT_CARD)

# research_agent is the application's own agent object, defined elsewhere
task_handler = A2ATaskHandler(agent=research_agent)

async def send_message_endpoint(request):
    """Handle incoming A2A messages."""
    body = await request.json()
    # A request without an "id" must not crash the handler
    request_id = body.get("id")

    if body.get("method") != "SendMessage":
        return JSONResponse(
            {"jsonrpc": "2.0", "id": request_id,
             "error": {"code": -32601, "message": "Method not found"}},
            status_code=400,
        )

    message = body["params"]["message"]
    result = await task_handler.handle_message(message)

    return JSONResponse({"jsonrpc": "2.0", "id": request_id, "result": result})

app = Starlette(routes=[
    Route("/.well-known/agent-card.json", agent_card_endpoint, methods=["GET"]),
    Route("/a2a", send_message_endpoint, methods=["POST"]),
])

The server is a standard HTTP application. The Agent Card is served at the well-known path. Messages arrive as JSON-RPC POST requests. The task handler processes them through the agent's logic and returns task objects with status and artifacts. Authentication — verifying the Bearer token — happens in middleware, the same way you would protect any HTTP endpoint.

Security Considerations #

Agent-to-agent communication crosses trust boundaries. The remote agent is not your code — it is someone else's system, running their logic, with their access to data. Security requires attention at multiple layers.

Authentication. The Agent Card declares what authentication scheme the remote agent requires. The client must present valid credentials — typically an OAuth token or API key — with every request. The authentication mechanism uses standard HTTP headers, separate from the A2A protocol messages. This means existing identity infrastructure (OAuth providers, API gateways, token services) works without modification.

Authorization. Authentication tells you who the client is. Authorization tells you what they can do. A remote agent should scope access by client identity: some clients can search but not book, some can access production data but not financial records. This is standard RBAC applied at the agent level.

Input validation. Messages from external agents are untrusted input. Every field should be validated — message format, part types, content length. The same guardrails that protect against user prompt injection apply to messages from other agents: content filtering, injection detection, and PII handling.

Data minimization. An agent should return only the data the client needs. If the client asks for a flight search, the response should contain flight options — not the internal pricing model, negotiation logic, or user database queries that produced them. The opaque execution principle is also a data protection principle.

from starlette.responses import JSONResponse


class A2ASecurityMiddleware:
    """Validate authentication and apply security controls."""

    def __init__(self, app, auth_verifier, rate_limiter):
        self.app = app
        self.auth_verifier = auth_verifier
        self.rate_limiter = rate_limiter

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            return await self.app(scope, receive, send)

        path = scope.get("path", "")

        # Agent card is public (or protected separately)
        if path == "/.well-known/agent-card.json":
            return await self.app(scope, receive, send)

        # Verify authentication for all other endpoints
        headers = dict(scope.get("headers", []))
        auth_header = headers.get(b"authorization", b"").decode()

        if not await self.auth_verifier.verify(auth_header):
            response = JSONResponse(
                {"error": "Unauthorized"}, status_code=401
            )
            return await response(scope, receive, send)

        # Rate limiting per client
        client_id = await self.auth_verifier.extract_client_id(auth_header)
        if not self.rate_limiter.allow(client_id):
            response = JSONResponse(
                {"error": "Rate limit exceeded"}, status_code=429
            )
            return await response(scope, receive, send)

        return await self.app(scope, receive, send)
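The middleware above depends on two collaborators it does not define. The sketch below shows one possible shape for each, matching the calls the middleware makes: an async `verify` and `extract_client_id`, and a synchronous `allow`. Both classes are illustrative assumptions. A production verifier would validate OAuth tokens against an identity provider rather than a static table, and a multi-instance deployment would need a shared store for rate-limit state.

```python
import hmac
import time
from typing import Optional


class StaticTokenVerifier:
    """Checks 'Bearer <token>' headers against a fixed token-to-client table."""

    def __init__(self, tokens: dict[str, str]):
        self._tokens = tokens  # token -> client_id

    def _lookup(self, auth_header: str) -> Optional[str]:
        scheme, _, token = auth_header.partition(" ")
        if scheme.lower() != "bearer" or not token:
            return None
        for known, client_id in self._tokens.items():
            # Constant-time comparison avoids leaking token contents via timing.
            if hmac.compare_digest(known.encode(), token.encode()):
                return client_id
        return None

    async def verify(self, auth_header: str) -> bool:
        return self._lookup(auth_header) is not None

    async def extract_client_id(self, auth_header: str) -> Optional[str]:
        return self._lookup(auth_header)


class TokenBucketLimiter:
    """Per-client token bucket: bursts up to `capacity`, refilled at `rate` per second."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self._clock = clock
        self._buckets: dict[str, tuple[float, float]] = {}  # client -> (tokens, last_seen)

    def allow(self, client_id: str) -> bool:
        now = self._clock()
        tokens, last = self._buckets.get(client_id, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at the bucket size.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens < 1:
            self._buckets[client_id] = (tokens, now)
            return False
        self._buckets[client_id] = (tokens - 1, now)
        return True
```

With these in place, the middleware could be constructed as `A2ASecurityMiddleware(app, StaticTokenVerifier({...}), TokenBucketLimiter(capacity=10, rate=1.0))`. The injectable `clock` parameter exists only to make the limiter testable without sleeping.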

Trade-offs #

Latency. Every inter-agent call is a network round trip plus the remote agent's processing time. In a chain of three sequential agent calls, the total latency is at least three round trips plus three model calls. For time-sensitive applications, this overhead can be prohibitive. Mitigate with streaming (start processing partial results immediately) and parallelization (call independent agents concurrently).
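The parallelization point can be illustrated with `asyncio.gather`. In this sketch `call_agent` is a stand-in that sleeps instead of making a network request, so the agent names and delays are placeholders; the comparison between the two helper functions is what matters.

```python
import asyncio
import time


async def call_agent(name: str, delay: float) -> str:
    """Stand-in for a remote A2A call; sleeps instead of doing network I/O."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def call_sequentially(calls: list[tuple[str, float]]) -> list[str]:
    # Each call waits for the previous one, so latencies add up.
    return [await call_agent(name, delay) for name, delay in calls]


async def call_concurrently(calls: list[tuple[str, float]]) -> list[str]:
    # gather runs the calls concurrently and preserves argument order in its
    # results; total latency is roughly that of the slowest call.
    return await asyncio.gather(*(call_agent(name, delay) for name, delay in calls))
```

This only helps when the calls are independent. If one agent's output feeds the next agent's input, the chain stays sequential and streaming is the remaining lever.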

Complexity. A2A adds protocol handling, task state management, authentication, discovery, and error handling on top of your agent logic. For agents that only need to talk to other agents within the same codebase, this overhead is not justified. Use A2A at organizational boundaries; use direct calls within them.

Debugging. When a task fails in a chain of A2A calls, tracing the failure requires correlating logs across multiple independent services. Each agent is a black box — you can see what went in and what came out, but not what happened inside. This is the trade-off of opaque execution: you get security and encapsulation at the cost of observability. Structured logging with correlation IDs (the contextId and taskId) is essential.
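A minimal sketch of such logging with Python's standard `logging` module follows. The JSON field layout and the `task_logger` helper are assumptions for illustration; the only idea taken from the text is that every log line carries the `contextId` and `taskId`.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including A2A correlation IDs."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "message": record.getMessage(),
            # Populated via LoggerAdapter's `extra`; None if logged without one.
            "contextId": getattr(record, "contextId", None),
            "taskId": getattr(record, "taskId", None),
        })


def task_logger(base: logging.Logger, context_id: str, task_id: str) -> logging.LoggerAdapter:
    """Bind contextId and taskId to every message logged through the adapter."""
    return logging.LoggerAdapter(base, {"contextId": context_id, "taskId": task_id})
```

If both the client and the remote agent log the same IDs, a failed task can be traced across services by filtering on `taskId`, even though neither side can see inside the other.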

Versioning. Agent capabilities evolve. A skill that accepted text input might later require structured JSON. The Agent Card's version field helps clients detect changes, but handling version mismatches gracefully (falling back to a compatible format, or failing clearly when a skill is no longer available) requires careful engineering. Treat the Agent Card like an API contract: keep changes backward-compatible where possible, and signal breaking changes with an unambiguous version increment, such as a major-version bump.
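A client-side compatibility check might look like the sketch below. It assumes the remote agent publishes its card version as a `MAJOR.MINOR.PATCH` string and follows semantic-versioning conventions; neither assumption is guaranteed, so treat a parse failure as "incompatible" in real code.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'MAJOR.MINOR.PATCH'; raises ValueError on any other shape."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(tested: str, advertised: str) -> bool:
    """True when the card has the same major version and is not older than tested."""
    tested_v, advertised_v = parse_version(tested), parse_version(advertised)
    return advertised_v[0] == tested_v[0] and advertised_v >= tested_v
```

A client that was tested against `1.2.0` would accept a card advertising `1.3.1`, but fail clearly on `2.0.0` instead of sending requests the agent may no longer understand.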

Trust. Delegating a task to an external agent means trusting that agent to produce correct, safe results. A compromised or poorly built remote agent can return hallucinated data, execute harmful actions, or exfiltrate information from the messages you send. Apply the same guardrails to agent responses that you apply to model outputs: validate, verify, and do not blindly trust.
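As one concrete form of "validate and verify", the client can check a remote agent's results against the request it sent. The sketch below assumes a hypothetical flight-search exchange where each option is a dictionary with `origin`, `destination`, and `price` fields; it discards anything malformed or inconsistent with the request.

```python
def verify_flight_options(request: dict, options: list[dict]) -> list[dict]:
    """Keep only options that are well-formed and consistent with the request."""
    verified = []
    for option in options:
        price = option.get("price")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(price, bool) or not isinstance(price, (int, float)) or price <= 0:
            continue
        if option.get("origin") != request["origin"]:
            continue
        if option.get("destination") != request["destination"]:
            continue
        verified.append(option)
    return verified
```

Checks like these catch inconsistent or malformed output, not plausible-but-wrong output. For high-stakes actions such as payments, confirm against an independent source before acting.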

Conclusion #

Agent-to-agent communication moves multi-agent systems from single-codebase architectures to distributed, cross-organizational collaboration. The core ideas are protocol-level, not vendor-specific.

Agents are not tools. Tools are stateless, transparent, and fire-and-forget. Agents are stateful, opaque, and conversational. Wrapping an agent as a tool strips away its ability to negotiate, ask questions, and manage long-running work.
Agent Cards enable discovery. A JSON metadata document describes what an agent can do, where to reach it, and how to authenticate. Clients find cards through well-known URIs, curated registries, or direct configuration.
Tasks are the unit of work. A task has an ID, a lifecycle (submitted → working → completed/failed), and produces artifacts. Completed tasks are immutable — follow-up work starts new tasks in the same context.
Three interaction patterns cover different latency needs: request/response with polling, streaming via Server-Sent Events, and push notifications via webhooks.
A2A and MCP are complementary. MCP connects agents to tools. A2A connects agents to each other. Each agent's tools are invisible to its A2A peers — opaque execution preserves security and encapsulation.
Use A2A at trust boundaries. For agents within the same codebase, direct calls are simpler. A2A earns its overhead when agents cross organizational, security, or infrastructure boundaries.
Treat inter-agent messages as untrusted input. Authentication, authorization, input validation, and output verification apply to agent-to-agent communication just as they do to user-facing interfaces.