Sequential & Parallel Patterns

We already made the case for splitting work across multiple agents — and warned about the cost. Now it is time to build something.

The sequential and parallel patterns are the two simplest multi-agent architectures, and they share a defining property: the developer controls the topology. No AI model decides which agent runs next or how many agents to spin up. The control flow is hardcoded, just like a function pipeline or a map-reduce job. The model does the reasoning inside each agent; the code does the routing between them.

This matters because deterministic coordination is the easiest kind to debug, test, and reason about. When something breaks, you know exactly which agent ran, what it received, and what it produced — because you wrote the wiring. These patterns are the multi-agent equivalent of the prompt chaining and parallelization we covered for single-agent workflows, but scaled up: each node is far more than a single model call — it is a full agent with its own system prompt, tool set, and ReAct loop.

The Sequential Pipeline #

A sequential pipeline passes a task through a chain of agents, one after another. Agent A runs to completion, its output becomes Agent B's input, Agent B runs to completion, and so on. Each agent has a focused responsibility — a narrow system prompt, a small tool set, and a clear definition of what "done" looks like.

  Task
   │
   ▼
┌──────────────────┐      ┌──────┐     ┌──────────────────┐
│  Agent A         │────▶ │ Gate │────▶│  Agent B         │
│  (research)      │      └──────┘     │  (analysis)      │
└──────────────────┘                   └────────┬─────────┘
                                                │
                            ┌───────────────────┘
                            ▼
                        ┌──────┐      ┌──────────────────┐
                        │ Gate │────▶ │  Agent C         │
                        └──────┘      │  (writing)       │
                                      └──────────────────┘
                                          │
                                          ▼
                                        Output

The gates between agents are programmatic validation checks — not model calls. They verify that the previous agent's output meets the next agent's expectations before the pipeline continues.

How It Works #

Each agent in the pipeline is an independent unit. It receives a message, runs its internal ReAct loop (reasoning, tool use, observation), and produces a final response. The pipeline code creates the agents, calls them in order, and threads the output through.

class Agent:
    def __init__(self, name, system_prompt, tools, model):
        self.name = name
        self.system_prompt = system_prompt
        self.tools = tools
        self.model = model

    def run(self, message):
        """Execute the agent's ReAct loop and return the final response."""
        messages = [{"role": "user", "content": message}]
        while True:
            response = call_model(
                model=self.model,
                system=self.system_prompt,
                messages=messages,
                tools=self.tools,
            )
            if response.has_tool_calls:
                results = execute_tools(response.tool_calls, self.tools)
                messages.append(response.to_message())
                messages.append(tool_results_message(results))
            else:
                return response.text


def sequential_pipeline(task, agents, gates=None):
    """Run agents in sequence, passing each output to the next."""
    current_input = task
    gates = gates or {}

    for agent in agents:
        output = agent.run(current_input)

        # Run the gate for this agent if one exists
        gate_fn = gates.get(agent.name)
        if gate_fn:
            passed, error = gate_fn(output)
            if not passed:
                raise PipelineError(
                    f"Gate failed after {agent.name}: {error}",
                    agent=agent.name,
                    output=output,
                )

        current_input = output

    return current_input

The pipeline itself is trivial — a for loop. That is the point. The complexity lives inside each agent (reasoning, tool use, multi-step problem solving) while the wiring between agents is plain, readable code.
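
The pipeline raises a PipelineError that is never defined in this article. A minimal sketch of what it could look like, assuming it only needs to carry a message plus whatever keyword context the caller passes (agent, output, trace, and so on). The class body and the describe_failure helper are assumptions for illustration, not the original definition:

```python
class PipelineError(Exception):
    """Raised when a pipeline stage or gate fails (hypothetical definition)."""

    def __init__(self, message, **context):
        super().__init__(message)
        # Keep whatever the caller passed (agent name, output, trace, ...)
        # so error handlers and logs can inspect it later.
        self.context = context


def describe_failure(exc):
    """Summarize a PipelineError in one line for logging."""
    agent = exc.context.get("agent", "unknown")
    return f"[{agent}] {exc}"


try:
    raise PipelineError(
        "Gate failed after researcher: Research brief too short",
        agent="researcher",
        output="(the rejected brief would go here)",
    )
except PipelineError as exc:
    summary = describe_failure(exc)
```

Accepting arbitrary keyword arguments keeps the exception compatible with every raise site in this article, which pass different context fields.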

A Concrete Example #

Consider a due-diligence pipeline for evaluating a company. The task requires research, financial analysis, and a final report — three different skill sets, three different tool sets, three different system prompts.

researcher = Agent(
    name="researcher",
    system_prompt="""You are a business research analyst. Given a company name,
    gather key facts: founding date, leadership, market position, recent news,
    and competitive landscape. Use your search tools to find current information.
    Return a structured research brief.""",
    tools=[web_search, news_search, company_database],
    model="large-model",
)

financial_analyst = Agent(
    name="financial_analyst",
    system_prompt="""You are a financial analyst. Given a research brief about
    a company, analyze the financial data: revenue trends, margins, debt levels,
    and key ratios. Use your tools to pull financial statements and calculate
    metrics. Return a financial assessment with specific numbers.""",
    tools=[financial_api, calculator, sec_filings],
    model="large-model",
)

report_writer = Agent(
    name="report_writer",
    system_prompt="""You are a report writer specializing in due-diligence
    reports. Given a research brief and financial assessment, produce a concise
    executive summary (max 500 words) with key findings, risks, and a
    recommendation. Use formal tone. Cite specific figures from the input.""",
    tools=[],  # No tools needed — pure synthesis
    model="medium-model",
)

# Validation gates
def validate_research(output):
    required = ["leadership", "market", "competitive"]
    missing = [r for r in required if r.lower() not in output.lower()]
    if missing:
        return False, f"Research brief missing sections: {missing}"
    if len(output.split()) < 100:
        return False, "Research brief too short"
    return True, None

def validate_financials(output):
    # Check that the analysis contains actual numbers
    import re
    numbers = re.findall(r'\$[\d,.]+[MBK]?|\d+\.?\d*%', output)
    if len(numbers) < 3:
        return False, "Financial assessment lacks specific figures"
    return True, None

# Run the pipeline
result = sequential_pipeline(
    task="Evaluate Acme Corp for potential acquisition",
    agents=[researcher, financial_analyst, report_writer],
    gates={
        "researcher": validate_research,
        "financial_analyst": validate_financials,
    },
)

A few things to notice. The researcher uses a large model because it needs to reason about which searches to run and how to synthesize the results. The report writer uses a medium model because it is doing synthesis from already-structured input — a simpler task that does not justify the cost of the largest model. The financial analyst has specialized tools (financial APIs, SEC filings) that the other agents do not need and should not have access to. Each agent's tool selection problem is small and focused.

The gates are plain Python. validate_research checks that the research brief mentions certain required topics and meets a minimum length. validate_financials uses a regex to verify that the financial assessment contains actual numbers. These checks are fast, deterministic, and free — no model calls needed.

When to Use It #

The sequential pipeline fits when the task naturally decomposes into ordered stages where each stage depends on the previous one's output:

  • ETL-style processing: extract data, clean it, transform it, load it — each step requires the previous step's output
  • Generate-review-revise: a first agent drafts, a second agent reviews against criteria, and a third agent incorporates the feedback
  • Multi-domain analysis: research feeds into financial analysis, which feeds into risk assessment, which feeds into a final recommendation
  • Content pipelines: draft content, check compliance, translate — the same structure from prompt chaining but with full agents instead of single model calls

The key requirement is that the stages are sequential by nature — stage B genuinely needs stage A's output. If two stages could run independently, you are paying a latency penalty for no reason, and the parallel pattern is a better fit.

Trade-Offs #

Latency is additive. Each agent runs its full ReAct loop — potentially multiple model calls and tool invocations — before the next agent starts. A three-agent pipeline with agents that each make two or three internal tool calls can easily take 30 seconds or more. This is the price of depth: each agent gets full attention to its task, but the user waits for all of them in series.

Cascading errors are the primary risk. If the researcher produces a sloppy brief, the financial analyst works from bad data, and the report writer faithfully summarizes wrong conclusions. The pipeline amplifies errors at each stage because downstream agents trust their input. Validation gates between agents are the main defense — they catch structural problems before they propagate. But gates can only check form, not substance. A well-structured but factually wrong research brief will sail through a structural gate.

Rigidity is both strength and weakness. The pipeline always runs in the same order with the same agents. You can predict its behavior, monitor its cost, and test each agent in isolation. But if a new use case needs a different ordering or an extra agent, you build a new pipeline. This is fine when your task structure is stable; it becomes friction when it changes frequently.

The Parallel Pattern #

The parallel pattern fans a task out to multiple agents that run concurrently, then collects and aggregates their results. Instead of a single agent reviewing a document for everything at once, three specialized agents each check one dimension — accuracy, compliance, readability — at the same time. Wall-clock time is determined by the slowest agent, not the sum of all agents.

                        Task
                          │
        ┌─────────────────┼─────────────────┐
        ▼                 ▼                 ▼
  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
  │  Agent A     │ │  Agent B     │ │  Agent C     │
  │  (accuracy)  │ │ (compliance) │ │(readability) │
  └──────┬───────┘ └──────┬───────┘ └──────┬───────┘
         │                │                │
         └────────────────┼────────────────┘
                          ▼
                   ┌─────────────┐
                   │  Aggregator │
                   └─────────────┘
                          │
                          ▼
                       Output

There are two variants, the same ones from single-agent parallelization: sectioning (different agents handle different aspects) and voting (multiple agents handle the same task for higher confidence). The difference here is that each branch is a full agent with its own system prompt, tools, and ReAct loop, rather than a single model call.

Sectioning #

Sectioning divides a task by aspect or domain. Each agent is a specialist that examines the input from one angle. The results are combined programmatically or by a synthesizer agent.

import asyncio

accuracy_agent = Agent(
    name="accuracy_checker",
    system_prompt="""You are a fact-checker. Review the document for factual
    accuracy. For each claim, verify it against your tools. Return a list of
    findings: each finding should state the claim, whether it is accurate,
    and the evidence.""",
    tools=[web_search, knowledge_base],
    model="large-model",
)

compliance_agent = Agent(
    name="compliance_checker",
    system_prompt="""You are a regulatory compliance reviewer. Check the
    document against financial disclosure regulations. Flag any statements
    that are non-compliant, misleading, or missing required disclosures.
    Return a compliance report with specific violations.""",
    tools=[regulation_database, compliance_rules],
    model="large-model",
)

readability_agent = Agent(
    name="readability_checker",
    system_prompt="""You are an editorial reviewer. Evaluate the document for
    clarity, tone, and readability. Flag jargon, overly complex sentences,
    and structural issues. Suggest specific improvements. Return an editorial
    report.""",
    tools=[],  # Pure analysis, no tools needed
    model="medium-model",
)


async def parallel_review(document):
    """Run three specialist agents concurrently and merge results."""
    results = await asyncio.gather(
        asyncio.to_thread(accuracy_agent.run, document),
        asyncio.to_thread(compliance_agent.run, document),
        asyncio.to_thread(readability_agent.run, document),
    )

    return {
        "accuracy": results[0],
        "compliance": results[1],
        "readability": results[2],
    }

Each agent brings different tools and different expertise. The accuracy checker has search tools and a knowledge base. The compliance checker has a regulation database. The readability checker needs no tools at all — it is doing pure language analysis, and a medium model handles that fine.

If these three aspects were handled by a single agent, that agent would need all the tools, a long multi-purpose system prompt, and enough context window to hold everything. Splitting into three agents means each one has a focused context and a small, relevant tool set.

Voting #

Voting runs the same task through multiple agents — possibly with different system prompts, different models, or different temperatures — and takes a majority or consensus result. This is useful when the task has a discrete answer (classification, yes/no, pass/fail) and the cost of a wrong answer is high.

async def vote_on_approval(document, n=3):
    """Run multiple review agents and take a majority vote."""
    agents = [
        Agent(
            name=f"reviewer_{i}",
            system_prompt=f"""You are reviewer {i}. Evaluate whether this
            document is ready for publication. Consider accuracy, clarity,
            and completeness. Respond with exactly APPROVED or REJECTED,
            followed by a one-sentence justification.""",
            tools=[],
            model="large-model",
        )
        for i in range(n)
    ]

    results = await asyncio.gather(
        *[asyncio.to_thread(agent.run, document) for agent in agents]
    )

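    # Take the first word of each response as the vote, tolerating empty
    # responses and trailing punctuation such as "APPROVED."
    votes = [(r.split() or [""])[0].strip(".,:;!").upper() for r in results]
    approvals = sum(1 for vote in votes if vote == "APPROVED")
    rejections = n - approvals  # any response that is not APPROVED counts as a rejection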

    return {
        "decision": "APPROVED" if approvals > n // 2 else "REJECTED",
        "votes": {"approved": approvals, "rejected": rejections},
        "details": results,
    }

Voting with full agents is expensive — each agent runs its own ReAct loop, potentially making tool calls. It is justified when the stakes are high (publishing a financial report, approving a legal document) and when independent evaluations genuinely catch different problems. Before committing to voting, check that the agents actually disagree sometimes. If they agree 99% of the time, you are paying three times the cost for almost no improvement in decision quality.
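
One way to check whether the agents actually disagree is to measure it on past decisions. The sketch below assumes you have logged each reviewer's vote per document. The history data is made up, and the function names are illustrative:

```python
from collections import Counter


def majority(votes):
    """Most common vote; assumes an odd number of reviewers and two labels."""
    return Counter(votes).most_common(1)[0][0]


def unanimity_rate(vote_history):
    """Fraction of past decisions where every reviewer cast the same vote.

    Assumes vote_history is non-empty.
    """
    unanimous = sum(1 for votes in vote_history if len(set(votes)) == 1)
    return unanimous / len(vote_history)


def override_rate(vote_history):
    """Fraction of decisions where the majority overruled the first reviewer."""
    overruled = sum(1 for votes in vote_history if majority(votes) != votes[0])
    return overruled / len(vote_history)


# Hypothetical log: one inner list per document, one entry per reviewer.
history = [
    ["APPROVED", "APPROVED", "APPROVED"],
    ["REJECTED", "APPROVED", "APPROVED"],
    ["REJECTED", "REJECTED", "REJECTED"],
    ["APPROVED", "APPROVED", "APPROVED"],
]
agreement = unanimity_rate(history)  # 0.75
overrides = override_rate(history)   # 0.25
```

The override rate is the more direct measure of value: it approximates how often the extra reviewers changed the outcome compared with running a single reviewer.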

The Aggregation Problem #

The hardest part of the parallel pattern is not the fan-out — it is the fan-in. How you combine results from multiple agents depends on the task.

Structured concatenation works when agents produce independent reports that just need to be placed side by side. The review example above returns a dictionary with separate keys for each aspect. This is simple and preserves all information, but it does not integrate the findings.

Programmatic merging works for discrete outputs. Voting is the simplest version — count the results. For richer outputs, you might merge lists (combine all flagged issues from multiple reviewers), union sets (combine all extracted entities), or take intersections (only keep findings that multiple agents agree on).
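
The merge strategies just described can be sketched in a few lines. This assumes each agent's findings have already been normalized to comparable strings, which real agent output usually needs an extra step to achieve. The function names and sample findings are illustrative:

```python
from collections import Counter


def merge_all(findings_per_agent):
    """Union: keep every finding any agent reported, in first-seen order."""
    merged = []
    for findings in findings_per_agent:
        for item in findings:
            if item not in merged:
                merged.append(item)
    return merged


def merge_agreed(findings_per_agent, min_agents=2):
    """Intersection-style merge: keep findings reported by at least min_agents."""
    counts = Counter()
    for findings in findings_per_agent:
        counts.update(set(findings))  # count each agent once per finding
    return sorted(item for item, n in counts.items() if n >= min_agents)


reviews = [
    ["unsupported revenue claim", "missing risk disclosure"],
    ["missing risk disclosure", "outdated market data"],
    ["missing risk disclosure", "unsupported revenue claim"],
]
everything = merge_all(reviews)
agreed = merge_agreed(reviews)
```

The union favors recall (nothing flagged is lost), while the agreement threshold favors precision (one-off findings are dropped).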

Synthesizer agent handles cases where the combination requires judgment. A final agent reads all the parallel outputs and produces a unified summary, resolving conflicts and weighting the contributions. This adds at least one more model call but produces a coherent, integrated result rather than a bag of separate reports.

synthesizer = Agent(
    name="synthesizer",
    system_prompt="""You receive reports from three specialist reviewers:
    an accuracy checker, a compliance checker, and a readability reviewer.
    Produce a single, unified review that:
    1. Lists all critical issues (accuracy or compliance) first
    2. Lists editorial suggestions second
    3. Gives an overall assessment: PASS, CONDITIONAL PASS, or FAIL
    Resolve any contradictions between reviewers by explaining both views.""",
    tools=[],
    model="large-model",
)

async def parallel_review_with_synthesis(document):
    reviews = await parallel_review(document)

    combined_input = (
        f"Accuracy Review:\n{reviews['accuracy']}\n\n"
        f"Compliance Review:\n{reviews['compliance']}\n\n"
        f"Readability Review:\n{reviews['readability']}"
    )

    # synthesizer.run blocks, so run it in a worker thread like the reviewers
    return await asyncio.to_thread(synthesizer.run, combined_input)

The synthesizer pattern is powerful but introduces a dependency: the quality of the final output depends on how well the synthesizer resolves conflicts between agents. If the accuracy checker says a claim is true and the compliance checker flags it as misleading, the synthesizer has to make a judgment call. This is where a well-crafted system prompt for the synthesizer matters — it needs clear rules for how to handle disagreements.

Trade-Offs #

Cost scales with fan-out. Three parallel agents cost roughly three times as much as one comparable agent. Each agent has its own system prompt, its own conversation context, and its own tool calls. If any of the parallel agents are expensive (large models, many tool calls), the cost adds up fast.

Latency equals the slowest branch. Wall-clock time is determined by whichever agent takes longest. If two agents finish in 5 seconds and one takes 20, the whole thing takes 20. Setting per-agent timeouts prevents a single straggler from blocking the pipeline, but you need a strategy for what to do with a timed-out agent's missing output — proceed without it, substitute a default, or fail the whole thing.
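
A sketch of per-branch timeouts using the "proceed without it" strategy. To keep the example self-contained, the branches are stand-in coroutines that sleep instead of calling a model. fake_agent, run_with_timeout, and the timing values are all illustrative assumptions:

```python
import asyncio


async def run_with_timeout(name, coro, timeout, default=None):
    """Await one branch; return (name, result), or (name, default) on timeout."""
    try:
        return name, await asyncio.wait_for(coro, timeout=timeout)
    except asyncio.TimeoutError:
        return name, default


async def fake_agent(delay, report):
    """Stand-in for an agent call: sleeps instead of calling a model."""
    await asyncio.sleep(delay)
    return report


async def review_with_timeouts(timeout=0.1):
    branches = {
        "accuracy": fake_agent(0.01, "accuracy report"),
        "compliance": fake_agent(0.01, "compliance report"),
        "readability": fake_agent(1.0, "readability report"),  # the straggler
    }
    pairs = await asyncio.gather(
        *(run_with_timeout(name, coro, timeout) for name, coro in branches.items())
    )
    results = dict(pairs)
    # Record which branches timed out so the aggregator can decide what to do.
    missing = [name for name, value in results.items() if value is None]
    return results, missing


results, missing = asyncio.run(review_with_timeouts())
```

One caveat if you apply this to the asyncio.to_thread calls used elsewhere in this article: a timeout stops the pipeline from waiting, but the worker thread itself cannot be cancelled and keeps running until the agent call returns.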

Shared context is duplicated. If all three agents need the same background document, that document gets loaded into three separate context windows. This triples the token cost for that shared context. Keeping the shared input as concise as possible reduces waste.
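
The duplication is easy to quantify with back-of-the-envelope arithmetic. The token counts below are hypothetical, and the estimate covers only the first model call of each branch:

```python
def fan_out_input_tokens(shared_tokens, prompt_tokens_per_agent):
    """Input tokens for the first model call of every branch in a fan-out.

    shared_tokens: size of the document every branch receives.
    prompt_tokens_per_agent: system-prompt size of each branch.
    """
    return sum(shared_tokens + prompt for prompt in prompt_tokens_per_agent)


# Hypothetical sizes: an 8,000-token document and three system prompts.
shared = 8_000
prompts = [300, 400, 250]

total = fan_out_input_tokens(shared, prompts)  # 24,950 tokens
duplicated = shared * (len(prompts) - 1)       # 16,000 tokens re-sent
```

In practice the overhead is larger than this estimate, because each further step of an agent's ReAct loop typically sends the accumulated conversation, shared document included, to the model again.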

Combining Sequential and Parallel #

Real systems rarely use pure sequential or pure parallel. They combine both. The most common composite pattern is the diamond: a sequential stage produces output, that output fans out to parallel agents, the parallel results are collected, and another sequential stage processes the combined result.

        Task
          │
          ▼
    ┌───────────┐
    │  Stage 1  │         Sequential: preparation
    │ (prepare) │
    └─────┬─────┘
          │
   ┌──────┼──────┐
   ▼      ▼      ▼
┌────┐ ┌────┐ ┌────┐
│ A  │ │  B │ │  C │    Parallel: specialist analysis
└──┬─┘ └──┬─┘ └──┬─┘
   │      │      │
   └──────┼──────┘
          ▼
    ┌───────────┐
    │ Aggregate │         Sequential: combine results
    └─────┬─────┘
          │
          ▼
    ┌───────────┐
    │   Stage 2 │         Sequential: final processing
    │ (finalize)│
    └─────┬─────┘
          │
          ▼
       Output

A content pipeline might prepare a document (stage 1), run parallel checks for accuracy, compliance, and readability (parallel stage), aggregate the findings (aggregation), and produce a final revised version (stage 2). A customer feedback system might preprocess raw feedback (stage 1), fan out to sentiment, keyword extraction, and categorization agents in parallel, merge the results, and route to the appropriate handler based on the combined analysis.

async def diamond_pipeline(task):
    # Stage 1: sequential preparation. agent.run blocks, so run it in a
    # worker thread to keep the event loop free.
    prepared = await asyncio.to_thread(preparation_agent.run, task)

    # gate_check is presumably a helper that raises when the check fails
    gate_check(prepared, "preparation")

    # Parallel stage: specialist analysis
    reviews = await asyncio.gather(
        asyncio.to_thread(agent_a.run, prepared),
        asyncio.to_thread(agent_b.run, prepared),
        asyncio.to_thread(agent_c.run, prepared),
    )

    # Aggregation: combine the parallel results
    combined = await asyncio.to_thread(
        aggregation_agent.run, format_reviews(reviews)
    )

    gate_check(combined, "aggregation")

    # Stage 2: sequential finalization
    result = await asyncio.to_thread(finalization_agent.run, combined)

    return result

The diamond is one composition among many. You can chain multiple parallel stages (fan-out, fan-in, fan-out again). You can nest a sequential pipeline inside one branch of a parallel fan-out. You can run a sequential pipeline that conditionally fans out only when a gate detects the need for multiple perspectives. The building blocks are simple (sequence and parallel), and they compose freely.

The Routing Variant #

A common variation on the diamond replaces the parallel fan-out with conditional routing. Instead of sending the task to all agents, a classifier examines the input and routes it to one specialist.

       Task
        │
        ▼
  ┌────────────┐
  │ Classifier │
  └─────┬──────┘
        │
   ┌────┼────┐
   ▼    ▼    ▼
 ┌───┐┌───┐┌───┐
 │ A ││ B ││ C │
 └─┬─┘└─┬─┘└─┬─┘
   │    │    │
   └────┼────┘
        ▼
     Output

Only one branch executes. This is cheaper than full fan-out (one agent instead of three) but it puts the decision quality entirely on the classifier. A misclassification sends the task to the wrong specialist, and the specialist has no way to know it is working on something outside its expertise. Routing works best when the categories are clearly separable and the classifier is reliable — which usually means the categories were designed to be mutually exclusive and the classifier was tested on representative samples.
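
A minimal routing sketch. A keyword classifier and plain functions stand in for the classifier and specialist agents, and a fallback handler is added for labels that have no specialist. All names and categories here are illustrative assumptions:

```python
def route(task, classify, specialists, fallback):
    """Send the task to exactly one handler chosen by the classifier."""
    label = classify(task)
    handler = specialists.get(label, fallback)
    return label, handler(task)


def keyword_classifier(task):
    """Toy classifier; a real one might be a small model call."""
    text = task.lower()
    if "invoice" in text or "refund" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "technical"
    return "unknown"


# Plain functions stand in for specialist agents.
specialists = {
    "billing": lambda task: f"billing agent handled: {task}",
    "technical": lambda task: f"technical agent handled: {task}",
}


def general_agent(task):
    return f"general agent handled: {task}"


label, answer = route(
    "App crash on login", keyword_classifier, specialists, general_agent
)
other_label, other_answer = route(
    "Where is your office?", keyword_classifier, specialists, general_agent
)
```

Returning the label alongside the answer makes misroutes visible in logs, which helps when evaluating the classifier on representative samples.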

Validation Gates #

We touched on gates in the sequential pipeline, but they deserve a closer look. In a multi-agent system, validation gates are your primary defense against cascading errors. Every time one agent's output becomes another agent's input, a gate can check that the handoff is clean.

What Gates Check #

Gates operate on the form of the output, not the substance. They answer questions like:

  • Is the output valid JSON? If the next agent expects structured input, validate the schema.
  • Does it contain required sections? Check for expected headings, fields, or keywords.
  • Is it within length bounds? Reject outputs that are suspiciously short (probably incomplete) or unreasonably long (probably off track).
  • Does it contain expected data types? Numbers where numbers are expected, dates where dates are expected.
  • Does it pass domain-specific rules? A financial output should contain at least some dollar figures. A code generation output should parse as valid syntax.

def gate_structured_output(output, required_fields, min_length=50):
    """Generic gate for structured agent outputs."""
    errors = []

    if len(output.split()) < min_length:
        errors.append(f"Output too short ({len(output.split())} words)")

    for field in required_fields:
        if field.lower() not in output.lower():
            errors.append(f"Missing required field: {field}")

    if errors:
        return False, "; ".join(errors)
    return True, None


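def gate_json_output(output):
    """Gate that validates JSON structure."""
    import json
    try:
        json.loads(output)
    except json.JSONDecodeError as e:
        return False, f"Invalid JSON: {e}"
    # Return (passed, error) like the other gates; callers that need the
    # parsed value can call json.loads again once the gate has passed.
    return True, None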
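
Validating the schema goes one step beyond checking that the output parses. A minimal sketch, assuming the next agent expects a flat JSON object. gate_json_fields, the schema dictionary, and the field names are hypothetical, and a schema-validation library could replace the hand-written checks:

```python
import json


def gate_json_fields(output, required):
    """Check that output is a JSON object with the required keys and types.

    required maps each key to the single Python type its value must have.
    Returns (passed, error) like the other gates.
    """
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError as e:
        return False, f"Invalid JSON: {e}"
    if not isinstance(parsed, dict):
        return False, "Expected a JSON object at the top level"

    errors = []
    for key, expected_type in required.items():
        if key not in parsed:
            errors.append(f"Missing key: {key}")
        elif not isinstance(parsed[key], expected_type):
            errors.append(f"{key} should be {expected_type.__name__}")
    if errors:
        return False, "; ".join(errors)
    return True, None


schema = {"company": str, "revenue_musd": float, "risks": list}

good = gate_json_fields(
    '{"company": "Acme Corp", "revenue_musd": 1250.5, "risks": ["debt"]}', schema
)
bad = gate_json_fields('{"company": "Acme Corp", "risks": "debt"}', schema)
```

Collecting all errors before returning, instead of stopping at the first, gives a retrying agent the complete list of problems in one pass.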

When Gates Fail #

A failed gate means the pipeline cannot continue with the current output. There are several strategies:

Fail fast. Raise an error, log the agent's output and the gate failure reason, and return an error to the caller. This is the simplest approach and the right default. It prevents bad data from propagating and gives you a clean signal for debugging.

Retry the agent. Run the failed agent again, optionally with the gate's feedback appended to the original input. "Your previous output was rejected because it was missing the financial projections section. Please try again and include financial projections." This can work, but it adds latency and is not guaranteed to succeed — some failures are systematic, not random.

Fallback. Route to an alternative agent or a default response. This is useful in production systems where a degraded answer is better than no answer.

def run_with_retry(agent, message, gate_fn, max_retries=2):
    """Run an agent with gate validation and retry on failure."""
    for attempt in range(max_retries + 1):
        output = agent.run(message)
        passed, error = gate_fn(output)

        if passed:
            return output

        if attempt < max_retries:
            message = (
                f"{message}\n\n[RETRY: Your previous output was rejected. "
                f"Reason: {error}. Please address this and try again.]"
            )

    raise PipelineError(
        f"Agent {agent.name} failed after {max_retries + 1} attempts",
        last_output=output,
        last_error=error,
    )

The retry approach feeds the gate failure reason back to the agent as additional context. This works surprisingly well for structural failures — the agent did not include a required section, or produced output that does not parse. It works less well for substantive failures, because the agent may not have the information needed to fix the underlying problem.
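
The fallback strategy can be sketched in the same style. The example assumes agents are passed in as callables (for instance an agent's run method) and that a canned response is acceptable as a last resort. Every name below is illustrative:

```python
DEFAULT_RESPONSE = "Automated analysis unavailable; escalated for manual review."


def run_with_fallback(message, primary, gate_fn, fallback=None):
    """Return (output, source); source is 'primary', 'fallback', or 'default'.

    primary and fallback are callables that take a message and return text.
    """
    output = primary(message)
    passed, _ = gate_fn(output)
    if passed:
        return output, "primary"

    if fallback is not None:
        output = fallback(message)
        passed, _ = gate_fn(output)
        if passed:
            return output, "fallback"

    # Neither produced acceptable output: degrade to a canned answer.
    return DEFAULT_RESPONSE, "default"


def non_empty_gate(output):
    """Toy gate: reject blank output."""
    if not output.strip():
        return False, "Empty output"
    return True, None


# Stand-ins for agents: the primary always fails, the backup succeeds.
def broken_primary(message):
    return ""


def simple_backup(message):
    return f"Summary of: {message}"


answer, source = run_with_fallback(
    "Evaluate Acme Corp", broken_primary, non_empty_gate, fallback=simple_backup
)
```

Returning the source alongside the output lets the caller flag degraded answers instead of presenting them as full-quality results.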

Choosing Between Sequential and Parallel #

The choice is driven by dependency structure.

If stage B needs stage A's output, the relationship is sequential. You cannot parallelize dependencies. Research must happen before analysis because the analyst needs the research findings. Drafting must happen before review because the reviewer needs something to review.

If stages A and B operate independently on the same input, the relationship is parallel. Accuracy checking and compliance checking examine the same document but do not depend on each other. Sentiment analysis and keyword extraction work on the same text but produce independent outputs.

If some stages are dependent and some are independent, you combine both — typically the diamond pattern. Prepare the input (sequential), fan out to independent specialists (parallel), aggregate (sequential), finalize (sequential).

Here is a decision framework:

Can all sub-tasks run independently?
├── Yes → Parallel pattern (fan-out / fan-in)
├── No, they have strict ordering → Sequential pipeline
└── Mixed: some independent, some ordered
    └── Diamond pattern (sequential + parallel stages)

Is the task routing known at design time?
├── Yes → Hardcode the topology (sequential or parallel)
└── No, it depends on the input
    └── This is a coordinator problem (covered later)

One useful heuristic: draw the data flow. If it is a straight line, use sequential. If it fans out and converges, use parallel. If you cannot draw it without crossing lines or loops, you probably need a coordinator or a more dynamic pattern.
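
The "draw the data flow" heuristic can also be checked mechanically. The sketch below uses the standard library's graphlib module (Python 3.9+) to group tasks into stages from a dependency map. The execution_stages helper and the task names are illustrative assumptions:

```python
from graphlib import CycleError, TopologicalSorter


def execution_stages(dependencies):
    """Group tasks into stages; tasks in the same stage can run in parallel.

    dependencies maps each task to the set of tasks whose output it needs.
    Raises graphlib.CycleError if the flow contains a loop.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        stages.append(ready)
        sorter.done(*ready)
    return stages


pipeline = {"analysis": {"research"}, "report": {"analysis"}}
diamond = {
    "accuracy": {"prepare"},
    "compliance": {"prepare"},
    "readability": {"prepare"},
    "aggregate": {"accuracy", "compliance", "readability"},
}

pipeline_stages = execution_stages(pipeline)
diamond_stages = execution_stages(diamond)

try:
    execution_stages({"draft": {"review"}, "review": {"draft"}})
    has_loop = False
except CycleError:
    has_loop = True  # a loop means a fixed topology will not fit
```

If every stage holds a single task, the flow is a sequential pipeline. A stage with several tasks is a fan-out. A CycleError corresponds to the case where the diagram cannot be drawn without loops.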

Practical Considerations #

Logging and Traceability #

Every inter-agent message should carry a correlation ID — a unique identifier that ties together all the agent calls for a single task. When something goes wrong in a five-agent diamond pipeline, you need to reconstruct the full message flow: what input each agent received, what it produced, which gates passed, and which failed.

import uuid

def run_pipeline_with_tracing(task, agents, gates=None):
    correlation_id = str(uuid.uuid4())
    trace = []
    current_input = task
    gates = gates or {}

    for agent in agents:
        entry = {
            "correlation_id": correlation_id,
            "agent": agent.name,
            "input_preview": current_input[:200],
        }

        output = agent.run(current_input)
        entry["output_preview"] = output[:200]

        gate_fn = gates.get(agent.name)
        if gate_fn:
            passed, error = gate_fn(output)
            entry["gate_passed"] = passed
            if not passed:
                entry["gate_error"] = error
                trace.append(entry)
                log_trace(trace)
                raise PipelineError(f"Gate failed: {error}", trace=trace)

        trace.append(entry)
        current_input = output

    log_trace(trace)
    return current_input

The trace captures just enough information to reconstruct what happened without storing the full content of every message (which could be enormous). The input_preview and output_preview fields give you enough to identify which agent went wrong, and you can retrieve the full content from your logging system when you need to dig in.
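
The tracing code calls a log_trace function that is not shown. One possible sketch, assuming structured JSON lines written through the standard logging module are enough. The logger name and the sample trace are made up for illustration:

```python
import json
import logging

logger = logging.getLogger("pipeline.trace")
logger.setLevel(logging.INFO)


def log_trace(trace):
    """Emit one JSON line per trace entry, searchable by correlation_id."""
    for step, entry in enumerate(trace):
        record = {"step": step, **entry}
        # Failed gates are logged at WARNING so they stand out.
        failed = entry.get("gate_passed") is False
        level = logging.WARNING if failed else logging.INFO
        logger.log(level, json.dumps(record, sort_keys=True))


sample_trace = [
    {"correlation_id": "demo-123", "agent": "researcher", "gate_passed": True},
    {
        "correlation_id": "demo-123",
        "agent": "financial_analyst",
        "gate_passed": False,
        "gate_error": "Financial assessment lacks specific figures",
    },
]

logging.basicConfig(level=logging.INFO)
log_trace(sample_trace)
```

Because every line carries the correlation ID, a single search in the log store returns the whole run in step order.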

Cost Control #

Multi-agent pipelines multiply cost. A few practices keep it manageable:

Right-size models per agent. Not every agent needs the largest model. Agents doing straightforward synthesis or formatting can use smaller, cheaper models. Agents doing complex reasoning or working with specialized tools need larger ones. This is the model routing concept from the multi-agent foundations, applied per node in the pipeline.

Keep context lean. Each agent should receive only the information it needs, not the full history of every previous agent. If the financial analyst only needs the research summary and not the full search results, trim the input. Passing bloated context wastes tokens and can degrade performance by burying the relevant information in noise.

Set token budgets. Cap the max_tokens for each agent's response to prevent runaway costs from agents that produce overly verbose output. This also serves as an implicit quality signal — if an agent needs 5,000 tokens to produce what should be a 500-word summary, something is probably wrong.

Testing #

Deterministic topologies are testable in a way that dynamic coordination is not. You can:

  • Unit test each agent in isolation. Give the research agent a known input and check that its output contains expected elements. Mock the tools and verify that the agent calls the right tools in the right order.
  • Test gates with known good and bad inputs. Generate outputs that should pass the gate and outputs that should fail. Verify the gate catches the failures and passes the good ones.
  • Integration test the full pipeline. Run the complete sequence with a representative input and check the final output. This is more expensive (real model calls, real tool invocations) but catches issues that only appear when agents interact — like one agent producing output in a format the next agent cannot parse.
  • Snapshot test. Record the intermediate outputs for a known input and compare future runs against the snapshot. This catches regressions when you change a system prompt or swap a model.
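
Gate tests are the cheapest of these to write, since gates are plain functions. A sketch for the validate_financials gate from earlier, written as pytest-style test functions (using pytest as the runner is an assumption; the functions are plain Python and can be called directly). The gate is repeated here so the example is self-contained:

```python
import re


def validate_financials(output):
    """Same check as the gate shown earlier: require at least three figures."""
    numbers = re.findall(r'\$[\d,.]+[MBK]?|\d+\.?\d*%', output)
    if len(numbers) < 3:
        return False, "Financial assessment lacks specific figures"
    return True, None


def test_passes_with_three_figures():
    text = "Revenue reached $4.2B, up 12.5% year over year, with a 30% gross margin."
    assert validate_financials(text) == (True, None)


def test_fails_without_figures():
    passed, error = validate_financials("Revenue grew strongly and margins improved.")
    assert not passed
    assert "lacks specific figures" in error


def test_fails_with_too_few_figures():
    # Two figures only: one dollar amount and one percentage.
    passed, _ = validate_financials("Revenue was $4.2B and grew 12.5%.")
    assert not passed
```

The bad inputs are worth as much attention as the good ones: a gate that never rejects anything passes every "good input" test and protects nothing.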

Conclusion #

Sequential and parallel patterns are the workhorses of multi-agent systems — simple, predictable, and debuggable. They cover a large fraction of real-world multi-agent use cases without introducing the complexity of dynamic orchestration.

Key takeaways:

  • Sequential pipelines pass output from one agent to the next in a fixed order — use them when each stage genuinely depends on the previous stage's output
  • Parallel patterns fan a task out to multiple agents running concurrently — use them when sub-tasks are independent and can execute simultaneously
  • The diamond pattern combines both: sequential preparation, parallel specialist analysis, aggregation, and sequential finalization
  • Validation gates between agents are your main defense against cascading errors — check form (structure, length, required fields) at every handoff
  • The developer controls the topology — no AI model decides which agent runs next — which makes these patterns the easiest to debug, test, and reason about
  • Right-size models per agent: use large models for complex reasoning, medium or small models for synthesis and formatting
  • Carry a correlation ID through every agent call so you can reconstruct the full message flow when debugging
  • If you cannot draw the data flow as a straight line or a fan-out/fan-in, you probably need a coordinator — and that is a different pattern