Agentic Search & Deep Research


A single query to a search engine returns ten blue links. A single call to a RAG pipeline returns a handful of chunks from a vector store. Both are one-shot — you ask, you get an answer, you move on. But real research does not work that way. A human researcher reads one paper, notices a citation, follows it, discovers a contradictory claim, searches for more evidence, revises their understanding, and repeats until they have enough confidence to draw a conclusion.

Agentic search brings this iterative, multi-source process to AI agents. A single retrieval is only the beginning. The agent plans a search strategy, executes queries across multiple sources, evaluates what it finds, identifies gaps and contradictions, refines its queries, and synthesizes a grounded answer with provenance tracking. This is the pattern behind "deep research" features — agents that spend minutes or hours investigating a topic before producing a comprehensive report.

From Single-Shot RAG to Iterative Research

Standard RAG has a clean, simple architecture: embed the query, retrieve top-K chunks, stuff them into the prompt, generate an answer. It works well when the answer exists in a single document and the user's phrasing is close enough to the source text for semantic similarity to find it.
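To make the four steps concrete, here is a minimal sketch of the single-shot pipeline. The bag-of-words `embed` function is a toy stand-in for a real embedding model, and the helper names (`retrieve_top_k`, `build_prompt`) are illustrative assumptions rather than any particular library's API. The generation call is left out; the sketch stops at the prompt.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve_top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by similarity to the query and keep the best k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Stuff the retrieved chunks into a single prompt for the generator."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


chunks = [
    "Connection pools reuse database connections across requests.",
    "Sharding splits a table across several database nodes.",
    "Espresso is brewed under pressure.",
]
question = "how do connection pools reuse connections"
top = retrieve_top_k(question, chunks, k=1)
prompt = build_prompt(question, top)
```

There is exactly one retrieval and no feedback path: whatever `retrieve_top_k` returns is all the generator ever sees.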

But it falls apart in several common scenarios:

Multi-hop questions. "What was the revenue impact of the product launched by the team that acquired Company X?" requires finding the acquisition, identifying the team, finding their product, and then locating revenue data — across different documents.

Exploratory research. "What are the arguments for and against approach X in distributed systems?" requires surveying multiple sources with diverse perspectives, not retrieving the single best match.

Verification tasks. "Is claim Y actually true?" requires finding independent sources that confirm or contradict, not just one source that states the claim.

Evolving understanding. Sometimes you do not know the right question until you have read the first few results. The initial query is a starting point.

Agentic search wraps retrieval in the standard agent loop — reason, act, observe — so the agent can search iteratively, refine its approach based on results, and build understanding over multiple cycles.

┌──────────────────────────────────────────────────────────────────┐
│                    Agentic Search Loop                           │
│                                                                  │
│  ┌──────────┐    ┌───────────┐    ┌──────────┐    ┌───────────┐  │
│  │  Plan    │───►│  Search   │───►│ Evaluate │───►│ Synthesize│  │
│  │  queries │    │  sources  │    │ results  │    │ or refine │  │
│  └──────────┘    └───────────┘    └──────────┘    └───────────┘  │
│       ▲                                                 │        │
│       │            ┌──────────────┐                     │        │
│       └────────────│  Gap/conflict│◄────────────────────┘        │
│                    │  detected    │                              │
│                    └──────────────┘                              │
│                                                                  │
│  Sources: web search, vector stores, APIs, databases, documents  │
└──────────────────────────────────────────────────────────────────┘

The difference from standard RAG is fundamental: the agent decides when it has enough information. It can search ten times or a hundred times. It can pivot to entirely new queries based on what it learns. It owns its own termination condition instead of being bounded by a fixed retrieval count.

The Research Agent Architecture

A deep research agent has several components beyond the standard agent loop:

A query planner that decomposes a high-level research question into a set of sub-queries — initial facets to investigate.

A source router that directs each sub-query to the appropriate source: web search for recent information, a vector store for internal documents, a structured database for quantitative data, or a specialized API for domain-specific knowledge.

An evaluator that assesses each result for relevance, quality, and sufficiency — deciding whether the agent has enough information or needs to keep searching.

A synthesis engine that combines findings across sources and iterations into a coherent output with citations.

A provenance tracker that records where every piece of information came from, enabling citation and auditability.

import json
import time
from collections.abc import AsyncIterator
from dataclasses import dataclass, field
from enum import Enum


class SourceType(Enum):
    WEB_SEARCH = "web_search"
    VECTOR_STORE = "vector_store"
    DATABASE = "database"
    API = "api"
    DOCUMENT = "document"


@dataclass
class SearchResult:
    content: str
    source_type: SourceType
    source_id: str  # URL, document ID, table name, etc.
    relevance_score: float = 0.0
    timestamp: str = ""
    metadata: dict = field(default_factory=dict)


@dataclass
class ResearchState:
    """Tracks the evolving state of a research session."""
    original_question: str
    sub_queries: list[str] = field(default_factory=list)
    completed_queries: list[str] = field(default_factory=list)
    results: list[SearchResult] = field(default_factory=list)
    gaps: list[str] = field(default_factory=list)
    contradictions: list[dict] = field(default_factory=list)
    confidence: float = 0.0
    iteration: int = 0
    max_iterations: int = 10


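class ResearchAgent:
    """An agent that performs iterative, multi-source research.

    The model-backed helpers called below (_plan_queries, _route_query,
    _execute_search, _evaluate_results, _refine_queries, _synthesize)
    are omitted here; later sections sketch what each one does.
    """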

    def __init__(self, model, sources: dict, max_iterations: int = 10):
        self.model = model
        self.sources = sources  # name -> source client
        self.max_iterations = max_iterations

    async def research(self, question: str) -> dict:
        state = ResearchState(
            original_question=question,
            max_iterations=self.max_iterations,
        )

        # Phase 1: Plan initial sub-queries
        state.sub_queries = await self._plan_queries(question)

        # Phase 2: Iterative search loop
        while not self._should_stop(state):
            state.iteration += 1

            # Pick the next sub-query to investigate
            query = self._select_next_query(state)

            # Route to appropriate source(s)
            sources = await self._route_query(query)

            # Execute search across selected sources
            results = await self._execute_search(query, sources)

            # Evaluate results
            evaluation = await self._evaluate_results(results, state)

            # Update state based on evaluation
            state.results.extend(evaluation.relevant_results)
            state.completed_queries.append(query)
            state.gaps.extend(evaluation.new_gaps)
            state.contradictions.extend(evaluation.contradictions)
            state.confidence = evaluation.overall_confidence

            # Generate follow-up queries if needed
            if evaluation.new_gaps:
                new_queries = await self._refine_queries(state)
                state.sub_queries.extend(new_queries)

        # Phase 3: Synthesize final answer
        return await self._synthesize(state)

    def _should_stop(self, state: ResearchState) -> bool:
        if state.iteration >= state.max_iterations:
            return True
        if state.confidence >= 0.9:
            return True
        if not state.sub_queries and not state.gaps:
            return True  # Nothing left to investigate
        return False

    def _select_next_query(self, state: ResearchState) -> str:
        # Prioritize gaps over planned sub-queries
        if state.gaps:
            return state.gaps.pop(0)
        return state.sub_queries.pop(0)
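To see the control flow in isolation, here is a minimal synchronous sketch of the same loop with the model calls replaced by a canned lookup table. Everything in it is hypothetical: `CORPUS`, `LoopState`, and `run_loop` exist only for illustration, and setting confidence to 1.0 when the chain ends stands in for a real evaluator.

```python
from dataclasses import dataclass, field


@dataclass
class LoopState:
    queue: list[str]
    done: list[str] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)
    confidence: float = 0.0
    iteration: int = 0


# Hypothetical canned "source": query -> (finding, follow-up query or None).
CORPUS = {
    "who acquired Company X": ("Team A acquired Company X", "what did Team A launch"),
    "what did Team A launch": ("Team A launched Product P", "Product P revenue impact"),
    "Product P revenue impact": ("Product P revenue figures located", None),
}


def run_loop(question: str, max_iterations: int = 10, target: float = 0.9) -> LoopState:
    state = LoopState(queue=[question])
    # Same three exits as _should_stop: budget spent, confident enough, nothing left.
    while (
        state.queue
        and state.iteration < max_iterations
        and state.confidence < target
    ):
        state.iteration += 1
        query = state.queue.pop(0)
        state.done.append(query)
        finding, follow_up = CORPUS.get(query, (None, None))
        if finding:
            state.findings.append(finding)
        if follow_up and follow_up not in state.done:
            state.queue.append(follow_up)  # gap: another hop is needed
        elif finding:
            state.confidence = 1.0  # chain complete, nothing left to ask
    return state


full = run_loop("who acquired Company X")
capped = run_loop("who acquired Company X", max_iterations=2)
```

With the default budget the loop follows all three hops and stops on confidence. With `max_iterations=2` it stops on budget instead, with one query still queued.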

Iterative Query Refinement

The most distinctive behavior of a research agent is how it refines its search strategy based on intermediate results. Refinement is not random exploration; it is a directed search informed by what has been found so far.

Three refinement strategies dominate:

Query expansion. The agent finds a result that uses terminology it had not considered and generates new queries using that vocabulary. If you search for "database sharding" and find a relevant paper that uses "horizontal partitioning," you add queries with the new term.

Query narrowing. Initial results are too broad or noisy. The agent adds constraints — date ranges, specific authors, particular sub-topics — to focus the search.

Lateral pivoting. The agent discovers that the answer lies in a different domain than expected. It abandons the current search direction and opens a new line of inquiry.

class QueryRefiner:
    """Refine search queries based on intermediate results."""

    def __init__(self, model):
        self.model = model

    async def refine(
        self,
        original_query: str,
        results_so_far: list[SearchResult],
        gaps: list[str],
        contradictions: list[dict],
    ) -> list[str]:
        prompt = f"""You are refining search queries for a research task.

Original question: {original_query}

Results found so far (summarized):
{self._summarize_results(results_so_far)}

Information gaps still remaining:
{self._format_gaps(gaps)}

Contradictions found:
{self._format_contradictions(contradictions)}

Generate 2-4 new search queries that would help:
1. Fill the identified gaps
2. Resolve contradictions by finding authoritative sources
3. Use alternative terminology discovered in results
4. Narrow broad results to the specific context needed

Return only the queries, one per line."""

        response = await self.model.generate(prompt)
        return [q.strip() for q in response.strip().split("\n") if q.strip()]

    async def decompose_multi_hop(
        self, question: str, partial_answers: dict
    ) -> list[str]:
        """Break a multi-hop question into sequential sub-queries,
        incorporating answers already found."""
        prompt = f"""Break this complex question into sequential search queries.
Each query should build on previous answers.

Question: {question}
Already known: {partial_answers}

Generate the NEXT queries needed to complete the answer chain."""

        response = await self.model.generate(prompt)
        return [q.strip() for q in response.strip().split("\n") if q.strip()]

    def _summarize_results(self, results: list[SearchResult]) -> str:
        summaries = []
        for r in results[:20]:  # Cap to avoid prompt overflow
            summaries.append(
                f"- [{r.source_type.value}] {r.content[:200]}..."
            )
        return "\n".join(summaries)

The key insight: query refinement is itself a reasoning task. The agent must understand what it knows, what it does not know, and what queries would close that gap. This is why agentic search requires a capable reasoning model — a weaker model cannot reliably assess its own knowledge gaps.
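One practical detail: models often return numbered or bulleted lines and repeat queries the agent has already run, so splitting on newlines alone can send the loop in circles. The sketch below is a hypothetical `parse_queries` helper (not part of the classes above) that strips list markers, drops duplicates case-insensitively, and caps the count.

```python
import re


def parse_queries(response: str, already_run: list[str], limit: int = 4) -> list[str]:
    """Turn raw model output into a clean, de-duplicated list of queries."""
    seen = {q.casefold() for q in already_run}
    queries = []
    for line in response.splitlines():
        # Strip list markers such as "1.", "2)", "-", "*" and surrounding quotes.
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip().strip('"')
        if not cleaned or cleaned.casefold() in seen:
            continue
        seen.add(cleaned.casefold())
        queries.append(cleaned)
    return queries[:limit]


raw = (
    "1. horizontal partitioning tradeoffs\n"
    '- "database sharding"\n'
    "\n"
    "2) Horizontal Partitioning Tradeoffs\n"
    "* shard rebalancing cost"
)
new_queries = parse_queries(raw, already_run=["database sharding"])
```

Here the already-run "database sharding" and the repeated "Horizontal Partitioning Tradeoffs" are both dropped, leaving two new queries.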

Multi-Source Routing

Real research draws from heterogeneous sources. A financial analysis might combine SEC filings (document store), recent news (web search), stock price data (structured API), and analyst reports (vector store). The agent must decide which sources to consult for each sub-query.

class SourceRouter:
    """Route queries to the most appropriate information sources."""

    def __init__(self, model, source_registry: dict):
        self.model = model
        self.source_registry = source_registry

    async def route(self, query: str, context: ResearchState) -> list[str]:
        """Determine which sources to query for a given sub-query."""
        source_descriptions = "\n".join(
            f"- {name}: {src.description}"
            for name, src in self.source_registry.items()
        )

        prompt = f"""Given this search query and available sources, select
which sources to consult. Consider freshness needs, data type, and coverage.

Query: {query}
Research context: {context.original_question}

Available sources:
{source_descriptions}

Already consulted for this topic: {context.completed_queries[-3:]}

Return source names, comma-separated. Prefer fewer sources unless breadth
is specifically needed."""

        response = await self.model.generate(prompt)
        selected = [s.strip() for s in response.split(",")]
        return [s for s in selected if s in self.source_registry]

    async def execute_parallel(
        self, query: str, sources: list[str]
    ) -> list[SearchResult]:
        """Query multiple sources in parallel."""
        import asyncio

        tasks = [
            self.source_registry[source].search(query)
            for source in sources
            if source in self.source_registry
        ]
        results_nested = await asyncio.gather(*tasks, return_exceptions=True)

        all_results = []
        for result_set in results_nested:
            if isinstance(result_set, Exception):
                continue  # Skip failed sources (a real system would log the error)
            all_results.extend(result_set)

        return all_results

Source routing decisions often follow predictable patterns:

| Query type | Primary source | Fallback source |
| --- | --- | --- |
| Recent events | Web search | News API |
| Internal knowledge | Vector store | Document store |
| Quantitative data | Structured database | Spreadsheet API |
| Academic/research | Paper database | Web search with site filters |
| Definitions, concepts | Knowledge base | Web search |
| Historical context | Archive/document store | Web search with date filters |

The router does not need to be perfect on every query — the iterative nature of the loop means the agent can retry with a different source if the first choice yields poor results.
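When the patterns are this predictable, a static lookup with a fallback can replace the model call for common query types. The following is a minimal sketch under assumptions: the `ROUTES` keys and source names are hypothetical labels for three of the rows above, and each client is modeled as a plain callable that takes a query and returns a list.

```python
# Hypothetical route table: query type -> (primary source, fallback source).
ROUTES: dict[str, tuple[str, str]] = {
    "recent_events": ("web_search", "news_api"),
    "internal_knowledge": ("vector_store", "document_store"),
    "quantitative_data": ("database", "spreadsheet_api"),
}


def search_with_fallback(query, query_type, clients, min_results=1):
    """Try the primary source, then the fallback if it fails or returns too little."""
    primary, fallback = ROUTES[query_type]
    for name in (primary, fallback):
        client = clients.get(name)
        if client is None:
            continue  # source not configured in this deployment
        try:
            results = client(query)
        except Exception:
            continue  # treat a failing source like an empty one
        if len(results) >= min_results:
            return name, results
    return None, []


def broken_source(query):
    raise RuntimeError("source unavailable")


clients = {
    "web_search": lambda q: [],  # primary returns nothing
    "news_api": lambda q: [f"headline about {q}"],
    "vector_store": broken_source,  # primary fails, no fallback configured
}
used, results = search_with_fallback("pool sizing", "recent_events", clients)
```

A reasonable hybrid is to use the table first and call the model-based router only for queries that do not classify cleanly.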

Source Triangulation

Finding an answer in one source is not enough for high-stakes research. Triangulation means verifying a claim by finding independent sources that corroborate it. This is how the agent builds confidence.

@dataclass
class Claim:
    statement: str
    supporting_sources: list[str] = field(default_factory=list)
    contradicting_sources: list[str] = field(default_factory=list)
    confidence: float = 0.0

    def update_confidence(self):
        support = len(self.supporting_sources)
        contra = len(self.contradicting_sources)
        if support + contra == 0:
            self.confidence = 0.0
        else:
            # Simple count-based heuristic: each contradiction counts double.
            # It does not yet weight sources by independence or quality.
            self.confidence = support / (support + contra * 2)


class SourceTriangulator:
    """Verify claims by cross-referencing independent sources."""

    def __init__(self, model):
        self.model = model

    async def extract_claims(
        self, results: list[SearchResult]
    ) -> list[Claim]:
        """Extract verifiable factual claims from search results."""
        prompt = f"""Extract distinct factual claims from these search results.
Each claim should be a single, verifiable statement.

Results:
{self._format_results(results)}

Return claims as a JSON array of objects with "statement" and "source_id"."""

        response = await self.model.generate(prompt)
        raw_claims = json.loads(response)

        claims = {}
        for item in raw_claims:
            stmt = item["statement"]
            if stmt not in claims:
                claims[stmt] = Claim(statement=stmt)
            claims[stmt].supporting_sources.append(item["source_id"])

        return list(claims.values())

    async def check_agreement(
        self, claim: Claim, new_result: SearchResult
    ) -> str:
        """Check if a new result supports or contradicts a claim."""
        prompt = f"""Does this text support, contradict, or say nothing about
the following claim?

Claim: {claim.statement}
Text: {new_result.content}

Answer exactly one of: SUPPORTS, CONTRADICTS, NEUTRAL"""

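        response = await self.model.generate(prompt)
        verdict = response.strip().upper().rstrip(".")
        if verdict not in {"SUPPORTS", "CONTRADICTS", "NEUTRAL"}:
            verdict = "NEUTRAL"  # Treat unparseable replies as no evidence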

        if verdict == "SUPPORTS":
            claim.supporting_sources.append(new_result.source_id)
        elif verdict == "CONTRADICTS":
            claim.contradicting_sources.append(new_result.source_id)

        claim.update_confidence()
        return verdict

    async def identify_conflicts(
        self, claims: list[Claim]
    ) -> list[dict]:
        """Find claims that conflict with each other."""
        conflicts = []
        for i, claim_a in enumerate(claims):
            for claim_b in claims[i + 1:]:
                if await self._are_contradictory(claim_a, claim_b):
                    conflicts.append({
                        "claim_a": claim_a.statement,
                        "claim_b": claim_b.statement,
                        "sources_a": claim_a.supporting_sources,
                        "sources_b": claim_b.supporting_sources,
                    })
        return conflicts

Triangulation provides two things: confidence calibration (how sure should the agent be about a claim?) and conflict detection (where do sources disagree, and what should the agent do about it?).
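As a worked example of confidence calibration, the count-based formula confidence = support / (support + 2 × contra) gives 3 / (3 + 2) = 0.6 for three supporting sources and one contradicting source. Raw counts overstate support when several results come from the same site, so the sketch below counts each host once. Treating "same host" as "not independent" is an assumption made for illustration, and it only works when source IDs are URLs.

```python
from urllib.parse import urlparse


def independent_count(source_urls: list[str]) -> int:
    """Count sources once per host, so mirrors of one site do not inflate support."""
    return len({urlparse(u).netloc.lower() for u in source_urls})


def confidence(supporting: list[str], contradicting: list[str]) -> float:
    """support / (support + 2 * contra), computed over independent hosts."""
    s = independent_count(supporting)
    c = independent_count(contradicting)
    return s / (s + 2 * c) if s + c else 0.0


supporting = [
    "https://a.example/post",
    "https://a.example/other-post",  # same host as above, counted once
    "https://b.example/x",
    "https://c.example/y",
]
contradicting = ["https://d.example/z"]
score = confidence(supporting, contradicting)
```

Four supporting URLs collapse to three independent hosts, so the score is 0.6 rather than the 0.67 that raw counts would give.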

When conflicts are detected, the agent has several strategies:

  • Authority ranking. Prefer primary sources over secondary, recent over stale, peer-reviewed over informal.
  • Majority consensus. If three sources agree and one disagrees, weight the majority — but flag the disagreement.
  • Surfacing uncertainty. Report the disagreement to the user rather than picking a side silently.
  • Targeted investigation. Search specifically for evidence that would resolve the conflict.
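These strategies combine naturally. The sketch below is one possible arrangement, not a prescribed algorithm: it takes the verdict from the most authoritative source, computes the majority separately, flags any disagreement, and asks for targeted investigation when authority and majority point in different directions. The `Evidence` fields and the ordering inside `authority` are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source_id: str
    stance: str  # "supports" or "contradicts"
    is_primary: bool
    peer_reviewed: bool
    year: int


def authority(e: Evidence) -> tuple:
    # Primary beats secondary, then peer-reviewed beats informal, then newer beats older.
    return (e.is_primary, e.peer_reviewed, e.year)


def resolve(evidence: list[Evidence]) -> dict:
    votes = {"supports": 0, "contradicts": 0}
    for e in evidence:
        votes[e.stance] += 1
    best = max(evidence, key=authority)
    majority = max(votes, key=votes.get)
    return {
        "verdict": best.stance,  # authority ranking
        "majority": majority,  # majority consensus
        "flag_disagreement": min(votes.values()) > 0,  # surface uncertainty
        "needs_investigation": best.stance != majority,  # targeted follow-up
    }


evidence = [
    Evidence("blog-1", "contradicts", False, False, 2024),
    Evidence("blog-2", "contradicts", False, False, 2023),
    Evidence("paper-1", "supports", True, True, 2021),
]
outcome = resolve(evidence)
```

In this example two informal posts outvote one peer-reviewed primary source, so the result keeps the paper's verdict but marks the claim for further searching.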

Citation Graphs and Provenance

Every fact in the final output must trace back to a source. This is not optional — without provenance, a research agent produces plausible-sounding text with no way to verify it. Citation graphs track the lineage of every claim.

@dataclass
class CitationNode:
    """A single citable unit of information."""
    id: str
    content: str
    source_id: str
    source_url: str = ""
    source_title: str = ""
    retrieved_at: str = ""
    relevance_score: float = 0.0


@dataclass
class CitationEdge:
    """A relationship between citations."""
    from_id: str
    to_id: str
    relationship: str  # "supports", "contradicts", "extends", "cites"


class CitationGraph:
    """Track provenance and relationships between sources."""

    def __init__(self):
        self.nodes: dict[str, CitationNode] = {}
        self.edges: list[CitationEdge] = []
        self._claim_to_citations: dict[str, list[str]] = {}

    def add_source(self, result: SearchResult) -> str:
        """Register a search result as a citable source."""
        node_id = self._generate_id(result)
        self.nodes[node_id] = CitationNode(
            id=node_id,
            content=result.content,
            source_id=result.source_id,
            source_url=result.metadata.get("url", ""),
            source_title=result.metadata.get("title", ""),
            retrieved_at=result.timestamp,
            relevance_score=result.relevance_score,
        )
        return node_id

    def link_claim_to_source(self, claim: str, citation_id: str):
        """Record that a claim is supported by a specific citation."""
        if claim not in self._claim_to_citations:
            self._claim_to_citations[claim] = []
        self._claim_to_citations[claim].append(citation_id)

    def add_relationship(
        self, from_id: str, to_id: str, relationship: str
    ):
        """Record a relationship between two sources."""
        self.edges.append(CitationEdge(
            from_id=from_id,
            to_id=to_id,
            relationship=relationship,
        ))

    def get_citations_for_claim(self, claim: str) -> list[CitationNode]:
        """Get all citations supporting a specific claim."""
        citation_ids = self._claim_to_citations.get(claim, [])
        return [self.nodes[cid] for cid in citation_ids if cid in self.nodes]

    def generate_bibliography(self) -> list[dict]:
        """Generate a formatted bibliography from the citation graph."""
        unique_sources = {}
        for node in self.nodes.values():
            if node.source_id not in unique_sources:
                unique_sources[node.source_id] = {
                    "id": node.source_id,
                    "title": node.source_title,
                    "url": node.source_url,
                    "retrieved_at": node.retrieved_at,
                    "times_cited": 0,
                }
            unique_sources[node.source_id]["times_cited"] += 1

        return sorted(
            unique_sources.values(),
            key=lambda x: x["times_cited"],
            reverse=True,
        )

    def export_for_synthesis(self) -> dict:
        """Export the graph in a format suitable for the synthesis step."""
        return {
            "sources": [
                {
                    "id": n.id,
                    "content": n.content,
                    "url": n.source_url,
                    "title": n.source_title,
                }
                for n in self.nodes.values()
            ],
            "relationships": [
                {
                    "from": e.from_id,
                    "to": e.to_id,
                    "type": e.relationship,
                }
                for e in self.edges
            ],
            "claim_support": self._claim_to_citations,
        }

The citation graph enables several important capabilities:

Inline citations. The synthesis step can insert references ("According to [1][2]...") because it knows which sources support each statement.

Confidence scores. Claims backed by multiple independent sources get higher confidence than single-source claims.

Source quality analysis. If one source is contradicted by many others, it may be unreliable — the graph makes this visible.

Audit trails. A human reviewer can trace any statement in the output back to its original source, verifying the agent did not hallucinate.
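For inline citations, the synthesized text usually carries raw source IDs that need to become numbered references. A minimal sketch, assuming citations appear as `[source_id]` markers and that no other square-bracketed text occurs in the answer; `number_citations` is a hypothetical helper, not a method of `CitationGraph`.

```python
import re


def number_citations(text: str) -> tuple[str, list[str]]:
    """Replace [source_id] markers with [1], [2], ... in order of first use."""
    order: list[str] = []

    def repl(match: re.Match) -> str:
        source_id = match.group(1)
        if source_id not in order:
            order.append(source_id)
        return f"[{order.index(source_id) + 1}]"

    return re.sub(r"\[([^\[\]]+)\]", repl, text), order


draft = "Small pools win [doc-7][web-2]. Proxies help [web-2]."
numbered, order = number_citations(draft)
```

The returned `order` list maps each number back to its source ID, which is what the bibliography needs to print entries in citation order.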

The Synthesis Step

After the iterative search loop terminates, the agent must synthesize its findings into a coherent output. This is not simple summarization — it is structured argumentation with citations.

class ResearchSynthesizer:
    """Synthesize research findings into a cited, structured output."""

    def __init__(self, model):
        self.model = model

    async def synthesize(
        self, state: ResearchState, citation_graph: CitationGraph
    ) -> dict:
        """Produce a final research output with citations."""
        graph_data = citation_graph.export_for_synthesis()

        prompt = f"""You are synthesizing research findings into a comprehensive
answer. Every factual claim must cite its source(s) using [source_id] notation.

Original question: {state.original_question}

Sources and their content:
{self._format_sources(graph_data["sources"])}

Relationships between sources:
{self._format_relationships(graph_data["relationships"])}

Contradictions found during research:
{self._format_contradictions(state.contradictions)}

Confidence level: {state.confidence:.0%}
Iterations completed: {state.iteration}
Remaining gaps: {state.gaps}

Instructions:
- Structure the answer with clear sections
- Cite every factual claim with [source_id]
- Where sources conflict, present both sides and note the disagreement
- Flag any remaining uncertainty or gaps
- End with a confidence assessment"""

        response = await self.model.generate(prompt)

        return {
            "answer": response,
            "bibliography": citation_graph.generate_bibliography(),
            "confidence": state.confidence,
            "sources_consulted": len(citation_graph.nodes),
            "iterations": state.iteration,
            "unresolved_gaps": state.gaps,
            "conflicts": state.contradictions,
        }

    async def synthesize_progressive(
        self, state: ResearchState, citation_graph: CitationGraph
    ) -> AsyncIterator[str]:
        """Stream a synthesis as it is generated — useful for long reports."""
        # For long research tasks, users want to see progress
        # Generate section by section
        outline = await self._generate_outline(state)

        for section in outline:
            relevant_sources = self._sources_for_section(
                section, citation_graph
            )
            section_text = await self._write_section(
                section, relevant_sources, state
            )
            yield section_text
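Instructing the model to cite every claim does not guarantee that it does. A cheap post-check can catch the most common failures before the report reaches the user. The sketch below is a hypothetical `audit_citations` helper that assumes `[source_id]` markers and a naive sentence split on terminal punctuation; it reports cited IDs that are not in the graph, sources that were never cited, and sentences with no citation at all.

```python
import re

CITATION = re.compile(r"\[([^\[\]]+)\]")


def audit_citations(answer: str, known_ids: set[str]) -> dict:
    """Compare the citations in a synthesized answer against known source IDs."""
    cited = set(CITATION.findall(answer))
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    return {
        "unknown": sorted(cited - known_ids),  # possibly hallucinated sources
        "unused": sorted(known_ids - cited),  # retrieved but never cited
        "uncited_sentences": [s for s in sentences if not CITATION.search(s)],
    }


answer = (
    "Small pools outperform large ones [src-1]. "
    "Proxies add latency [src-9]. "
    "Always benchmark."
)
report = audit_citations(answer, known_ids={"src-1", "src-2"})
```

An uncited sentence is not necessarily a problem (it may be a transition or a recommendation), but an unknown source ID almost always is.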

Depth Control and Budget Management

A research agent without constraints has no natural stopping point: there is always one more query to run. Practical systems need explicit controls on depth, breadth, and cost.

@dataclass
class ResearchBudget:
    """Controls for research depth and cost."""
    max_iterations: int = 10
    max_sources: int = 50
    max_tokens_spent: int = 500_000  # Total LLM tokens across all calls
    max_wall_time_seconds: int = 300
    min_confidence: float = 0.7  # Stop early once confidence reaches this level
    max_queries_per_source: int = 5

    # Cost tracking
    tokens_used: int = 0
    queries_executed: int = 0
    # Default to "now"; a default of 0.0 would make the wall-clock check
    # report the budget as exhausted immediately.
    start_time: float = field(default_factory=time.time)

    def is_exhausted(self) -> bool:
        """True once any hard limit (tokens, queries, wall time) is hit."""
        if self.tokens_used >= self.max_tokens_spent:
            return True
        # Allow up to three queries per iteration on average
        if self.queries_executed >= self.max_iterations * 3:
            return True
        if time.time() - self.start_time >= self.max_wall_time_seconds:
            return True
        return False

    def remaining_budget_fraction(self) -> float:
        """Fraction of the tightest budget (tokens or time) still available."""
        token_frac = 1 - (self.tokens_used / self.max_tokens_spent)
        time_frac = 1 - (
            (time.time() - self.start_time) / self.max_wall_time_seconds
        )
        # Clamp at zero: an overspent budget has nothing left
        return max(0.0, min(token_frac, time_frac))

Budget management creates an important design tension: thoroughness vs. responsiveness. A deep research agent that spends five minutes investigating is more thorough but may frustrate users expecting quick answers. The solution is tiered depth:

  • Quick answer (1-3 iterations): For simple factual queries. Search once, verify quickly, respond.
  • Standard research (5-10 iterations): For moderately complex questions. Multiple sources, basic triangulation.
  • Deep research (10-50+ iterations): For complex, multi-faceted topics. Exhaustive search, full triangulation, structured report output.

The agent can either be configured for a specific tier or adaptively escalate — starting shallow and going deeper only when initial results are insufficient.
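Adaptive escalation can be sketched as a loop over tiers that stops at the first one that is confident enough. The iteration caps below are taken from the upper end of each tier; `run_tier` is a hypothetical callable that runs a research pass with the given cap and returns the resulting confidence.

```python
# (tier name, iteration cap), ordered from shallow to deep.
TIERS = [
    ("quick", 3),
    ("standard", 10),
    ("deep", 50),
]


def escalate(run_tier, min_confidence: float = 0.7) -> tuple[str, float]:
    """Run tiers from shallow to deep, stopping at the first confident result."""
    tier, confidence = "", 0.0
    for tier, max_iterations in TIERS:
        confidence = run_tier(max_iterations)
        if confidence >= min_confidence:
            break
    return tier, confidence


# Fake research pass: confidence improves as the iteration cap grows.
fake_confidence = {3: 0.4, 10: 0.75, 50: 0.95}
chosen = escalate(lambda cap: fake_confidence[cap])
```

This sketch restarts from scratch at each tier for simplicity. A real agent would more likely carry its accumulated state forward and simply raise the iteration cap.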

Handling Stale and Contradictory Information

Real-world information has a freshness dimension. A research agent must reason about temporal validity — a statistic from 2019 may be irrelevant in 2026. Contradictions between sources are often temporal: both claims were true, just at different times.

class TemporalReasoner:
    """Handle time-sensitive information in research."""

    def __init__(self, model, current_date: str):
        self.model = model
        self.current_date = current_date

    async def assess_freshness(
        self, result: SearchResult, query_context: str
    ) -> dict:
        """Determine if a result is still temporally valid."""
        prompt = f"""Assess whether this information is likely still current.

Information: {result.content[:500]}
Source date: {result.timestamp or "unknown"}
Current date: {self.current_date}
Research context: {query_context}

Consider:
- Does this contain time-sensitive data (prices, statistics, versions)?
- Has the domain changed significantly since publication?
- Is this a stable fact or rapidly evolving area?

Return JSON: {{"still_valid": bool, "confidence": float, "reason": str}}"""

        response = await self.model.generate(prompt)
        return json.loads(response)

    async def resolve_temporal_conflict(
        self, claim_a: dict, claim_b: dict
    ) -> dict:
        """Resolve a contradiction that may be temporal."""
        prompt = f"""Two sources make conflicting claims. Determine if this is
a temporal difference (both were true at different times) or a genuine
factual disagreement.

Claim A: {claim_a["statement"]} (from {claim_a["date"]})
Claim B: {claim_b["statement"]} (from {claim_b["date"]})

Return JSON: {{
  "is_temporal": bool,
  "current_answer": str,
  "explanation": str
}}"""

        response = await self.model.generate(prompt)
        return json.loads(response)
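Both methods pass the model's reply straight to `json.loads`, which raises if the model wraps the JSON in prose or a code fence. A more forgiving parser is sketched below; `parse_json_reply` is a hypothetical helper, and taking the text between the first `{` and the last `}` is a heuristic that assumes the reply contains a single JSON object.

```python
import json
from typing import Optional


def parse_json_reply(reply: str, required: set[str]) -> Optional[dict]:
    """Extract a JSON object from a model reply; return None if unusable."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(reply[start : end + 1])
    except json.JSONDecodeError:
        return None
    # Reject non-objects and objects missing any expected key.
    if not isinstance(data, dict) or not required.issubset(data):
        return None
    return data


reply = (
    'Here is my assessment: {"still_valid": false, "confidence": 0.8, '
    '"reason": "advice predates current hardware"} Let me know if you need more.'
)
parsed = parse_json_reply(reply, {"still_valid", "confidence", "reason"})
```

When the parser returns `None`, the caller can retry the prompt or fall back to treating the result's freshness as unknown, rather than crashing mid-session.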

Putting It Together - A Complete Research Session

Here is how the components interact during a complete research session. The user asks: "What are the current best practices for database connection pooling in high-throughput microservices?"

┌────────────────────────────────────────────────────────────────┐
│  Research Session: Database Connection Pooling                 │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│  Iteration 1: Plan                                             │
│  ├─ Sub-queries generated:                                     │
│  │   • "connection pool sizing best practices"                 │
│  │   • "microservices database pooling patterns"               │
│  │   • "connection pool exhaustion prevention"                 │
│  │   • "pgbouncer vs application-level pooling"                │
│  │                                                             │
│  Iteration 2: Search + Evaluate                                │
│  ├─ Query: "connection pool sizing best practices"             │
│  ├─ Sources: web_search, vector_store                          │
│  ├─ Results: 8 documents retrieved                             │
│  ├─ Relevant: 5 (scored > 0.7)                                 │
│  ├─ Key finding: "pool size = CPU cores * 2 + disk spindles"   │
│  └─ Gap detected: "modern NVMe changes this formula"           │
│                                                                │
│  Iteration 3: Refine + Search                                  │
│  ├─ New query: "connection pool sizing NVMe SSD"               │
│  ├─ New query: "HikariCP pool size recommendations 2024"       │
│  ├─ Sources: web_search                                        │
│  ├─ Results: 6 documents retrieved                             │
│  ├─ Contradiction found:                                       │
│  │   Source A: "keep pool small, 10-20 connections"            │
│  │   Source B: "scale pool with request concurrency"           │
│  └─ Action: search for authoritative resolution                │
│                                                                │
│  Iteration 4: Triangulate                                      │
│  ├─ Query: "why small connection pools outperform large ones"  │
│  ├─ Found: 3 sources confirming small-pool approach            │
│  ├─ Found: 1 source (dated 2018) recommending large pools      │
│  ├─ Temporal resolution: large-pool advice is outdated         │
│  └─ Confidence: 0.82                                           │
│                                                                │
│  Iteration 5: Fill remaining gaps                              │
│  ├─ Query: "connection pool per-service vs shared proxy"       │
│  ├─ Results: clear consensus on per-service pools + proxy      │
│  └─ Confidence: 0.91 → STOP                                    │
│                                                                │
│  Synthesis: Produce cited report with 14 sources               │
└────────────────────────────────────────────────────────────────┘

Five iterations. Fourteen sources consulted. One contradiction resolved temporally. A final report with inline citations and a bibliography. This is the pattern.

Trade-Offs in Agentic Search Design

| Dimension | Shallow (1-2 iterations) | Deep (10+ iterations) |
| --- | --- | --- |
| Latency | Seconds | Minutes to hours |
| Cost | Low (few LLM calls) | High (many LLM + search calls) |
| Coverage | May miss important sources | Comprehensive |
| Confidence | Low (single-source) | High (triangulated) |
| User experience | Feels like chat | Feels like hiring a researcher |
| Failure mode | Incomplete answer | Over-researched, slow |

Other design tensions:

Breadth vs. depth. Should the agent investigate many sub-topics shallowly, or fewer sub-topics deeply? The answer depends on the task — a survey needs breadth, a fact-check needs depth.

Exploration vs. exploitation. Should the agent keep searching new sources (exploration) or dive deeper into sources that already look promising (exploitation)? An explore-exploit balance similar to multi-armed bandits applies here.

Precision vs. recall. Should the agent prioritize finding exactly the right information (precision) or casting a wide net to ensure nothing is missed (recall)? High-stakes research needs recall; time-constrained answers need precision.

Transparency vs. speed. Should the agent show the user its intermediate reasoning and search steps (transparency), or hide the process and only show the final output (speed)? Progressive disclosure — showing a brief status while working, full trace on demand — is the pragmatic middle ground.
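The bandit analogy for exploration vs. exploitation can be made concrete with UCB1, a standard multi-armed bandit rule: pick the source with the highest average reward plus an uncertainty bonus that shrinks as the source is queried more. This is a sketch of one option, not a recommendation for every agent. The reward definition (for example, the fraction of results the evaluator judged relevant) and the `stats` layout are assumptions.

```python
import math


def ucb1_pick(stats: dict[str, tuple[int, float]]) -> str:
    """Pick a source by UCB1. stats maps name -> (queries_run, total_reward)."""
    total = sum(n for n, _ in stats.values())
    best_name, best_score = "", float("-inf")
    for name, (n, reward) in stats.items():
        if n == 0:
            return name  # try every source at least once
        # Mean reward plus an exploration bonus for rarely used sources.
        score = reward / n + math.sqrt(2 * math.log(total) / n)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# vector_store has a lower mean reward but has barely been tried.
under_explored = ucb1_pick({"web_search": (10, 6.0), "vector_store": (2, 1.0)})
# With equal query counts the bonus is equal, so the better mean wins.
well_explored = ucb1_pick({"web_search": (100, 90.0), "vector_store": (100, 10.0)})
```

Early in a session the bonus dominates and the agent spreads queries across sources; later the averages dominate and it concentrates on whatever has been paying off.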

Conclusion

Agentic search transforms retrieval from a single-shot operation into an iterative reasoning process. The agent plans queries, evaluates results, identifies gaps and contradictions, refines its approach, triangulates claims across independent sources, and synthesizes findings with full provenance tracking.

The key architectural components are a query planner that decomposes research questions, a source router that directs queries to appropriate backends, an evaluator that assesses sufficiency and detects conflicts, a triangulation system that builds confidence through corroboration, and a citation graph that maintains audit trails from every claim back to its source.

The fundamental design decision is depth control — how many iterations, how many sources, how much time and cost to invest before declaring the research complete. Getting this right means matching research depth to the stakes of the question: quick factual lookups warrant one or two iterations; complex multi-faceted investigations warrant dozens.