The ReAct pattern

Publish at:

A language model, on its own, does one thing: it takes text in and produces text out. One call, one response, no memory of what came before. That is powerful for answering questions, but it is useless for getting things done in the real world — where tasks take multiple steps, information is scattered across systems, and the right path forward only becomes clear once you start.

The pattern that bridges this gap is called ReAct, short for Reason + Act. It wraps the model in a loop that alternates between thinking and doing: the model reasons about the situation, takes an action through a tool, reads the result, and feeds that result back into the next round of reasoning. This is the most widely adopted approach for turning a stateless model into a goal-directed agent — and understanding it is essential before we get into any of the more complex patterns that build on top of it.

The Core Idea #

ReAct comes from a 2022 research paper that proposed a simple but powerful structure: instead of letting the model just generate text, force it through a repeating cycle of three phases.

  • Thought — the model reasons about the current state. What do I know? What do I still need? What should I do next?
  • Action — based on that reasoning, the model picks a tool and calls it with specific arguments.
  • Observation — the tool returns a result, and the model reads it. This new information feeds into the next thought.

The cycle repeats until the model decides it has enough information to produce a final answer — or until a stopping condition kicks in.

    User Goal
        │
        ▼
  ┌───────────┐
  │  Thought  │◀─────────────────┐
  │ "What do  │                  │
  │  I know?" │                  │
  └─────┬─────┘                  │
        │                        │
        ▼                        │
  ┌───────────┐                  │
  │  Action   │           ┌──────┴──────┐
  │ call tool │           │ Observation │
  │ with args │           │ read result │
  └─────┬─────┘           └──────▲──────┘
        │                        │
        ▼                        │
   Tool executes ────────────────┘

        │ (when done)
        ▼
  ┌───────────┐
  │  Final    │
  │  Answer   │
  └───────────┘

In code, ReAct is just a loop with three moving parts: state, tools, and a stop rule.

MAX_STEPS = 8

state = {
    "goal": user_goal,
    "messages": [],
    "trace": [],
}

for _ in range(MAX_STEPS):
    prompt = build_prompt(goal=state["goal"], messages=state["messages"], tools=TOOLS)
    out = call_model(prompt, tools=TOOLS)

    if out.type == "final":
        return out.text

    validate(out.tool, out.arguments)
    observation = run_tool(out.tool, out.arguments, timeout_s=10)

    state["trace"].append({"tool": out.tool, "args": out.arguments, "obs": observation})
    state["messages"].append(format_observation(observation))

raise RuntimeError("Stopped: step limit reached")

ReAct does not require you to display every intermediate thought to the user. You can keep an internal scratchpad and still keep a clean user-facing trace: actions plus observations.

ReAct only works if tools are well-defined — as we will see when we look at tool schemas and contracts in detail. What matters here is what happens at the boundary between the model and the tool inside the loop.

When the model decides to act, it returns a structured tool call. The runtime validates the arguments, dispatches the call, and feeds the result back as the next observation. If the call fails — a timeout, a permission error, a malformed response — the runtime does not retry silently. It wraps the failure as an observation and hands it back to the model:

try:
    result = run_tool(out.tool, out.arguments, timeout_s=10)
except ToolError as e:
    result = f"Tool '{out.tool}' failed: {e}"

state["messages"].append(format_observation(result))

The model sees the error, rethinks, and picks a different path — maybe a fallback tool, maybe a rephrased query, maybe a final answer that acknowledges the gap. This is what makes ReAct adaptive rather than brittle: failures are data, not crashes.

To see this in action, imagine an agent that answers the question: "What was the closing stock price of ACME Corp yesterday, and how does it compare to last week?"

Here is what the ReAct loop might look like:

Thought 1: I need yesterday's closing price. I will call the stock price API.

Action 1: get_stock_price(ticker="ACME", date="2026-04-14")

Observation 1: $142.50

Thought 2: Got yesterday's price. Now I need last week's closing price — that is April 7th.

Action 2: get_stock_price(ticker="ACME", date="2026-04-07")

Observation 2: $138.20

Thought 3: I have both numbers. Yesterday was $142.50, last week was $138.20. That is a $4.30 increase, about 3.1%. I can answer now.

Final Answer: "ACME Corp closed at $142.50 yesterday, up $4.30 (3.1%) from $138.20 last week."

Two tool calls, three thoughts, one answer. The model did not know upfront how many steps it would need — it figured that out as it went.

Effectiveness #

Three properties make this pattern effective:

Grounding in real data. Instead of guessing or hallucinating, the model fetches actual information through tools before answering. Each observation anchors the agent in reality.

Traceability. ReAct naturally produces an execution trace: which tools were called, with what inputs, and what came back. When something goes wrong, you can inspect the trace and see where the run went off the rails. You can keep internal reasoning private and still get most of the debugging value.

Flexibility. The model is not following a fixed script. If the first tool call fails or returns something unexpected, the next step can adjust the plan. This makes ReAct surprisingly resilient to messy, real-world conditions.

The Trade-offs #

ReAct is not free. Every thought-action-observation cycle costs at least one model call, and those add up.

Latency. Each step is sequential — the model must wait for the tool to return before it can decide the next step. An agent that needs five tool calls will take at least five round trips. There is no way to parallelize within a single ReAct loop.

Cost. Every cycle burns tokens. You pay for the model output, and you pay again for the growing prompt that repeats context back to the model.

Context window pressure. The trace accumulates in the model's context window. For complex tasks with many steps, this can exceed the model's token limit. At that point, you need to summarize, truncate, or move trace data out of the prompt and keep only the pieces needed for the next step.

Brittleness on complex tasks. ReAct is reactive — it decides one step at a time. For tasks that require coordinating many sub-goals or making decisions that depend on future steps, this one-at-a-time approach can lead to inefficient paths or dead ends. Planning-based patterns address this by adding a higher-level plan on top of the loop.

Cascading errors. Because each observation feeds into the next thought, a bad result early in the loop can poison every step that follows. If a tool returns stale data or a malformed response at step two, the model builds all subsequent reasoning on that faulty foundation — and the final answer inherits the error. The longer the chain, the higher the risk. Mitigations include validating tool outputs before feeding them back, adding sanity checks between steps, and keeping chains short.

Most real ReAct systems add a little engineering around the loop. They set step limits and tool-call budgets. They cache tool results when calls are expensive and repeatable. They summarize older trace entries to keep prompts small. They add quality checks between steps, especially before doing anything with side effects.

Tool calls fail. Networks time out. APIs return partial data. File operations hit permissions. A robust ReAct runtime treats these as first-class outcomes, not exceptions to ignore. It records failures in the trace, asks for clarification when needed, and tries alternatives only when it is safe to do so. Just as importantly, it treats tool output as untrusted input. If a tool returns text that looks like instructions, the agent should treat it as data and follow its own rules, not the tool's.

When to Use ReAct #

Chain-of-thought (CoT) prompting also makes the model "think step by step," so the two are easy to confuse. The difference is action.

Chain-of-thought is pure reasoning — the model thinks through a problem in natural language, but it never calls a tool or interacts with the outside world. It is useful for math, logic, and analysis tasks where all the information is already in the prompt.

ReAct adds the act-and-observe cycle on top. The model reasons and takes actions, then uses the results of those actions to inform further reasoning. This makes ReAct suitable for tasks that require external data or real-world interaction — which covers most agent use cases.

In practice, many agents combine both: careful step-by-step reasoning inside each Thought step, followed by a tool call. You do not need to show that internal reasoning to users for it to be useful.

ReAct is the right starting point when:

  • The task requires calling external tools or APIs
  • You do not know the exact steps upfront — the model needs to figure them out
  • You want a visible reasoning trace for debugging
  • The task is relatively focused — a handful of steps, not dozens

ReAct starts to struggle when:

  • The task requires many parallel operations (ReAct is inherently sequential)
  • The problem needs long-horizon planning with interdependent sub-goals
  • The context window fills up before the agent finishes
  • Latency is critical and you cannot afford multiple round trips

For those cases, other patterns — plan-and-execute, parallelization, multi-agent coordination — offer better alternatives.

Conclusion #

ReAct is the workhorse pattern of agentic AI. It gives a model the ability to reason about a goal, take action through tools, learn from the results, and keep going until the task is done. Its simplicity is its strength — and also its limitation.

Key takeaways:

  • ReAct cycles through three phases: thought, action, observation — repeating until the goal is met
  • Each cycle costs a model call, so latency and token cost grow linearly with the number of steps
  • The visible reasoning trace makes debugging straightforward
  • ReAct is reactive, not predictive — it works one step at a time without a long-term plan
  • For tasks that need parallelism, planning, or dozens of steps, other patterns may be a better fit