The building blocks
Ask five people what an "AI agent" is, and you will get five different answers. Strip away the marketing, though, and every agent comes down to the same handful of parts working together. A chatbot follows a script. A generative AI assistant is smarter — it uses a large language model (LLM) to write free-form answers — but it still works in a single turn: you ask, it responds, done.
An agent does something fundamentally different. It takes a goal, breaks it into steps, calls external tools to gather information or make changes, checks the results, and keeps going until the job is finished. The difference boils down to one thing: the loop. An assistant answers a question; an agent works toward an outcome.
That does not mean you should reach for an agent every time. If the task can be handled in a single model call — summarizing a document, translating text, classifying a support ticket — an agent adds latency, cost, and failure modes for no benefit. The right starting point is always the simplest thing that works: a well-crafted prompt, maybe with retrieval for context. You add the loop, the tools, and the orchestration only when the task genuinely requires multiple steps, external data, or real-world actions. Every component described below is overhead until it earns its place.
The Five Core Components #
If you look at agent architectures, the same five pieces keep showing up:
| Component | What it does |
|---|---|
| Model | The brain — an LLM that reads the goal, figures out the next step, and picks which tool to use |
| Tools | The hands — functions, APIs, and services the agent calls to actually do things |
| Memory | The notebook — conversation history for the current session, plus long-lived knowledge across sessions |
| Orchestration | The control loop — reason, act, observe, repeat |
| System Prompt | The job description — rules, persona, and constraints that shape how the agent behaves |
Here is how these pieces fit together:
┌─────────────────────────────────────────────┐
│ System Prompt │
│ (persona, constraints, operational rules) │
└─────────────────┬───────────────────────────┘
│
▼
┌────────────────────┐
┌───▶ │ Model │◀────┐
│ │ (reasoning engine)│ │
│ └────────┬───────────┘ │
│ │ │
│ decide next step │
│ │ │
│ ▼ │
│ ┌────────────────┐ ┌────────────┐
│ │ Tools │ │ Memory │
│ │ (APIs, search, │ │ (short & │
│ │ code, files) │ │ long-term)│
│ └────────┬───────┘ └─────┬──────┘
│ │ │
│ ▼ │
│ observe result ───────────┘
│ │
│ ┌────────────────────┐
└────│ Orchestration │
│ (reason → act → │
│ observe → repeat) │
└────────────────────┘
None of these are optional add-ons. Take one away and you no longer have an agent — you have a chatbot with extra steps.
The Model — The Brain #
The model is where the decisions happen. It reads the user's goal, the conversation so far, and whatever the last tool call returned, then decides what to do next. Notice what the model does not do — it does not run queries, fetch web pages, or write files. It decides; other components execute. That separation means you can swap models, adjust tool sets, or change the orchestration logic independently.
Most agent runtimes treat the model output as one of two shapes. Either the model returns a final answer for the user, or it returns a structured tool call. A structured tool call is more than "just text". It is a tool name plus arguments that must validate against a schema before you run anything.
{
"tool": "search_docs",
"arguments": {
"query": "closing price yesterday",
"top_k": 5
}
}
If the arguments do not validate, your runtime should not try to guess what the model "meant". Validation is the boundary where you prevent accidental or malicious actions. A stronger model generally makes better decisions and recovers from messy tool outputs. It also costs more, responds more slowly, and can tempt you to overstuff the prompt because "it still works". Picking the right model is a trade-off between quality, speed, and budget.
Tools — The Hands #
Tools are what turn a text generator into something that can actually affect the world. A tool can be anything the runtime can call on the model's behalf: a database query, a REST API, a code interpreter, a file operation, or a web search. The model picks a tool based on its name, description, and parameter schema. If those are vague, the model guesses. If they are permissive, the model invents fields.
Here is what a tool definition often looks like at the boundary:
{
"name": "get_stock_price",
"description": "Return the closing price for a ticker on a date.",
"parameters": {
"type": "object",
"properties": {
"ticker": { "type": "string" },
"date": { "type": "string", "description": "YYYY-MM-DD" }
},
"required": ["ticker", "date"],
"additionalProperties": false
}
}
A misleading tool name makes the model call the wrong tool even when the right one exists.
Not all tools are equal. Some are read-only and safe to run repeatedly. Others have side effects and need guardrails like strict scoping, confirmation gates, and idempotency keys. A practical runtime treats every tool call like an untrusted network request: enforce timeouts, cap retries, log every input and output, and fail closed when validation fails.
The Model Context Protocol (MCP) is worth mentioning here. It is an open standard that gives agents a uniform way to connect to tools, so the same tool definition works across different runtimes.
Memory — The Notebook #
Without memory, every turn is a blank slate. Short-term memory holds what happened in this conversation — the messages, the tool results, and any intermediate state the agent has built up. Long-term memory persists across conversations — user preferences, learned facts, and past decisions.
Short-term memory usually means selecting what to put back into the model's context window before each call. You do not include everything. You include the minimum set of messages, tool results, and state needed for the next step. When the context window gets tight, you summarize older turns, drop low-value details, or move bulky artifacts out of the prompt and keep only references.
Long-term memory needs an external store. You write facts or summaries into it and retrieve only what is relevant to the current goal.
A typical retrieval pipeline looks like this:
goal + current step
│
▼
retrieve candidates (top-k)
│
▼
rerank and filter
│
▼
pack into prompt with citations
Chunking and ranking decisions show up as product behavior. Chunk too large and you retrieve noise. Chunk too small and you lose meaning. Retrieve too much and you crowd out the actual task. An agent should not write everything into long-term memory. You need triggers, like "user preference", "recurring project fact", or "decision with future impact". You also need expiry rules so stale facts do not haunt future runs.
Orchestration — The Loop #
Orchestration is the glue. It is the code that runs the model, runs tools, updates state, and decides when to stop.
A basic loop looks like this:
MAX_STEPS = 12
state = {
"messages": [],
"trace": [],
}
for step in range(MAX_STEPS):
context = build_prompt(
system_prompt=SYSTEM,
messages=state["messages"],
retrieved_memory=memory.retrieve(state),
tool_schemas=TOOLS,
)
model_out = call_model(context, tools=TOOLS)
if model_out.type == "final":
return model_out.text
tool_name, tool_args = model_out.tool, model_out.arguments
validate(tool_name, tool_args)
result = run_tool(tool_name, tool_args, timeout_s=15)
state["trace"].append({"tool": tool_name, "args": tool_args, "result": result})
state["messages"].append(render_observation(result))
raise RuntimeError("Stopped: step limit reached")
This is not fancy, but it shows the real job of orchestration: it owns state, budgets, and boundary checks.
Every loop needs an exit. Common stop conditions include "final answer produced", "step limit reached", "tool budget exceeded", and "human approval required". Budgets are not only about cost. They are also safety controls that prevent an agent from thrashing.
ReAct-style loops are sequential by nature. Orchestration can still run independent tool calls in parallel when the step is "gather facts". That is a different pattern, but it starts with the same control loop.
The System Prompt #
The system prompt is where you tell the agent who it is and how it should behave. It defines the agent's persona, sets constraints on what it can and cannot do, and lays out the rules of engagement. In most real setups, instructions come in layers. A system message sets global rules. A developer message sets app-specific behavior. User messages provide the goal and constraints.
The runtime should treat these layers differently. You do not let a user override safety constraints by phrasing a request cleverly. You do let a user override preferences like tone or output format. Tool outputs are untrusted input. A web page, a database row, or a log line can contain instructions that try to hijack the agent. The fix is structural: delimit tool outputs clearly, tell the model to treat them as data rather than instructions, keep an allowlist of which tools can cause side effects, and require explicit confirmation for high-impact actions.
A well-written system prompt is the difference between an agent that stays on task and one that drifts. It is also the cheapest component to change — no code, no infrastructure, just text — which makes it the first thing to tune when an agent misbehaves.
The system prompt does not grant the agent its capabilities (that is the model's job) or give it access to the world (that is what tools are for). It constrains and directs what the agent already has. Think of it as the employee handbook: it does not teach skills, but it sets expectations.
Conclusion #
Every AI agent, no matter how sophisticated, comes back to five building blocks: a model to think, tools to act, memory to remember, orchestration to drive the loop, and a system prompt to set the rules. Around these five sit cross-cutting concerns — safety guardrails, error handling, observability — that we will explore later on.
Key takeaways:
- The loop is what makes an agent an agent — the cycle of reason → act → observe → repeat
- The model decides; tools execute — keeping them separate is what makes the system flexible
- Memory is what turns stateless model calls into coherent, multi-step behavior
- The system prompt constrains and directs — it is the cheapest lever to tune agent behavior
- Every loop needs an exit condition, or it becomes an expensive infinite loop
- These trade-offs — capability vs. cost, flexibility vs. complexity — run through every design decision