Agent Frameworks - Build vs. Buy

Published:

Every team building an AI agent faces the same decision early on: use a framework, or write the orchestration yourself. The answer is never purely technical — it involves trade-offs around velocity, control, debuggability, and long-term maintenance burden.

What Agent Frameworks Provide #

An agent framework is a library or SDK that handles the plumbing between a language model, tools, memory, and an orchestration loop. At minimum, most frameworks give you:

  • A tool-calling abstraction — register functions with schemas, and the framework handles marshalling arguments, invoking the function, and feeding results back to the model.
  • An orchestration loop — the ReAct cycle (or a variant) implemented as a run loop you configure rather than write from scratch.
  • Memory management — conversation history, context window truncation, and sometimes long-term memory backends.
  • Model abstraction — swap between model providers without changing application code.
  • Tracing and observability hooks — structured logs of each agent step for debugging.

Some frameworks go further, offering multi-agent coordination, built-in RAG pipelines, guardrails, and deployment scaffolding. The more a framework provides, the faster you ship — and the more assumptions it bakes in that you may later need to fight.

The core value proposition is speed-to-first-demo. A framework lets you go from zero to a working agent in an afternoon. The question is whether that agent can survive contact with production requirements.

The Abstraction Tax #

Every framework introduces abstractions. Abstractions are useful until they are not — and with agents, you hit their limits faster than you might expect.

Opaque orchestration is the common pain point. When the framework controls the agent loop, you lose visibility into why the agent chose a particular path. Debugging a five-step tool-calling sequence means understanding what the framework did at each step — which prompt it assembled, what truncation it applied, how it handled a tool error. If the framework treats the loop as a black box, debugging becomes archaeology.

Rigid memory models create friction when your use case does not match the framework's assumptions. If the framework assumes conversational memory (append every turn, truncate from the front) but your agent needs task-scoped memory (reset between subtasks, persist only summaries), you are fighting the abstraction rather than leveraging it.

Tool interface constraints appear when your tools do not fit the framework's expected shape. Some frameworks assume tools are synchronous functions that return strings. If your tools are async, stream results, require authentication handshakes, or return structured objects that need post-processing, you end up wrapping your tools in adapter layers that add complexity without adding value.

Prompt assembly opacity is subtle but costly. Frameworks that automatically assemble the system prompt, inject tool descriptions, and manage context make it difficult to control exactly what the model sees. When you need to debug a prompt injection vulnerability or optimize token usage, you need full control over what goes into the context window — and many frameworks do not expose this cleanly.

┌────────────────────────────────────────────────────────┐
│              Abstraction Spectrum                      │
│                                                        │
│  Raw API calls ◄──────────────────────► Full framework │
│                                                        │
│  • Full control        • Fast prototyping              │
│  • Full responsibility • Opinionated choices           │
│  • Custom everything   • Managed complexity            │
│  • Debug anything      • Debug through layers          │
│                                                        │
│          Thin wrapper (sweet spot for many)            │
│                ▲                                       │
│                │                                       │
│     Control ───┼─── Velocity                           │
└────────────────────────────────────────────────────────┘

Architectural Lock-In #

Lock-in with agent frameworks operates at several levels, and each becomes harder to escape as your system grows.

Data model lock-in happens when the framework defines how messages, tool calls, and memory are structured internally. If your conversation history, tool schemas, and agent state all live in framework-specific types, migrating away means rewriting your data layer.

Orchestration lock-in occurs when your business logic is expressed in the framework's DSL or decorator syntax. A multi-agent workflow defined using framework-specific coordination primitives cannot be ported to another framework without a rewrite.

Ecosystem lock-in is the stickiest. Once you adopt a framework's tool registry, its memory backends, its tracing format, and its deployment model, each integration adds another thread that ties you to the framework. Switching costs compound.

The mitigation strategy is straightforward in principle, difficult in practice: keep a thin boundary between your domain logic and the framework. Your tools should be plain functions that the framework calls — not classes that inherit from framework base types. Your memory should live behind an interface you own. Your prompts should be templates you control, not auto-generated by the framework.

# Anti-pattern: domain logic coupled to framework
class MyAgent(FrameworkAgent):
    @framework_tool(name="search", description="Search the web")
    def search(self, query: str) -> str:
        return self.framework_search_client.query(query)

# Better: domain logic independent, framework is a thin adapter
class SearchTool:
    def __init__(self, search_client):
        self.client = search_client

    def run(self, query: str) -> str:
        return self.client.query(query)

# Framework adapter (replaceable)
def register_tools(agent_framework, search_tool):
    agent_framework.register(
        name="search",
        description="Search the web",
        fn=search_tool.run,
    )

When to Use a Framework #

Frameworks earn their keep in specific situations:

Prototyping and exploration. When you are still figuring out whether an agent approach works for your problem, a framework lets you iterate quickly. You can test different tool combinations, memory strategies, and model configurations without writing infrastructure code. If the prototype dies, you have lost days, not months.

Standard patterns with standard requirements. If your agent is a straightforward ReAct loop with a handful of tools, conversational memory, and a single model — and you do not anticipate needing deep customization — a framework handles this well. The orchestration loop is a solved problem; there is no reason to rewrite it.

Teams without agent experience. Frameworks encode best practices. If your team has not built an agent before, the framework's opinions about tool schemas, error handling, and context management save you from repeating mistakes that the framework authors already made and corrected.

Rapid iteration on tool sets. If the primary work is adding and tuning tools rather than customizing orchestration, a framework's tool registry and auto-schema-generation reduce friction for the common case.

When to Roll Your Own #

Building your own orchestration makes sense when:

Your agent loop is non-standard. If you need custom routing logic, conditional branching based on tool results, dynamic prompt assembly that changes per step, or orchestration patterns the framework does not support — you will spend more time working around the framework than you save by using it.

Debuggability is critical. In production systems where you need to explain exactly why the agent took a particular action (for audit, compliance, or safety reasons), owning the loop means you can instrument every decision point. No framework abstraction sits between you and the model.

Performance matters at the margins. Frameworks add overhead — extra serialization, abstraction layers, generic error handling. For latency-sensitive applications (real-time assistants, trading agents), removing framework overhead can shave meaningful milliseconds.

You need deep control over the context window. When token budgets are tight, prompt caching strategies are complex, or you are doing sophisticated context engineering (priority-based injection, dynamic section assembly), you need to own the prompt construction pipeline.

The framework is evolving faster than your product. Agent frameworks are young. Breaking changes, shifting APIs, and abandoned projects are common. If you ship to production on a framework that makes a breaking change, you are on someone else's release schedule.

The build-your-own path is not as expensive as it sounds. The core agent loop — call the model, parse tool calls, execute tools, feed results back — is 50–100 lines of code in most languages. What takes effort is everything around it: retries, streaming, token counting, memory management, tracing. But those are also the pieces where your requirements diverge most from the framework's assumptions.

# A minimal agent loop — the core is simple
def agent_loop(model, tools, messages, max_steps=10):
    for step in range(max_steps):
        response = model.chat(messages)

        if response.tool_calls:
            messages.append(response.to_message())
            for call in response.tool_calls:
                tool = tools[call.name]
                result = tool.run(**call.arguments)
                messages.append(tool_result_message(call.id, result))
        else:
            return response.content

    return "Max steps reached"

A Decision Framework #

Rather than a binary build-or-buy choice, think of it as a spectrum with three positions:

Full framework — use when prototyping, when the pattern is standard, or when your team needs guardrails. Accept the abstraction tax in exchange for velocity.

Thin wrapper — write a minimal adapter layer over the model API that handles tool calling and the run loop, but keep memory, prompts, and tools as your own code. This gives you 80% of the framework benefit with 20% of the lock-in.

Custom orchestration — write the loop, the memory layer, and the prompt assembly yourself. Use when you have non-standard requirements, need full debuggability, or are building the agent as a core product (not a feature).

Teams start with a full framework, hit its limits within a few months, and migrate to a thin wrapper. Knowing this trajectory in advance lets you architect for it: keep your tools portable, your memory behind interfaces, and your prompts in templates you own.

Conclusion #

Agent frameworks solve real problems — they accelerate prototyping, encode best practices, and handle boilerplate. But they also introduce opacity, constrain your architecture, and create lock-in that compounds over time. The right choice depends on your team's experience, your customization needs, and how central the agent is to your product.

The safest architectural principle is to treat the framework as a replaceable adapter rather than a foundation. Own your tools, your memory interface, and your prompt templates. Let the framework handle the loop — until it cannot. When that day comes, the migration should be a weekend project, not a quarter-long rewrite.