The following article originally appeared in Hugo Bowne-Anderson’s newsletter, Vanishing Gradients, and is republished here with the author’s permission.
In this post, we’ll build two AI agents from scratch in Python. One will be a coding agent, the other a search agent.
Why, then, have I called this post “How to Build a General-Purpose AI Agent in 131 Lines of Python”? Well, as it turns out, coding agents are actually general-purpose agents in some pretty surprising ways.
What I mean by that is once you have an agent that can write code, it can:
- Do a huge number of things you don’t typically think of as involving code, and
- Extend itself to do even more things.
It’s more appropriate to think of coding agents as “computer-using agents” that happen to be great at writing code. That doesn’t mean you should always build a general-purpose agent, but it’s worth understanding what you’re actually building when you give an LLM shell access. That’s also why we’ll build a search agent in this post: to show the pattern works regardless of what you’re building.
For example, the coding agent we’ll build below has four tools: read, write, edit, and bash.
Watch this two-minute video to see how it can clean your desktop and why you should think of coding agents as “computer-using agents” that happen to be great at writing code:
It can:
- File/life organization: Clean your desktop, sort downloads by type, rename vacation photos with dates, find and delete duplicates, organize receipts into folders...
- Personal productivity: Search all your notes for something you half-remember, compile a packing list from past trips, find all PDFs containing “tax” from last year...
- Media management: Rename a season of TV episodes properly, convert images to different formats, extract audio from videos, resize photos for social media...
- Writing and content: Combine multiple docs into one, convert between formats, find-and-replace across many files...
- Data wrangling: Turn a messy CSV into a clean address book, extract emails from a pile of files, merge spreadsheets from different sources...
This is a small subset of what’s possible. It’s also the reason Claude Cowork looked promising and why OpenClaw has taken off the way it did.
So how can you build this? In this post, I’ll show you how to build a minimal version.
Agents are just LLMs with tools in a loop
Agents are just LLMs with tools in a conversation loop, and once you know the pattern, you’ll be able to build all sorts of agents with it:
As Ivan Leo wrote,
The barrier to entry is remarkably low: half an hour and you have an AI that can understand your codebase and make edits just by talking to it.
The goal here is to show that the pattern is the same regardless of what you’re building an agent for. Coding agent, search agent, browser agent, email agent, database agent: They all follow the same structure. The only difference is the tools you give them.
Part 1: The coding agent
We’ll start with a coding agent that can read, write, and execute code. As noted, the ability to write and execute code with bash also turns a “coding agent” into a “general-purpose agent.” With shell access, it can do anything you can do from a terminal:
- Sort and organize your local filesystem
- Clean up your desktop
- Batch rename photos
- Convert file formats
- Manage Git repos across multiple projects
- Install and configure software
You can find the code here.
Check out Ivan Leo’s post for how to do this in JavaScript and Thorsten Ball’s post for how to do it in Go.
Setup
Start by creating our project:

We’ll be using Anthropic here. Feel free to use your LLM of choice. For bonus points, use Pydantic AI (or a similar library) and get a consistent interface across the various LLM providers. That way you can use the same agentic framework for both Claude and Gemini!
Make sure you’ve got an Anthropic API key set as the ANTHROPIC_API_KEY environment variable.
We’ll build our agent in four steps:
- Hook up our LLM
- Add a tool that reads files, then more tools: write, edit, and bash
- Build the agentic loop
- Build the conversational loop
1. Hook up our LLM
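Here’s a minimal sketch of this step. The helper takes the client as an argument (handy for testing), and the model name is just an example; swap in whichever Claude model you’re using:

```python
# A minimal "text in, text out" helper for the Anthropic Messages API.
def ask(client, prompt: str, model: str = "claude-sonnet-4-20250514") -> str:
    """Send a single user message and return the text of the reply."""
    response = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

# Usage (requires ANTHROPIC_API_KEY):
#   import anthropic
#   print(ask(anthropic.Anthropic(), "Say hello"))
```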


Text in, text out. Nice! Now let’s give it a tool.
2. Add a tool (read)
We’ll start by implementing a tool called read, which will allow the agent to read files from the filesystem. In Python, we can use Pydantic for schema validation, which also generates JSON schemas we can provide to the API:

The Pydantic model gives us two things: validation and a JSON schema. We can see what the schema looks like:


We wrap this into a tool definition that Claude understands:
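Here’s roughly what that definition looks like for the Messages API: a name, a description, and a JSON schema for the input (written out literally here; with Pydantic you’d generate it via model_json_schema()):

```python
# Tool definition in the shape the Anthropic Messages API expects.
read_tool = {
    "name": "read",
    "description": "Read the contents of a file at the given path.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {
                "type": "string",
                "description": "Relative or absolute path to the file",
            }
        },
        "required": ["path"],
    },
}
```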

Then we add tools to the API call, handle the tool request, execute it, and send the result back:
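A sketch of that round trip, with the client and a tool executor passed in as arguments (the function names and two-call structure are mine; stop_reason == "tool_use" is how the Messages API signals a tool request):

```python
def run_with_tools(client, query, tools, execute_tool,
                   model="claude-sonnet-4-20250514"):
    """One round trip: ask, execute any tool Claude requests, send the
    result back, and return the final response."""
    messages = [{"role": "user", "content": query}]
    response = client.messages.create(
        model=model, max_tokens=4096, tools=tools, messages=messages
    )
    if response.stop_reason == "tool_use":
        tool_use = next(b for b in response.content if b.type == "tool_use")
        output = execute_tool(tool_use.name, tool_use.input)
        # Send the tool result back in a second call to get the final answer.
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use.id,
            "content": output,
        }]})
        response = client.messages.create(
            model=model, max_tokens=4096, tools=tools, messages=messages
        )
    return response
```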

Let’s see what happens when we run it:

This script calls the Claude API with a user query passed via the command line. It sends the query, gets a response, and prints it.
Note that the LLM matched on the tool description: Accurate, specific descriptions are key! It’s also worth mentioning that we’ve made two LLM calls here:
- One in which the tool is called
- A second in which we send the result of the tool call back to the LLM to get the final result
This often trips up people building agents for the first time, and Google has made a nice visualization of what we’re actually doing:

2a. Add more tools (write, edit, bash)
We now have a read tool, but a coding agent needs to do more than read. It needs to:
- Write new files
- Edit existing ones
- Execute code to test it
That’s three more tools: write, edit, and bash.
Same pattern as read. First the schemas:

Then the executors:

And the tool definitions, along with the code that runs whichever one Claude picks:

The bash tool is what makes this actually useful: Claude can now write code, run it, see errors, and fix them. But it’s also dangerous. This tool could delete your entire filesystem! Proceed with caution: Run it in a sandbox, a container, or a VM.
Interestingly, bash is what turns a “coding agent” into a “general-purpose agent.” With shell access, it can do anything you can do from a terminal:
- Sort and organize your local filesystem
- Clean up your desktop
- Batch rename photos
- Convert file formats
- Manage Git repos across multiple projects
- Install and configure software
It was actually “Pi: The Minimal Agent Inside OpenClaw” that inspired this example.
Try asking Claude to edit a file: It often wants to read it first to see what’s there. But our current code only handles one tool call. That’s where the agentic loop comes in.
3. Build the agentic loop
Right now Claude can only call one tool per request. But real tasks need multiple steps: read a file, edit it, run it, see the error, fix it. We need a loop that lets Claude keep calling tools until it’s done.
We wrap the tool handling in a while True loop:
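Here’s a sketch of that loop (the function shape is mine). Each iteration sends the full message history; when the model stops asking for tools, we’re done:

```python
def agent_loop(client, messages, tools, execute_tool,
               model="claude-sonnet-4-20250514"):
    """Keep calling tools until Claude stops requesting them."""
    while True:
        response = client.messages.create(
            model=model, max_tokens=4096, tools=tools, messages=messages
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return response  # no more tool calls: Claude is done
        results = []
        for block in response.content:
            if block.type == "tool_use":
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": execute_tool(block.name, block.input),
                })
        messages.append({"role": "user", "content": results})
```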

Note that we send the entire accumulated message history as we progress through loop iterations. When building this out further, you’ll want to engineer and manage your context more effectively. (See below for more on this.)
Let’s try a multistep task:

4. Build the conversational loop
Right now the agent handles one query and exits. But we want a back-and-forth conversation: Ask a question, get an answer, ask a follow-up. We need an outer loop that keeps asking for input.
We wrap the whole thing in a while True:
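A sketch of the outer loop, parameterized so the inner agent logic and the I/O can be swapped in (the function names are mine):

```python
def chat(turn_fn, input_fn=input, output_fn=print):
    """Outer conversational loop. messages persists across turns,
    so every turn sees the full history."""
    messages = []
    while True:
        try:
            user_input = input_fn("You: ")
        except EOFError:
            break
        if user_input.strip().lower() in {"exit", "quit"}:
            break
        messages.append({"role": "user", "content": user_input})
        reply = turn_fn(messages)  # e.g. one pass of the agentic loop
        output_fn(reply)
```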

The messages list persists across turns, so Claude remembers context. That’s the entire coding agent.
Once again we’re simply appending all previous messages, which means the context will grow quite quickly!
A note on agent harnesses
An agent harness is the scaffolding and infrastructure that wraps around an LLM to turn it into an agent. It handles:
- The loop: prompting the model, parsing its output, executing tools, feeding results back
- Tool execution: actually running the code/commands the model asks for
- Context management: what goes in the prompt, token limits, history
- Safety/guardrails: confirmation prompts, sandboxing, disallowed actions
- State: keeping track of the conversation, files touched, etc.
And more.
Think of it like this: The LLM is the brain; the harness is everything else that lets it actually do things.
What we’ve built above is the hello world of agent harnesses. It covers the loop, tool execution, and basic context management. What it doesn’t have: safety guardrails, token limits, persistence, or even a system prompt!
When building out from this foundation, I encourage you to follow the paths of:
- The Pi coding agent, which adds context loading (AGENTS.md from multiple directories), persistent sessions you can resume and branch, and an extensibility system (skills, extensions, prompts)
- OpenClaw, which goes further: a persistent daemon (always-on, not invoked), chat as the interface (Telegram, WhatsApp, etc.), file-based continuity (SOUL.md, MEMORY.md, daily logs), proactive behavior (heartbeats, cron), preintegrated tools (browser, subagents, device control), and the ability to message you without being prompted
Part 2: The search agent
To really show you that the agentic loop is what powers any agent, we’ll now build a search agent (inspired by a podcast I did with search legends John Berryman and Doug Turnbull). We’ll use Gemini for the LLM and Exa for web search. You can find the code here.
But first, the astute reader may have an interesting question: If a coding agent really is a general-purpose agent, why would anyone want to build a search agent when we could just get a coding agent to extend itself into a search agent? Well, because if you want to build a search agent for a business, you’re not going to do it by building a coding agent first… So let’s build it!
Setup
As before, we’ll build this step-by-step. Start by creating our project:

Set GEMINI_API_KEY (from Google AI Studio) and EXA_API_KEY (from exa.ai) as environment variables.
We’ll build our agent in four steps (the same four steps as always):
- Hook up our LLM
- Add a tool (web_search)
- Construct the agentic loop
- Construct the conversational loop
1. Hook up our LLM
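A minimal sketch, assuming the google-genai SDK’s client.models.generate_content interface (the model name is an example):

```python
def ask_gemini(client, prompt: str, model: str = "gemini-2.0-flash") -> str:
    """Send a prompt to Gemini and return the text of the reply."""
    response = client.models.generate_content(model=model, contents=prompt)
    return response.text

# Usage (requires GEMINI_API_KEY):
#   from google import genai
#   print(ask_gemini(genai.Client(), "Say hello"))
```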


2. Add a tool (web_search)
Gemini can answer from its training data, but we don’t want that. For current information, it needs to search the web. We’ll give it a web_search tool that calls Exa.
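A sketch of the executor, assuming Exa’s search_and_contents method (the result formatting and truncation are my choices):

```python
def web_search(exa_client, query: str, num_results: int = 5) -> str:
    """Search the web and format results as text for the model."""
    results = exa_client.search_and_contents(
        query, num_results=num_results, text=True
    )
    return "\n\n".join(
        f"{r.title}\n{r.url}\n{(r.text or '')[:1000]}" for r in results.results
    )

# Usage (requires EXA_API_KEY):
#   import os
#   from exa_py import Exa
#   print(web_search(Exa(os.environ["EXA_API_KEY"]), "latest Python release"))
```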

The system instruction grounds the model, (ideally) forcing it to search instead of guessing. Note that you can configure Gemini to always use web_search, which is 100% reliable, but I wanted to show a pattern you can use with any LLM API.
We then send the tool call result back to Gemini:

3. Build the agentic loop
Some questions need multiple searches. “Compare X and Y” requires searching for X, then searching for Y. We need a loop that lets Gemini keep searching until it has enough information.


4. Build the conversational loop
Same as before: We want back-and-forth conversation, not one query and exit. Wrap the whole thing in an outer loop:

Messages persist across turns, so follow-up questions have context.
Extend it
The pattern is the same for both agents. Add any tool:
- web_search to the coding agent: Look things up while coding
- bash to the search agent: Act on what it finds
- browser: Navigate websites
- send_email: Communicate
- database_query: Run SQL
One thing we’ll be doing is showing how general-purpose a coding agent really can be. As Armin Ronacher wrote in “Pi: The Minimal Agent Inside OpenClaw”:
Pi’s whole idea is that if you want the agent to do something it doesn’t do yet, you don’t go and download an extension or a skill or something like that. You ask the agent to extend itself. It celebrates the idea of writing and running code.
Conclusion
Building agents is easy. The magic isn’t complex algorithms; it’s the conversation loop and well-designed tools.
Both agents follow the same pattern:
- Hook up the LLM
- Add a tool (or several tools)
- Build the agentic loop
- Build the conversational loop
The only difference is the tools.
Thanks to Ivan Leo, Eleanor Berger, Mike Powers, Thomas Wiecki, and Mike Loukides for providing feedback on drafts of this post.
