Building Yet Another Coding Agent — Part 1


At my current job, we are building a full-stack data analytics platform. A big chunk of the work is integrating external APIs.

We need to handle things like pagination, authentication, rate limits, response caching, monitoring, and alerting. Most of these can be abstracted out and built into a framework.

But the core fetch logic for each integration still has to be written by hand.

These days, coding agents using LLMs are getting popular. While complex multi-agent systems and elaborate planning frameworks exist, at their core, many practical AI coding agents boil down to a surprisingly simple architectural pattern: a loop, driven by an LLM, integrating external tool calls.

I wanted to explore if this can be used in our case.

What if we could build a coding agent that takes the inputs we already have (a config, a Postman collection, the API docs) and generates the fetch code?

We can give the agent the ability to run that code in a safe sandbox environment to fetch data. Then the user can preview the data and make changes, either by manually editing the code or by prompting the agent.

Before building that, I wanted to understand what's actually happening in these agents.

So I built the simplest possible version first.

The loop

A coding agent is just a loop.

messages = []
while True:
    messages.append(get_user_input())         # user turn goes into history
    response = llm.complete(messages, tools)  # model decides the next step
    messages.append(response)
    print(response)

The LLM doesn't write files or run code itself. It just decides what should happen. The LLM directs; our code executes, feeds the result back, and the cycle repeats.

The interesting part is the tool use. The model returns a tool call, like create_file, run_file, or modify_file, with arguments. Our code executes it, sends the output back, and the model reacts.
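Concretely, a tool call arrives as a `tool_use` content block. The sketch below shows its shape as a plain dict (the SDK returns an object with the same fields); the `id` and input values here are made up, and `create_file` is a stub standing in for the real tool.

```python
# Illustrative shape of a tool_use content block from the model.
# The id ties the eventual tool_result back to this specific call.
tool_call = {
    "type": "tool_use",
    "id": "toolu_abc123",
    "name": "create_file",
    "input": {"filename": "fetch.py", "content": "print('hello')"},
}

# A stub tool; the real one writes the file to disk.
def create_file(filename: str, content: str) -> str:
    return f"Created {filename}."

# Dispatch is a dict lookup plus keyword-argument unpacking.
tool_map = {"create_file": create_file}
result = tool_map[tool_call["name"]](**tool_call["input"])
print(result)  # → Created fetch.py.
```

The `input` dict maps directly onto the function's keyword arguments, which is why the tool schema and the function signature have to agree.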

Tools

Four tools are enough to start: read, create, modify, run.

import os
import subprocess
import sys


def read_file(filename: str) -> str:
    """
    Reads the content of a file. Use this to inspect existing code or output.
    """
    try:
        with open(filename, 'r') as f:
            return f"Content of {filename}:\n{f.read()}"
    except FileNotFoundError:
        return f"Error: {filename} not found."
    except Exception as e:
        return f"Error reading file: {e}"


def create_file(filename: str, content: str) -> str:
    """
    Creates a new file with the given content. Use this to write new Python scripts.
    """
    try:
        if os.path.dirname(filename):
            os.makedirs(os.path.dirname(filename), exist_ok=True)
        with open(filename, 'w') as f:
            f.write(content)
        return f"Created {filename}."
    except Exception as e:
        return f"Error creating file: {e}"


def modify_file(filename: str, old_content: str, new_content: str) -> str:
    """
    Replaces a specific block of text in a file. Prefer this for targeted edits.
    old_content must match exactly.
    """
    try:
        with open(filename, 'r') as f:
            current = f.read()
        if old_content not in current:
            return f"Error: Could not find the text to replace in {filename}."
        updated = current.replace(old_content, new_content, 1)
        with open(filename, 'w') as f:
            f.write(updated)
        return f"Modified {filename}."
    except FileNotFoundError:
        return f"Error: {filename} not found."
    except Exception as e:
        return f"Error modifying file: {e}"


def run_file(filename: str) -> str:
    """
    Executes a Python file and returns stdout and stderr.
    Use this after writing or modifying code to verify it works.
    """
    try:
        result = subprocess.run(
            [sys.executable, filename],
            capture_output=True,
            text=True,
            timeout=30
        )
        output = []
        if result.stdout:
            output.append(f"stdout:\n{result.stdout}")
        if result.stderr:
            output.append(f"stderr:\n{result.stderr}")
        output.append(f"Exit code: {result.returncode}")
        return "\n".join(output)
    except subprocess.TimeoutExpired:
        return "Error: Script timed out after 30 seconds."
    except Exception as e:
        return f"Error running file: {e}"

The docstrings matter. The model reads them to decide when and how to call each tool.
Vague descriptions lead to bad decisions — calling run_file before writing anything,
rewriting whole files when one line needs changing.


Tool schema

You can't pass Python functions directly to the LLM. You describe each tool in a
schema, and the SDK uses that to tell the model what's available.

tools = [
    {
        "name": "read_file",
        "description": "Reads the content of a file. Use this to inspect existing code or output.",
        "input_schema": {
            "type": "object",
            "properties": {
                "filename": {"type": "string", "description": "Path to the file."}
            },
            "required": ["filename"]
        }
    },
    {
        "name": "create_file",
        "description": "Creates a new file with the given content. Use this to write new Python scripts.",
        "input_schema": {
            "type": "object",
            "properties": {
                "filename": {"type": "string"},
                "content": {"type": "string", "description": "Full file content."}
            },
            "required": ["filename", "content"]
        }
    },
    {
        "name": "modify_file",
        "description": "Replaces a specific block of text in a file. Prefer this for targeted edits.",
        "input_schema": {
            "type": "object",
            "properties": {
                "filename": {"type": "string"},
                "old_content": {"type": "string", "description": "The exact text to replace."},
                "new_content": {"type": "string", "description": "The replacement text."}
            },
            "required": ["filename", "old_content", "new_content"]
        }
    },
    {
        "name": "run_file",
        "description": "Executes a Python file. Use this after writing or editing to verify the code works.",
        "input_schema": {
            "type": "object",
            "properties": {
                "filename": {"type": "string"}
            },
            "required": ["filename"]
        }
    }
]
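
Hand-writing these schemas invites drift between the schema and the code. One option is to derive them from each function's signature and docstring. This is a sketch, not what the agent above does; it assumes every parameter is a string, which happens to hold for all four tools here.

```python
import inspect

def tool_schema(fn) -> dict:
    """Build a minimal tool schema from a function.
    Assumes all parameters are strings and all are required."""
    params = inspect.signature(fn).parameters
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "input_schema": {
            "type": "object",
            "properties": {name: {"type": "string"} for name in params},
            "required": list(params),
        },
    }
```

With this, `tools = [tool_schema(f) for f in (read_file, create_file, modify_file, run_file)]`, and the docstrings you already wrote become the descriptions the model sees.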

Wiring it up

The loop needs conversation history. Without it, the model forgets what code it wrote
two turns ago. Every message goes into messages, including tool results.

import anthropic
import json

client = anthropic.Anthropic()

SYSTEM_PROMPT = """You are a coding agent helping write Python data integration scripts.

When asked to write code:
1. Create the file using create_file.
2. Run it using run_file.
3. If there are errors, use modify_file to fix them and run again.
4. Tell the user what you did and show relevant output.

Prefer targeted edits over full rewrites.
"""

tool_map = {
    "read_file": read_file,
    "create_file": create_file,
    "modify_file": modify_file,
    "run_file": run_file,
}


def run_agent():
    messages = []
    print("Coding Agent ready. Ctrl+C to exit.\n")

    while True:
        user_input = input("You: ").strip()
        if not user_input:
            continue

        messages.append({"role": "user", "content": user_input})

        # inner loop: keep going until the model stops calling tools
        while True:
            response = client.messages.create(
                model="claude-opus-4-5",
                max_tokens=4096,
                system=SYSTEM_PROMPT,
                tools=tools,
                messages=messages
            )

            for block in response.content:
                if hasattr(block, 'text'):
                    print(f"\nAgent: {block.text}")

            if response.stop_reason == "end_turn":
                messages.append({"role": "assistant", "content": response.content})
                break

            if response.stop_reason == "tool_use":
                messages.append({"role": "assistant", "content": response.content})

                tool_results = []
                for block in response.content:
                    if block.type == "tool_use":
                        print(f"\n[{block.name}({json.dumps(block.input)})]")
                        fn = tool_map.get(block.name)
                        result = fn(**block.input) if fn else f"Unknown tool: {block.name}"
                        print(f"[→ {result[:300]}{'...' if len(result) > 300 else ''}]")
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": result
                        })

                messages.append({"role": "user", "content": tool_results})


if __name__ == "__main__":
    run_agent()

The inner `while True` is the part most examples leave out. A task like "write this
script and make sure it runs" needs multiple tool calls in sequence: write, run, read
the error, fix, run again. You need to keep cycling until `stop_reason` is `end_turn`.
Stopping after the first tool use breaks the whole thing.

Running it

❯ uv run agent.py
Coding Agent ready. Ctrl+C to exit.

You: create a python code for adding two numbers

[create_file({"filename": "add_numbers.py", "content": "# Python program to add two numbers\n\ndef add_two_numbers(a, b):\n    \"\"\"Add two numbers and return the result.\"\"\"\n    return a + b\n\n# Example usage\nif __name__ == \"__main__\":\n    num1 = 5\n    num2 = 3\n    \n    result = add_two_numbers(num1, num2)\n    print(f\"The sum of {num1} and {num2} is: {result}\")\n"})]
[→ Created add_numbers.py.]

[run_file({"filename": "add_numbers.py"})]
[→ stdout:
The sum of 5 and 3 is: 8

Exit code: 0]

Agent: I've created a simple Python script called `add_numbers.py` that adds two numbers. Here's what it does:

1. **Defines a function** `add_two_numbers(a, b)` that takes two parameters and returns their sum
2. **Demonstrates usage** with example numbers (5 and 3)
3. **Outputs the result**: `The sum of 5 and 3 is: 8`

The code ran successfully! You can modify the `num1` and `num2` variables to add different numbers, or import the `add_two_numbers` function into other scripts.
You: exit

Agent: Goodbye! If you need help with more Python code in the future, feel free to come back. Have a great day! 👋
You: lets make change to the code so that it can either add numbers or add string

[modify_file({"filename": "add_numbers.py", "old_content": "# Python program to add two numbers\n\ndef add_two_numbers(a, b):\n    \"\"\"Add two numbers and return the result.\"\"\"\n    return a + b\n\n# Example usage\nif __name__ == \"__main__\":\n    num1 = 5\n    num2 = 3\n    \n    result = add_two_numbers(num1, num2)\n    print(f\"The sum of {num1} and {num2} is: {result}\")", "new_content": "# Python program to add two numbers or concatenate strings\n\ndef add_values(a, b):\n    \"\"\"Add two numbers or concatenate two strings and return the result.\"\"\"\n    if isinstance(a, (int, float)) and isinstance(b, (int, float)):\n        return a + b\n    elif isinstance(a, str) and isinstance(b, str):\n        return a + b\n    else:\n        raise TypeError(\"Both values must be either numbers or strings\")\n\n# Example usage\nif __name__ == \"__main__\":\n    # Adding numbers\n    num1 = 5\n    num2 = 3\n    result_numbers = add_values(num1, num2)\n    print(f\"The sum of {num1} and {num2} is: {result_numbers}\")\n    \n    # Concatenating strings\n    str1 = \"Hello, \"\n    str2 = \"World!\"\n    result_strings = add_values(str1, str2)\n    print(f\"The concatenation of '{str1}' and '{str2}' is: {result_strings}\")"})]
[→ Modified add_numbers.py.]

[run_file({"filename": "add_numbers.py"})]
[→ stdout:
The sum of 5 and 3 is: 8
The concatenation of 'Hello, ' and 'World!' is: Hello, World!

Exit code: 0]

Wrote the file, ran it, then modified it on request and ran it again to verify. No manual intervention.

What's missing

This is a skeleton. A few gaps to keep in mind:

Sandboxing. run_file currently runs with full access to your machine. For
anything beyond local experiments, you'd want Docker, Firecracker, or at minimum a
restricted subprocess environment.
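
As a stopgap, here is a lightly hardened variant of run_file: it runs the script in a scratch working directory with a stripped environment, so API keys in your shell don't leak into the child process. This is a sketch, not a real sandbox; the script can still read and write the filesystem.

```python
import os
import subprocess
import sys
import tempfile

def run_file_sandboxed(filename: str) -> str:
    """Run a Python file in a scratch dir with a minimal environment.
    NOT a real sandbox: the filesystem and network are still reachable."""
    workdir = tempfile.mkdtemp(prefix="agent_run_")  # keep outputs out of your repo
    result = subprocess.run(
        [sys.executable, os.path.abspath(filename)],
        capture_output=True,
        text=True,
        timeout=30,
        cwd=workdir,
        env={"PATH": os.environ.get("PATH", "")},  # drop secrets from the env
    )
    return f"Exit code: {result.returncode}\n{result.stdout}{result.stderr}"
```

A real deployment would add resource limits and a read-only view of the project, or move execution into a container entirely.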

History truncation. A long session will eventually overflow the context window.
You'll need a sliding window, summarization, or a turn cap.
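
A sliding window has one subtlety: you can't cut between an assistant message containing a tool_use and the user message carrying its tool_result, or the API will reject the request. This hypothetical helper drops the oldest messages but only cuts at a plain-text user turn, which keeps those pairs intact; a real agent would summarize instead of dropping.

```python
def truncate_history(messages: list, max_messages: int = 40) -> list:
    """Naive sliding window over the message history.
    Advances the cut point to the next plain-text user message so a
    tool_use / tool_result pair is never split."""
    if len(messages) <= max_messages:
        return messages
    start = len(messages) - max_messages
    while start < len(messages):
        m = messages[start]
        # tool_result turns have list content; real user turns are plain strings
        if m["role"] == "user" and isinstance(m["content"], str):
            break
        start += 1
    return messages[start:]
```

Call it once per outer-loop iteration, before the API call, and the history stays bounded.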

Cost. Each tool round-trip is a separate API call. For multi-step tasks these
add up. Worth tracking early.
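
A rough way to track it, assuming the SDK's `response.usage` object with `input_tokens` and `output_tokens` fields. The per-million-token prices below are placeholders, not current rates; check the pricing page for your model.

```python
class UsageTracker:
    """Accumulates token counts across API calls and estimates spend."""

    def __init__(self, input_per_mtok: float = 3.0, output_per_mtok: float = 15.0):
        # Placeholder $/Mtok rates; substitute your model's actual pricing.
        self.input_tokens = 0
        self.output_tokens = 0
        self.input_per_mtok = input_per_mtok
        self.output_per_mtok = output_per_mtok

    def record(self, usage) -> None:
        # usage is response.usage from client.messages.create(...)
        self.input_tokens += usage.input_tokens
        self.output_tokens += usage.output_tokens

    @property
    def cost_usd(self) -> float:
        return (self.input_tokens * self.input_per_mtok
                + self.output_tokens * self.output_per_mtok) / 1_000_000
```

Calling `tracker.record(response.usage)` after every `client.messages.create` in the inner loop makes the per-task cost visible immediately, which is when you find out that input tokens dominate: the whole history is resent on every round-trip.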