
Build a Tool-Using AI Agent with the Claude Agent SDK (Python)

Most "AI agent" tutorials still show you three lines that call the Anthropic Messages API directly, parse the tool_use block by hand, and hope the JSON shape matches what the docs said. It works for a demo, then collapses the moment you add a second tool, a permission check, or an error path. The Claude Agent SDK exists for exactly this gap. It packages Claude Code's toolset, the agent loop, MCP server plumbing, and the streaming protocol into a Python library so you can focus on what your agent should do, not on reimplementing the run loop.

This guide goes from pip install to a small agent that exposes a custom tool, intercepts a dangerous command with a hook, and streams output back to the terminal. Everything is copy-pasteable. The package is the official claude-agent-sdk on PyPI, at version 0.2.110 as of this writing. Its PyPI classifier is "3 - Alpha", so pin an exact version (for example, pip install claude-agent-sdk==0.2.110) if you put it in production.

Prerequisites

  • Python 3.10 or newer
  • An Anthropic API key, set as ANTHROPIC_API_KEY in your shell or in a .env file
  • About 50MB of disk for the SDK and the bundled Claude Code CLI
  • A terminal you can paste commands into

That is the whole list. The SDK bundles the Claude Code CLI, so there is nothing else to install or download.

Step 1: Install the SDK

python -m venv .venv
source .venv/bin/activate
pip install --upgrade claude-agent-sdk

Sanity check the install:

python -c "import claude_agent_sdk; print(claude_agent_sdk.__file__)"

If the import prints a path under your virtualenv, you are good. If you get ModuleNotFoundError, your shell is probably using a different interpreter. Re-run source .venv/bin/activate and try again.
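If the import check is ambiguous, a slightly longer diagnostic can help. The following is a minimal sketch using only the standard library; the function name describe_environment is hypothetical and not part of the SDK. It reports which interpreter is running, whether that interpreter lives in a venv, and which version of the package (if any) it can see.

```python
import sys
from importlib import metadata


def describe_environment(package: str = "claude-agent-sdk") -> dict:
    """Report which interpreter is running and whether `package` is installed for it."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # the package is not installed for this interpreter
    return {
        "interpreter": sys.executable,
        # Inside a venv, sys.prefix points at the venv and differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "package_version": version,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If in_virtualenv is False or package_version is None, the shell is most likely running a different interpreter than the one you installed into.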

Step 2: A One-Shot Query with query()

The simplest entry point is query(). It is an async generator that yields messages as Claude produces them. You do not manage a session, you do not handle the tool loop, you just iterate.

import anyio
from claude_agent_sdk import query, AssistantMessage, TextBlock

async def main():
    async for message in query(prompt="What is 2 + 2?"):
        if isinstance(message, AssistantMessage):
            for block in message.content:
                if isinstance(block, TextBlock):
                    print(block.text)

anyio.run(main)

Run it:

python quickstart.py

You should see something like 4. Not impressive on its own, but the streaming is the part that matters: query() yields each message as soon as the CLI emits it, so the same loop that prints a one-line answer will print a 200-line refactor plan as its messages arrive, with no extra work.
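To see the consumption pattern without an API key, here is a small sketch with a stand-in generator. fake_query and FakeTextMessage are hypothetical names invented for this illustration, not SDK types; the only point is that an async for loop handles each item before the next one is produced.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeTextMessage:
    """Hypothetical stand-in for a message object; not an SDK type."""
    text: str


async def fake_query(prompt: str):
    """Hypothetical stand-in for query(): an async generator yielding one message at a time."""
    for chunk in ("Thinking about: " + prompt, "Answer: 4"):
        await asyncio.sleep(0)  # yield control to the event loop, as real I/O would
        yield FakeTextMessage(chunk)


async def collect(prompt: str) -> list[str]:
    seen = []
    async for message in fake_query(prompt):
        # Each message is handled here before the generator produces the next one.
        seen.append(message.text)
    return seen


if __name__ == "__main__":
    for line in asyncio.run(collect("What is 2 + 2?")):
        print(line)
```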

Step 3: Expose a Custom Tool

A custom tool is just a Python function decorated with @tool, plus a name, a description, and a JSON schema for its input. The SDK wraps it in an in-process MCP server, which is honestly the cleanest part of the API. No subprocess, no stdio plumbing, no JSON-RPC handshake to debug.

import anyio
from claude_agent_sdk import (
    tool,
    create_sdk_mcp_server,
    ClaudeAgentOptions,
    ClaudeSDKClient,
)

@tool("sum", "Add two integers and return the result", {"a": int, "b": int})
async def sum_numbers(args):
    return {
        "content": [
            {"type": "text", "text": str(args["a"] + args["b"])}
        ]
    }

server = create_sdk_mcp_server(
    name="math",
    version="1.0.0",
    tools=[sum_numbers],
)

options = ClaudeAgentOptions(
    mcp_servers={"math": server},
    allowed_tools=["mcp__math__sum"],
    max_turns=3,
)

async def main():
    async with ClaudeSDKClient(options=options) as client:
        await client.query("Use the sum tool to compute 17 + 25.")
        async for message in client.receive_response():
            print(message)

anyio.run(main)

Three things to notice:

  1. The tool name appears in the allowed_tools list with the prefix mcp__<server-name>__<tool-name>. Get the prefix wrong and the entry matches nothing, so the call is never auto-approved and the tool looks like it is being ignored.
  2. allowed_tools is an allowlist for auto-approval, not a registry of what is available. To actually remove a tool from the conversation, use disallowed_tools instead. The permissions guide has the full order of evaluation.
  3. max_turns caps the agent loop. Without it, a confused model can chew through tokens calling the same tool over and over.
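Because the prefix is easy to mistype, it can be worth building the allowlist entry from its parts. This is a tiny sketch; mcp_tool_name is a hypothetical helper, not something the SDK provides, and it only encodes the mcp__<server-name>__<tool-name> pattern described above.

```python
def mcp_tool_name(server: str, tool: str) -> str:
    """Build the allowed_tools entry for a tool served by an in-process MCP server."""
    if not server or not tool:
        raise ValueError("server and tool must be non-empty")
    return f"mcp__{server}__{tool}"


# The server is registered as "math" and the tool is named "sum" in the example above.
ALLOWED = [mcp_tool_name("math", "sum")]

if __name__ == "__main__":
    print(ALLOWED)
```

Passing ALLOWED as allowed_tools keeps the server key and the allowlist entry derived from the same two strings.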

Step 4: Block Dangerous Commands with a Hook

Hooks are the part I wish I had known about on day one. They let your code run inside the agent loop at well-defined points, and they can return a decision that the loop will honor. The classic use case is stopping the model from running rm -rf or from editing a file outside the project directory.

import anyio
from claude_agent_sdk import (
    ClaudeAgentOptions,
    ClaudeSDKClient,
    HookMatcher,
)

FORBIDDEN = ["rm -rf", "DROP TABLE", "mkfs"]

async def block_dangerous(input_data, tool_use_id, context):
    if input_data["tool_name"] != "Bash":
        return {}
    command = input_data["tool_input"].get("command", "")
    for pattern in FORBIDDEN:
        if pattern in command:
            return {
                "hookSpecificOutput": {
                    "hookEventName": "PreToolUse",
                    "permissionDecision": "deny",
                    "permissionDecisionReason": f"Refused: '{pattern}' is not allowed",
                }
            }
    return {}

options = ClaudeAgentOptions(
    allowed_tools=["Bash"],
    hooks={
        "PreToolUse": [
            HookMatcher(matcher="Bash", hooks=[block_dangerous]),
        ],
    },
)

async def main():
    async with ClaudeSDKClient(options=options) as client:
        await client.query("Delete everything in /tmp with rm -rf")
        async for msg in client.receive_response():
            print(msg)

anyio.run(main)

When the hook returns permissionDecision: "deny", the agent loop does not execute the tool. The model sees a refusal result and has to come up with another approach. The full list of events and their payloads lives in the hooks guide.
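Since the hook is a plain async function, you can exercise it without starting the agent or spending tokens. The sketch below repeats the hook from this step so it runs on its own. fake_bash_call is a hypothetical helper that builds a payload with only the two keys the hook reads; a real payload from the SDK may carry more fields.

```python
import asyncio

FORBIDDEN = ["rm -rf", "DROP TABLE", "mkfs"]


async def block_dangerous(input_data, tool_use_id, context):
    # Same logic as the hook in this step, repeated so this file runs standalone.
    if input_data["tool_name"] != "Bash":
        return {}
    command = input_data["tool_input"].get("command", "")
    for pattern in FORBIDDEN:
        if pattern in command:
            return {
                "hookSpecificOutput": {
                    "hookEventName": "PreToolUse",
                    "permissionDecision": "deny",
                    "permissionDecisionReason": f"Refused: '{pattern}' is not allowed",
                }
            }
    return {}


def fake_bash_call(command: str) -> dict:
    """Hypothetical payload containing only the keys the hook reads."""
    return {"tool_name": "Bash", "tool_input": {"command": command}}


async def run_checks():
    denied = await block_dangerous(fake_bash_call("rm -rf /tmp/cache"), "test-id", None)
    allowed = await block_dangerous(fake_bash_call("ls -la"), "test-id", None)
    other = await block_dangerous({"tool_name": "Read", "tool_input": {}}, "test-id", None)
    return denied, allowed, other


if __name__ == "__main__":
    denied, allowed, other = asyncio.run(run_checks())
    assert denied["hookSpecificOutput"]["permissionDecision"] == "deny"
    assert allowed == {} and other == {}
    print("hook checks passed")
```

Note that the match is a plain substring test, so a variant such as a command with two spaces between rm and -rf would pass through. Treat the pattern list as a first line of defense, not a sandbox.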

Step 5: Handle Errors Explicitly

The SDK ships typed errors for the common failure modes. Catch them where it matters; let the rest bubble.

import anyio
from claude_agent_sdk import (
    query,
    CLINotFoundError,
    CLIConnectionError,
    ProcessError,
    CLIJSONDecodeError,
)

async def main():
    # async for is only valid inside a coroutine, so the loop lives in main()
    try:
        async for message in query(prompt="Summarize README.md"):
            print(message)
    except CLINotFoundError:
        print("Claude Code CLI missing. Reinstall the SDK.")
    except CLIConnectionError:
        print("CLI exited before producing output.")
    except ProcessError as e:
        print(f"CLI failed with exit code {e.exit_code}: {e}")
    except CLIJSONDecodeError:
        print("Could not parse a message from the CLI. Open an issue.")

anyio.run(main)

CLINotFoundError is the one I have hit most often, usually after upgrading the SDK in one venv and running it from another. Check the path with which claude and make sure it points inside the active venv.
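The same check can be done from Python, which is handy in a startup script. This is a sketch under assumptions: executable_in_prefix is a hypothetical helper, and it takes the executable name claude from the which claude check above. It finds the first match on PATH and reports whether that path sits under the active environment's prefix.

```python
import shutil
import sys
from pathlib import Path


def executable_in_prefix(name: str, prefix: str = sys.prefix):
    """Return (path, inside) for the first `name` on PATH, or (None, False) if absent."""
    found = shutil.which(name)
    if found is None:
        return None, False
    # Use absolute paths without resolving symlinks, because files in a venv's
    # bin directory may be symlinks that point outside the venv.
    location = Path(found).absolute()
    inside = Path(prefix).absolute() in location.parents
    return str(location), inside


if __name__ == "__main__":
    path, inside = executable_in_prefix("claude")
    if path is None:
        print("claude is not on PATH")
    else:
        print(f"{path} (inside active environment: {inside})")
```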

When to Use the SDK and When to Call the API Directly

Reach for the Claude Agent SDK when your agent needs to read files, run shell commands, or stitch together multiple tool calls. The loop, the permission model, and the streaming protocol are solved problems and you do not want to re-solve them.

Call the Anthropic Messages API directly when you are building a chat surface with no tool use, doing single-shot structured extraction, or running an offline batch where you want explicit control of every token. The API gives you full visibility; the SDK gives you the loop and tools for free.

For long-running autonomous work, the SDK has a settings isolation feature and a programmatic subagent API. Both are worth a look once your first prototype works.

Common Pitfalls

Forgetting the mcp__<server>__<tool> prefix. The tool works, the model can see it, and then nothing happens because the allowlist says "sum" but the model is calling "mcp__math__sum". Check the prefix first when a custom tool is being ignored.

Mixing query() and ClaudeSDKClient in the same script. query() does not support custom tools or hooks. If you need either, switch to ClaudeSDKClient and use await client.query() plus async for msg in client.receive_response() instead.

Letting the agent loop run forever. A confused model can rack up real cost calling the same failing tool in a loop. Always set max_turns. Five is usually enough for a single question, ten for a small task, twenty for a real refactor.

Trusting the first tool result. The agent is not a function call, it is a loop. A tool that returns success and lies is worse than one that errors out. Add a hook that logs every tool invocation and another that validates the shape of the result. You will thank yourself the first time something silently misbehaves.
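As a starting point for that last pitfall, here is a sketch of both pieces using only the standard library. log_tool_call follows the same (input_data, tool_use_id, context) signature as the hook in Step 4 and could be registered the same way under "PreToolUse". is_valid_text_result is a hypothetical helper that checks the return shape used by the sum tool in Step 3; which hook event and payload keys expose a tool's result is not covered in this guide, so calling the validator is left to you.

```python
import asyncio
import logging

logger = logging.getLogger("agent.tools")


async def log_tool_call(input_data, tool_use_id, context):
    """Log the tool name and input, then return {} so the loop proceeds unchanged."""
    logger.info(
        "tool=%s id=%s input=%r",
        input_data.get("tool_name"),
        tool_use_id,
        input_data.get("tool_input"),
    )
    return {}


def is_valid_text_result(result) -> bool:
    """Check for the {"content": [{"type": "text", "text": str}, ...]} shape."""
    if not isinstance(result, dict):
        return False
    content = result.get("content")
    if not isinstance(content, list) or not content:
        return False
    return all(
        isinstance(block, dict)
        and block.get("type") == "text"
        and isinstance(block.get("text"), str)
        for block in content
    )


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    payload = {"tool_name": "Bash", "tool_input": {"command": "ls"}}
    asyncio.run(log_tool_call(payload, "demo-id", None))
    print(is_valid_text_result({"content": [{"type": "text", "text": "42"}]}))
```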
