Build an MCP Client That Connects to Multiple Servers in Python

Every MCP tutorial I have read connects a client to one server, lists its tools, and calls it a day. That works for a demo. It falls apart the moment you want an agent that can read a file, query Postgres, search the web, and post to Slack, all in the same turn. The Python SDK was designed for this. The tutorials just rarely show it.

This article builds a small client that connects to two MCP servers at the same time: a math server (a few arithmetic tools) and a notes server (a tool to save and recall short strings). It then takes the combined tool list, rewrites the names with a server prefix so they cannot collide, and hands the result to Claude as Anthropic-format tool definitions. When the model calls a tool, the client looks up which server owns it, dispatches the call, and feeds the result back into the conversation.

You end up with a working multi-server agent in about 150 lines of Python. No framework on top, no LangChain, no abstractions you have to learn before you can debug.

Prerequisites

Python 3.10 or newer
uv for dependency management (pip install uv if you do not have it)
An Anthropic API key in your shell as ANTHROPIC_API_KEY
About 5 minutes

The MCP Python SDK is at version 1.28.1 as of this writing. Pin to mcp>=1.27,<2 if you ship this anywhere; v2 is still in alpha (current pre-release is 2.0.0a3) and the import paths and transport signatures are different.

Step 1: Scaffold Two Small Servers

You need something to connect to. Two minimal servers are easier to reason about than one big one, and the naming collision problem only shows up when you have more than one.

mkdir multi-mcp && cd multi-mcp
uv init --no-readme .
uv add "mcp[cli]" anthropic

Replace math_server.py:

"""A minimal math MCP server."""
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("math")


@mcp.tool()
def add(a: float, b: float) -> float:
    """Add two numbers and return the result."""
    return a + b


@mcp.tool()
def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the result."""
    return a * b


if __name__ == "__main__":
    mcp.run(transport="stdio")

And notes_server.py:

"""A minimal notes MCP server."""
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("notes")
_STORE: dict[str, str] = {}


@mcp.tool()
def save(key: str, value: str) -> str:
    """Save a string under a key. Overwrites if the key already exists."""
    _STORE[key] = value
    return f"saved {len(value)} chars under '{key}'"


@mcp.tool()
def recall(key: str) -> str:
    """Recall a string by key. Returns 'not found' if the key does not exist."""
    return _STORE.get(key, "not found")


if __name__ == "__main__":
    mcp.run(transport="stdio")

Both are valid MCP servers. They speak the same protocol the official servers speak, which is the whole point of the standard: anyone can write a new one and any client can use it.

Step 2: Connect to Both Servers in One Client

The SDK gives you stdio_client for subprocess transports and ClientSession for the protocol itself. Connecting to N servers is just N pairs of these wrapped in asyncio.gather. The session objects are independent, and that is fine. They share nothing by design.

Create client.py:

"""Connect to two MCP servers and expose their tools as Anthropic tool definitions."""
import asyncio
import os
from contextlib import asynccontextmanager
from typing import AsyncIterator

from anthropic import Anthropic
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

SERVERS = {
    "math": StdioServerParameters(command="python", args=["math_server.py"]),
    "notes": StdioServerParameters(command="python", args=["notes_server.py"]),
}


@asynccontextmanager
async def multi_session() -> AsyncIterator[dict[str, ClientSession]]:
    """Open one ClientSession per server, in parallel, and yield them by name."""
    sessions: dict[str, ClientSession] = {}

    async def _open(name: str, params: StdioServerParameters) -> None:
        # Each stdio_client() call launches the subprocess for us.
        async with stdio_client(params) as (read, write):
            session = ClientSession(read, write)
            await session.initialize()
            sessions[name] = session

    # Open all servers concurrently. If one fails, the others still clean up.
    try:
        await asyncio.gather(*(_open(n, p) for n, p in SERVERS.items()))
        yield sessions
    finally:
        # ClientSession has no explicit close in 1.x; the context managers above
        # handle subprocess teardown when the function returns.
        pass


async def list_all_tools(sessions: dict[str, ClientSession]) -> list[dict]:
    """Return Anthropic-format tool definitions for every tool across every server.

    Names are rewritten as '<server>__<tool>' so a tool called 'delete' on two
    servers can never collide when Claude sees them.
    """
    tools: list[dict] = []
    for server_name, session in sessions.items():
        result = await session.list_tools()
        for tool in result.tools:
            tools.append({
                "name": f"{server_name}__{tool.name}",
                "description": tool.description or "",
                "input_schema": tool.inputSchema,
            })
    return tools


async def dispatch_tool_call(
    sessions: dict[str, ClientSession], name: str, arguments: dict
) -> str:
    """Route a Claude tool_use block to the right server and return its text output."""
    server_name, _, tool_name = name.partition("__")
    session = sessions[server_name]
    result = await session.call_tool(tool_name, arguments=arguments)
    # Each content block has a .text attribute. Concatenate the text ones.
    parts = []
    for block in result.content:
        text = getattr(block, "text", None)
        if text is not None:
            parts.append(text)
    return "
".join(parts) if parts else "(no output)"

The __ separator is the same convention the Claude Agent SDK uses internally. Picking it on day one means tools you write later can move between standalone clients and the SDK without renaming.

Step 3: Drive the Tool Loop With Claude

This is the part the SDK does not do for you, because routing the Anthropic Messages API loop is the host's job. About 40 lines.

async def run_conversation(prompt: str) -> str:
    anthropic = Anthropic()
    async with multi_session() as sessions:
        tools = await list_all_tools(sessions)
        messages = [{"role": "user", "content": prompt}]

        # Loop with a hard cap. Models can spiral, and you want to be awake for it.
        for _ in range(8):
            response = anthropic.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=1024,
                tools=tools,
                messages=messages,
            )

            # If the model did not call a tool, we are done.
            if response.stop_reason != "tool_use":
                return "".join(
                    b.text for b in response.content if hasattr(b, "text")
                )

            # Append the assistant turn, then run every tool call it asked for.
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    output = await dispatch_tool_call(
                        sessions, block.name, block.input
                    )
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": output,
                    })
            messages.append({"role": "user", "content": tool_results})

        return "(gave up after 8 turns)"


if __name__ == "__main__":
    question = "Multiply 17 by 23, then save the answer under the key 'magic'."
    print(asyncio.run(run_conversation(question)))

Two things worth pointing out. First, the model can issue more than one tool call in a single turn. If Claude decides to call math__add and notes__save in the same response, the loop above handles both before the next model call. Second, tool_use_id ties each result back to the right tool_use block. Skipping it is the single most common source of "model complains about mismatched tool IDs" errors.

Step 4: Run It

export ANTHROPIC_API_KEY=sk-ant-...
uv run python client.py

You should see something like 391 followed by a confirmation line. The model called math__multiply, got 391, then called notes__save with {"key": "magic", "value": "391"}. If you run it again and ask "what is stored under 'magic'?", it will call notes__recall and tell you 391. Two servers, one conversation, no custom glue.

Common Pitfalls

Forgetting the separator convention. Two tools both named search on different servers will silently overwrite each other in Claude's tool list. Always prefix with the server name. The server_name__tool_name format is the de-facto standard, and the Claude Agent SDK uses the same one.

Reading tool.inputSchema instead of input_schema. The MCP server response uses camelCase (inputSchema). The Anthropic Messages API uses snake_case (input_schema). Pass it through unchanged in the example above, but the moment you reshape it for any reason, you will trip on this.

Letting the loop run forever. Always cap the turn count. A model that cannot figure out which tool to call will keep picking the wrong one. Eight turns is a sane default for a single user question. If you find yourself wanting more, the question is usually too vague, not the loop too short.

Mixing transport types without thinking about it. Local servers over stdio are simple. Remote servers use the streamable HTTP transport and need their own client. mcp.client.streamable_http.streamable_http_client is the entry point. Do not try to bridge them at the session level; do it at the dispatcher level (one function per transport, same interface back to the loop).

Assuming the tool description is optional. It is technically optional in the schema. The model treats it as required in practice. A tool with an empty description is a tool the model will not pick, or will pick for the wrong reason. Spend 30 seconds writing a real sentence.

When to Use This Pattern

Use the multi-server client when the agent's toolset crosses a domain boundary: data, files, communication, code execution. The protocol is doing real work here. Each server is owned by a different team, written in a different language, deployed on a different schedule. MCP lets you swap any of them without touching the others.

Skip it for a single tool. A 150-line client for one server is more code than a direct function call would be. The break-even point is roughly two servers, and it keeps paying off the more you add.

If you want the loop, the transport handling, the permission model, and the subagent API taken care of for you, look at the Claude Agent SDK. It speaks MCP natively and lets you declare multiple servers in ClaudeAgentOptions. You give up the explicit async for loop, and you get back about 400 lines of code you do not have to maintain.

Build an MCP Client That Connects to Multiple Servers in Python

Prerequisites

Step 1: Scaffold Two Small Servers

Step 2: Connect to Both Servers in One Client

Step 3: Drive the Tool Loop With Claude

Step 4: Run It

Common Pitfalls

When to Use This Pattern

Further Reading

Need Help Implementing This?

Build an MCP Client That Connects to Multiple Servers in Python

Prerequisites

Step 1: Scaffold Two Small Servers

Step 2: Connect to Both Servers in One Client

Step 3: Drive the Tool Loop With Claude

Step 4: Run It

Common Pitfalls

When to Use This Pattern

Further Reading

Need Help Implementing This?

Get DevOps Insights Straight to Your Inbox