Building a Production-Ready MCP Server in Python

Develop a production-grade MCP server in Python with JWT auth, scope-based governance, rate limits, and OTEL tracing for both STDIO and HTTP modes.

By Nabin Debnath · Dec. 01, 25 · Tutorial

The Model Context Protocol (MCP) is rapidly emerging as a foundation for secure AI integration: it links large language models (LLMs) to corporate assets such as APIs, databases, and services. Moving from concept to production, however, means addressing several real-world demands:

  • Governance: Defining clear rules regarding who is authorized to access specific tools
  • Security: Implementing robust practices for managing and protecting tokens and secrets
  • Resilience: Ensuring system stability and performance during high-demand periods or in the face of malicious attacks
  • Observability: Establishing the capability to effectively diagnose and troubleshoot failures across various tools and user environments

In this article, we'll focus on these points and upgrade a simple MCP server into a production-grade, robust system. We'll build:

  • An MCP server (stdio) for Claude Desktop and MCP Inspector
  • A reusable governance layer (scopes + rate limits)
  • OpenTelemetry tracing
  • An HTTP test gateway for verifying role-based access control (RBAC) and quotas using cURL
  • A complete runtime that you can deploy or expand upon for your own tools


Architecture Overview

The system architecture consists of several key components, each with a specific role: 

  • MCP Server – Manages the registration and execution of tools and resources
  • FastAPI Gateway – An optional external entry point for remote clients, providing JSON Web Token (JWT) verification and rate-limit enforcement
  • Governance Decorators – Applies and enforces essential controls such as per-tool scopes, quotas, and audit context
  • Observability Layer – Utilizes OpenTelemetry to trace every call, with output directed to standard output (stdout) or the OpenTelemetry Protocol (OTLP)
  • Transport Mode – Defines the communication protocol, supporting standard input and output (stdio) for the Claude Desktop application or HTTP for remote clients

[Figure: System architecture key components]


Environment Setup

To set up the environment, run the following command in your shell:

Shell
 
pip3 install mcp fastapi "uvicorn[standard]" pyjwt opentelemetry-sdk opentelemetry-api


Full Implementation in a Single File

Below is the complete implementation that you can run immediately. It includes:

  • FastMCP server (stdio)
  • JWT authentication (HTTP gateway)
  • Scope-based RBAC
  • Per-tenant rate limiting
  • OpenTelemetry tracing
  • Shared TOOLS registry
Python
 
"""
mcp_demo.py

Production-leaning MCP server 

- FastMCP-based MCP server (stdio) for Claude Desktop / MCP Inspector
- JWT authentication + scope-based RBAC for HTTP gateway
- Simple per-tenant+tool rate limiter
- OpenTelemetry tracing to console

Install deps:
    pip install mcp fastapi "uvicorn[standard]" pyjwt \
                opentelemetry-sdk opentelemetry-api
"""

import time
from functools import wraps
from typing import Any, Callable, Dict, Optional

import jwt
from fastapi import Depends, FastAPI, Header, HTTPException
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor
import uvicorn

from mcp.server.fastmcp import FastMCP

# ------------------------------------------------------------------------------
# Observability (OpenTelemetry → console)
# ------------------------------------------------------------------------------

def setup_tracing(use_console: bool):
    provider = TracerProvider(
        resource=Resource.create({"service.name": "mcp-python-server"})
    )
    if use_console:
        provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)
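    return trace.get_tracer("mcp-python-server")


# Module-level default so the tools also work when this file is imported
# rather than run as a script; the __main__ block below replaces it.
tracer = trace.get_tracer("mcp-python-server")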


# ------------------------------------------------------------------------------
# Governance: scopes + rate-limit
# ------------------------------------------------------------------------------

_BUCKETS: dict[tuple[str, str], dict[str, float]] = {}


def rate_limit(calls: int, per_seconds: float):
    """Simple token-bucket rate limiter per (tenant, tool)."""

    def deco(fn: Callable):
        @wraps(fn)
        def wrapper(
            *args,
            tenant_id: str = "public",
            tool_name: str = "",
            **kwargs,
        ):
            key = (tenant_id, tool_name or fn.__name__)
            bucket = _BUCKETS.setdefault(
                key, {"tokens": calls, "ts": time.time()}
            )
            now = time.time()

            # Refill tokens
            bucket["tokens"] = min(
                calls,
                bucket["tokens"] + (now - bucket["ts"]) * (calls / per_seconds),
            )
            bucket["ts"] = now

            if bucket["tokens"] < 1:
                raise RuntimeError(f"Rate limit exceeded for {key}")

            bucket["tokens"] -= 1

            with tracer.start_as_current_span(
                f"rate_limit.{tool_name or fn.__name__}"
            ):
                return fn(*args, tenant_id=tenant_id, **kwargs)

        return wrapper

    return deco


def require_scopes(*needed: str):
    """
    Scope-based RBAC decorator.

    - If granted_scopes is None (stdio mode / trusted local host),
      we allow everything (no RBAC).
    - If granted_scopes is provided (HTTP gateway w/ JWT), we enforce it.
    """

    def deco(fn: Callable):
        @wraps(fn)
        def wrapper(
            *args,
            granted_scopes: Optional[list[str]] = None,
            **kwargs,
        ):
            if granted_scopes is None:
                # Local/stdio mode: skip RBAC
                return fn(*args, **kwargs)

            granted = set(granted_scopes)
            missing = [s for s in needed if s not in granted]
            if missing:
                raise PermissionError(f"Missing scopes: {missing}")
            return fn(*args, granted_scopes=granted_scopes, **kwargs)

        return wrapper

    return deco


# ------------------------------------------------------------------------------
# Auth: JWT helpers (used by HTTP gateway)
# ------------------------------------------------------------------------------

JWT_SECRET = "replace-me-in-prod"
JWT_ALG = "HS256"


def decode_jwt(token: str) -> Dict[str, Any]:
    return jwt.decode(token, JWT_SECRET, algorithms=[JWT_ALG])


def mint_demo_jwt(
    sub: str = "user-123",
    tenant: str = "acme",
    scopes: Optional[list[str]] = None,
) -> str:
    scopes = scopes or ["customer:read", "orders:search"]
    payload = {
        "sub": sub,
        "tenant": tenant,
        "scopes": scopes,
        "exp": int(time.time()) + 3600,
    }
    return jwt.encode(payload, JWT_SECRET, algorithm=JWT_ALG)


# ------------------------------------------------------------------------------
# MCP server (FastMCP) – tools & resources
# ------------------------------------------------------------------------------

mcp = FastMCP("Python MCP Demo")

# Simple registry for HTTP gateway → names → functions
TOOLS: Dict[str, Callable[..., Any]] = {}


@mcp.tool()
@require_scopes("customer:read")
@rate_limit(calls=5, per_seconds=10)
def get_customer(
    customer_id: str,
    tenant_id: str = "public",
    granted_scopes: Optional[list[str]] = None,
) -> dict:
    """Return a sanitized customer profile (read-only)."""
    with tracer.start_as_current_span("tool.get_customer") as span:
        span.set_attribute("tenant.id", tenant_id)
        span.set_attribute("customer.id", customer_id)
        return {"id": customer_id, "name": "Jane Doe", "tier": "gold"}


TOOLS["get_customer"] = get_customer  # register for HTTP


@mcp.tool()
@require_scopes("orders:search")
@rate_limit(calls=10, per_seconds=10)
def find_orders(
    query: str,
    tenant_id: str = "public",
    granted_scopes: Optional[list[str]] = None,
) -> list[dict]:
    """Search orders via a safe path (no raw SQL from the model)."""
    with tracer.start_as_current_span("tool.find_orders") as span:
        span.set_attribute("tenant.id", tenant_id)
        span.set_attribute("query", query[:64])
        return [{"order_id": "o-123", "status": "shipped"}]


TOOLS["find_orders"] = find_orders  # register for HTTP


@mcp.resource("customers://{customer_id}")
def customer_resource(customer_id: str) -> str:
    """Simple resource example that could be expanded to DB/HTTP fetch."""
    return (
        f'{{"doc": "Customer {customer_id} resource placeholder"}}'
    )


# ------------------------------------------------------------------------------
# HTTP gateway (for testing JWT + quotas; separate from MCP transport)
# ------------------------------------------------------------------------------

app = FastAPI(title="MCP Demo Gateway")


def auth(
    Authorization: Optional[str] = Header(default=None),
) -> Dict[str, Any]:
    if not Authorization or not Authorization.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Missing bearer token")
    token = Authorization.split()[1]
    try:
        return decode_jwt(token)
    except Exception as exc:  # noqa: BLE001
        raise HTTPException(status_code=401, detail=f"Invalid token: {exc}") from exc


@app.get("/mint-demo-jwt")
def mint_token() -> Dict[str, str]:
    """Convenience helper: get a working JWT for local curl tests."""
    return {"token": mint_demo_jwt()}


@app.post("/mcp/tool/{name}")
def call_tool_http(
    name: str,
    payload: dict,
    ctx: Dict[str, Any] = Depends(auth),
):
    """
    HTTP entry point that reuses the same Python functions as MCP tools.

    NOTE:
    - This is NOT MCP-over-HTTP; it's a convenience gateway so you can
      see JWT scopes + rate limiting in action with curl/Postman.
    """
    fn = TOOLS.get(name)
    if fn is None:
        raise HTTPException(status_code=404, detail="Unknown tool")

    tenant_id = ctx.get("tenant", "public")
    scopes = ctx.get("scopes", [])

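    # Inject governance context. Assign rather than setdefault() so a client
    # cannot supply its own tenant_id or granted_scopes in the request body
    # and bypass the values taken from the verified JWT.
    payload["tenant_id"] = tenant_id
    payload["granted_scopes"] = scopes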

    try:
        result = fn(**payload)
        return {"ok": True, "result": result}
    except PermissionError as exc:  # from require_scopes
        raise HTTPException(status_code=403, detail=str(exc)) from exc
    except RuntimeError as exc:  # from rate_limit
        raise HTTPException(status_code=429, detail=str(exc)) from exc
    except TypeError as exc:
        raise HTTPException(status_code=400, detail=f"Bad args: {exc}") from exc


def run_http(port: int = 8080) -> None:
    uvicorn.run(app, host="0.0.0.0", port=port)


# ------------------------------------------------------------------------------
# Entry points: stdio (MCP) & HTTP (testing)
# ------------------------------------------------------------------------------


def run_stdio():
    """
    Run FastMCP over stdio.

    This is what Claude Desktop / MCP Inspector will use.
    """
    # FastMCP handles stdio transport internally
    mcp.run(transport="stdio")



if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--mode",
        choices=["stdio", "http"],
        default="http",
        help="Run as stdio MCP server or HTTP gateway",
    )
    parser.add_argument("--port", type=int, default=8080)
    args = parser.parse_args()

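    # Keep this inside the __main__ guard: `args` only exists here, so running
    # it at module level would raise NameError when the file is imported.
    if args.mode == "stdio":
        tracer = setup_tracing(use_console=False)
        run_stdio()
    else:
        tracer = setup_tracing(use_console=True)
        run_http(args.port)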
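
The order of the stacked decorators matters. Because require_scopes is listed above rate_limit, it wraps it and runs first, so a caller without the required scope is rejected before a rate-limit token is spent. The following is a minimal, standard-library-only sketch of that ordering; the tag decorator is hypothetical and only records when each wrapper runs.

```python
from functools import wraps

calls = []  # records the order in which the wrappers execute


def tag(label):
    """Hypothetical decorator that only logs when its wrapper runs."""

    def deco(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            calls.append(label)
            return fn(*args, **kwargs)

        return wrapper

    return deco


@tag("require_scopes")  # outermost wrapper: runs first
@tag("rate_limit")      # inner wrapper: runs second
def tool(x):
    calls.append("tool")
    return x * 2


result = tool(21)
print(calls)  # ['require_scopes', 'rate_limit', 'tool']
```

Swapping the two decorators would charge the quota even for calls that are later denied, which may or may not be what you want.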


Running the Application in HTTP Mode (JWT, RBAC, Rate Limit Testing)

To start, run the following command:

Shell
 
python mcp_demo.py --mode http


Happy Path

Use the following commands to obtain a JWT and store it in a shell variable:

Shell
 
curl -s http://localhost:8080/mint-demo-jwt | jq -r .token > token.txt
TKN=$(cat token.txt)
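
Before calling a tool, it can help to look at the claims the token carries. The sketch below decodes the payload segment with the standard library only. It does not verify the signature, so treat it as a debugging aid and never as an authorization check. The peek_claims helper and the hand-built sample token are illustrative and not part of mcp_demo.py.

```python
import base64
import json


def peek_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (debugging only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload_b64 = parts[1]
    # JWTs strip base64url padding; restore it before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def _b64(obj: dict) -> str:
    """Encode a dict the way a JWT segment is encoded (base64url, no padding)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hand-built, unsigned sample shaped like the demo token's claims
sample = ".".join(
    [
        _b64({"alg": "HS256", "typ": "JWT"}),
        _b64({"sub": "user-123", "tenant": "acme",
              "scopes": ["customer:read", "orders:search"]}),
        "signature-placeholder",
    ]
)

claims = peek_claims(sample)
print(claims["tenant"], claims["scopes"])
```

To inspect the real token, pass the contents of token.txt to peek_claims instead of the sample.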


Now, you can make an authorized call using the token:

Shell
 
curl -s -X POST http://localhost:8080/mcp/tool/get_customer \
  -H "Authorization: Bearer $TKN" \
  -H "Content-Type: application/json" \
  -d '{"customer_id":"c-42"}'


You should receive the following JSON response:

JSON
 
{"ok":true,"result":{"id":"c-42","name":"Jane Doe","tier":"gold"}}


In the server console, you will also see OpenTelemetry printing spans:

[Screenshot: OpenTelemetry span output in the server console]


RBAC Denial (Missing Scope)

To test the RBAC denial, create a token without customer:read and try again:

Shell
 
# Mint a token that carries only orders:search and store it in $WEAK
WEAK=$(python - <<'PY'
import jwt, time
print(jwt.encode(
  {"sub":"user-999","tenant":"acme","scopes":["orders:search"],"exp":int(time.time())+3600},
  "replace-me-in-prod", algorithm="HS256"))
PY
)

curl -i -X POST "http://localhost:8080/mcp/tool/get_customer" \
  -H "Authorization: Bearer $WEAK" \
  -H "Content-Type: application/json" \
  -d '{"customer_id":"c-1"}'


You should receive a 403 Forbidden response with the following JSON body:

JSON
 
{"detail":"Missing scopes: ['customer:read']"}
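
The denial comes from a plain set-membership check. The sketch below isolates that check as a standalone function so it can be reasoned about, or unit tested, without the server. The missing_scopes name is an assumption for illustration; in mcp_demo.py the same logic lives inside the require_scopes decorator.

```python
def missing_scopes(needed, granted):
    """Return the required scopes that are absent from the granted list."""
    granted_set = set(granted)
    return [s for s in needed if s not in granted_set]


weak_token_scopes = ["orders:search"]

# get_customer requires customer:read, which the weak token lacks
print(missing_scopes(["customer:read"], weak_token_scopes))  # ['customer:read']

# find_orders requires orders:search, which the weak token has
print(missing_scopes(["orders:search"], weak_token_scopes))  # []
```

The same weak token should therefore still be accepted by find_orders, which is a quick way to confirm that scopes are enforced per tool.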


Rate-Limit in Action

To test the rate limit, call get_customer more than 5 times within ~10 seconds:

Shell
 
for i in $(seq 1 7); do
  curl -s -o /dev/null -w "%{http_code}\n" -X POST "http://localhost:8080/mcp/tool/get_customer" \
    -H "Authorization: Bearer $TKN" -H "Content-Type: application/json" \
    -d '{"customer_id":"c-rl"}'
done


[Screenshot: expected output]


You should see some responses with 200, followed by 429. This confirms that the quota enforcement is working per (tenant, tool).
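
To see why the cut-off lands where it does, the token-bucket arithmetic can be replayed with an explicit clock. This is a sketch under stated assumptions: the TokenBucket class is hypothetical, and it mirrors the refill formula in rate_limit but takes the current time as an argument instead of calling time.time().

```python
class TokenBucket:
    """Token bucket with an explicit clock so its behavior can be replayed."""

    def __init__(self, calls: int, per_seconds: float, now: float = 0.0):
        self.capacity = calls
        self.rate = calls / per_seconds  # tokens refilled per second
        self.tokens = float(calls)
        self.ts = now

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, capped at the bucket capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.ts) * self.rate)
        self.ts = now
        if self.tokens < 1:
            return False
        self.tokens -= 1
        return True


bucket = TokenBucket(calls=5, per_seconds=10)

# Seven calls at the same instant: five pass, two are rejected
burst = [bucket.allow(now=0.0) for _ in range(7)]
print(burst)  # [True, True, True, True, True, False, False]

# Two seconds later, 2 s * 0.5 tokens/s = 1 token has been refilled
after_wait = bucket.allow(now=2.0)
print(after_wait)  # True
```

In the real cURL loop the calls are not instantaneous, and the earlier happy-path request may already have used a token, so the exact number of 200 responses can differ slightly from this idealized replay.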


Running the Application in STDIO Mode (Claude Desktop)

To run the server in stdio mode, execute:

Shell
 
python mcp_demo.py --mode stdio


Configure Claude Desktop to Launch the Same Command: 

  • Install Claude Desktop
  • Open the claude_desktop_config.json file under Settings -> Developer -> Edit Config
  • Replace its contents with the following JSON:
JSON
 
{
  "mcpServers": {
    "my-python-mcp": {
      "command": "python",
      "args": ["/ABSOLUTE/PATH/mcp_demo.py", "--mode=stdio"]
    }
  }
}


  • Save the file and restart Claude Desktop
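
JSON does not allow trailing commas, so a hand-edited config is easy to break. One option is to generate the entry with json.dumps, as in the sketch below. The build_config helper is hypothetical, and using sys.executable for "command" is an assumption: the idea is to pin the interpreter that actually has the dependencies installed, since a bare "python" may resolve differently when launched by a desktop application.

```python
import json
import sys


def build_config(script_path: str, server_name: str = "my-python-mcp") -> str:
    """Return a claude_desktop_config.json body as valid JSON text."""
    config = {
        "mcpServers": {
            server_name: {
                # Absolute path of the interpreter running this script
                "command": sys.executable,
                "args": [script_path, "--mode=stdio"],
            }
        }
    }
    return json.dumps(config, indent=2)


# Replace the placeholder with the real absolute path to mcp_demo.py
text = build_config("/ABSOLUTE/PATH/mcp_demo.py")
print(text)
```

Paste the printed text into the config file, or merge the mcpServers entry into an existing one.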

Test Inside Claude 

Open a chat and request it to use a tool, for example: "Call the get_customer MCP tool with id c-42."

[Screenshot: Claude Desktop calling the get_customer tool]


Claude Desktop should now be able to communicate with the MCP tool.


Running the Application in STDIO Mode (MCP Inspector)

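If you have uv installed, run uv run mcp dev mcp_demo.py (the mcp command-line tool comes with the mcp[cli] extra). Alternatively, launch the MCP Inspector through npx, which requires Node.js, and have it start the server with Python: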

Shell
 
npx -y @modelcontextprotocol/inspector python mcp_demo.py --mode stdio

[Screenshot: MCP Inspector]


What Makes This "Production-Ready"

Security

  • JWT authentication
  • Scope-based RBAC for each tool
  • Least-privilege design principles
  • No raw SQL or direct OS access from tools
  • Secrets kept out of the code: the demo hard-codes JWT_SECRET for convenience, so load it from the environment and move to Vault/KMS in production
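
A minimal sketch of reading the signing secret from the environment and failing fast when it is missing or still set to the demo placeholder. The MCP_JWT_SECRET variable name and the load_jwt_secret helper are assumptions for illustration and do not appear in mcp_demo.py.

```python
import os


def load_jwt_secret(env_var: str = "MCP_JWT_SECRET") -> str:
    """Read the signing secret from the environment; fail fast if unusable."""
    secret = os.environ.get(env_var, "")
    if not secret or secret == "replace-me-in-prod":
        raise RuntimeError(f"{env_var} is unset or still the demo placeholder")
    return secret


# Set here only so the example runs; in practice the deployment provides it
os.environ["MCP_JWT_SECRET"] = "example-only-secret"

JWT_SECRET = load_jwt_secret()
```

Failing at startup is usually preferable to discovering at request time that every token was signed with a well-known placeholder.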

Governance

  • Per-tenant, per-tool rate limiting
  • Centralized decorators for policy enforcement
  • Deterministic error codes (403 for Forbidden, 429 for Too Many Requests)
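
One caveat on the error codes: the gateway maps every RuntimeError to 429, so a tool that raises RuntimeError for an unrelated reason would also be reported as rate limited. A possible refinement, sketched below, is a dedicated exception type. RateLimitExceeded and status_for are hypothetical names; the limiter in mcp_demo.py raises a plain RuntimeError.

```python
class RateLimitExceeded(RuntimeError):
    """Hypothetical dedicated error for an exhausted quota."""


# Checked in order; the first matching type wins
STATUS_BY_ERROR = [
    (PermissionError, 403),    # missing scope
    (RateLimitExceeded, 429),  # quota exhausted
    (TypeError, 400),          # bad or unexpected arguments
]


def status_for(exc: Exception) -> int:
    """Map a governance error to an HTTP status; anything else is a 500."""
    for exc_type, status in STATUS_BY_ERROR:
        if isinstance(exc, exc_type):
            return status
    return 500


print(status_for(PermissionError("Missing scopes")))      # 403
print(status_for(RateLimitExceeded("too many calls")))    # 429
print(status_for(RuntimeError("database unavailable")))   # 500
```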

Observability

  • OpenTelemetry spans for:
    • Tool execution
    • Tenant ID
    • Latency
    • Errors
    • Rate-limit events

Reliability

  • Structured failure handling
  • Reusable shared tool logic (stdio + HTTP)
  • Compatibility with Docker, Kubernetes, AWS Lambda

Developer Experience

  • Stdio mode for Claude Desktop workflows
  • HTTP gateway for integration tests using Postman or cURL
  • Tool functions and gateway that can be exercised in a continuous integration (CI) test suite


Key Takeaways

  • MCP becomes powerful when combined with production-grade abstractions such as authentication, RBAC, and observability.
  • FastMCP simplifies the exposure of Python tools via stdio for Claude Desktop.
  • A custom TOOLS registry enables testing controls through HTTP.
  • OpenTelemetry provides distributed tracing across tool interactions.
  • Rate limiting and RBAC transform tools into governable and auditable enterprise APIs.