Building a Production-Ready MCP Server in Python
Develop a production-grade MCP server in Python with JWT auth, scope-based governance, rate limits, and OTEL tracing for both STDIO and HTTP modes.
The Model Context Protocol (MCP) is rapidly emerging as a fundamental framework for secure AI integration. It effectively links large language models (LLMs) with essential corporate assets, such as APIs, databases, and services. However, moving from concept to production requires addressing several key real-world demands:
- Governance: Defining clear rules regarding who is authorized to access specific tools
- Security: Implementing robust practices for managing and protecting tokens and secrets
- Resilience: Ensuring system stability and performance during high-demand periods or in the face of malicious attacks
- Observability: Establishing the capability to effectively diagnose and troubleshoot failures across various tools and user environments
In this article, we'll focus on these points and upgrade a simple MCP server into a production-grade, robust system. We'll build:
- An MCP server (stdio) for Claude Desktop and MCP Inspector
- A reusable governance layer (scopes + rate limits)
- OpenTelemetry tracing
- An HTTP test gateway for verifying role-based access control (RBAC) and quotas using cURL
- A complete runtime that you can deploy or expand upon for your own tools
Architecture Overview
The system architecture consists of several key components, each with a specific role:
- MCP Server – Manages the registration and execution of tools and resources
- FastAPI Gateway – An optional external entry point for remote clients, providing JSON Web Token (JWT) verification and rate-limit enforcement
- Governance Decorators – Applies and enforces essential controls such as per-tool scopes, quotas, and audit context
- Observability Layer – Utilizes OpenTelemetry to trace every call, with output directed to standard output (stdout) or the OpenTelemetry Protocol (OTLP)
- Transport Mode – Defines the communication protocol, supporting standard input and output (stdio) for the Claude Desktop application or HTTP for remote clients
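How the governance decorators compose is easiest to see in isolation. The following is a minimal sketch rather than the article's implementation: require_scope and count_call are hypothetical stand-ins for a scope check and a quota counter, and lookup is a placeholder tool. It illustrates that the decorator listed first runs first, so a request that fails the scope check never consumes quota.

```python
from functools import wraps

CALLS = {"lookup": 0}  # hypothetical quota counter, keyed by tool name


def require_scope(needed):
    """Reject the call unless `needed` is in the caller's granted scopes."""
    def deco(fn):
        @wraps(fn)
        def wrapper(*args, granted_scopes=(), **kwargs):
            if needed not in granted_scopes:
                raise PermissionError(f"Missing scope: {needed}")
            return fn(*args, **kwargs)
        return wrapper
    return deco


def count_call(fn):
    """Stand-in for a quota check: count every call that reaches the tool."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        CALLS[fn.__name__] += 1
        return fn(*args, **kwargs)
    return wrapper


@require_scope("customer:read")  # outermost: runs first
@count_call                      # runs only if the scope check passed
def lookup(customer_id):
    return {"id": customer_id}


allowed = lookup("c-42", granted_scopes=["customer:read"])

try:
    lookup("c-42", granted_scopes=["orders:search"])
    denied = False
except PermissionError:
    denied = True

print(allowed, denied, CALLS["lookup"])
```

Only the authorized call reaches the counter, which is why placing the scope check above the rate limiter keeps rejected callers from draining a tenant's quota.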

Environment Setup
To set up the environment, run the following command in your shell:
pip3 install mcp fastapi "uvicorn[standard]" pyjwt opentelemetry-sdk opentelemetry-api
Full Implementation in a Single File
Below is the complete implementation that you can run immediately. It includes:
- FastMCP server (stdio)
- JWT authentication (HTTP gateway)
- Scope-based RBAC
- Per-tenant rate limiting
- OpenTelemetry tracing
- Shared TOOLS registry
"""
mcp_demo.py
Production-leaning MCP server
- FastMCP-based MCP server (stdio) for Claude Desktop / MCP Inspector
- JWT authentication + scope-based RBAC for HTTP gateway
- Simple per-tenant+tool rate limiter
- OpenTelemetry tracing to console
Install deps:
pip install mcp fastapi "uvicorn[standard]" pyjwt \
opentelemetry-sdk opentelemetry-api
"""
import time
from functools import wraps
from typing import Any, Callable, Dict, Optional
import jwt
from fastapi import Depends, FastAPI, Header, HTTPException
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor
import uvicorn
from mcp.server.fastmcp import FastMCP
# ------------------------------------------------------------------------------
# Observability (OpenTelemetry → console)
# ------------------------------------------------------------------------------
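def setup_tracing(use_console: bool):
    """Install a tracer provider; export spans to stdout only when asked."""
    provider = TracerProvider(
        resource=Resource.create({"service.name": "mcp-python-server"})
    )
    if use_console:
        # Console export writes to stdout, so keep it off in stdio mode,
        # where stdout carries the MCP protocol stream.
        provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)
    return trace.get_tracer("mcp-python-server")


# Module-level default so the decorators and tools below also work when this
# file is imported (for example by `mcp dev`) instead of run as a script.
tracer = trace.get_tracer("mcp-python-server")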
# ------------------------------------------------------------------------------
# Governance: scopes + rate-limit
# ------------------------------------------------------------------------------
_BUCKETS: dict[tuple[str, str], dict[str, float]] = {}


def rate_limit(calls: int, per_seconds: float):
    """Simple token-bucket rate limiter per (tenant, tool)."""

    def deco(fn: Callable):
        @wraps(fn)
        def wrapper(
            *args,
            tenant_id: str = "public",
            tool_name: str = "",
            **kwargs,
        ):
            key = (tenant_id, tool_name or fn.__name__)
            bucket = _BUCKETS.setdefault(
                key, {"tokens": calls, "ts": time.time()}
            )
            now = time.time()
            # Refill tokens
            bucket["tokens"] = min(
                calls,
                bucket["tokens"] + (now - bucket["ts"]) * (calls / per_seconds),
            )
            bucket["ts"] = now
            if bucket["tokens"] < 1:
                raise RuntimeError(f"Rate limit exceeded for {key}")
            bucket["tokens"] -= 1
            with tracer.start_as_current_span(
                f"rate_limit.{tool_name or fn.__name__}"
            ):
                return fn(*args, tenant_id=tenant_id, **kwargs)

        return wrapper

    return deco


def require_scopes(*needed: str):
    """
    Scope-based RBAC decorator.
    - If granted_scopes is None (stdio mode / trusted local host),
      we allow everything (no RBAC).
    - If granted_scopes is provided (HTTP gateway w/ JWT), we enforce it.
    """

    def deco(fn: Callable):
        @wraps(fn)
        def wrapper(
            *args,
            granted_scopes: Optional[list[str]] = None,
            **kwargs,
        ):
            if granted_scopes is None:
                # Local/stdio mode: skip RBAC
                return fn(*args, **kwargs)
            granted = set(granted_scopes)
            missing = [s for s in needed if s not in granted]
            if missing:
                raise PermissionError(f"Missing scopes: {missing}")
            return fn(*args, granted_scopes=granted_scopes, **kwargs)

        return wrapper

    return deco
# ------------------------------------------------------------------------------
# Auth: JWT helpers (used by HTTP gateway)
# ------------------------------------------------------------------------------
JWT_SECRET = "replace-me-in-prod"
JWT_ALG = "HS256"
def decode_jwt(token: str) -> Dict[str, Any]:
    return jwt.decode(token, JWT_SECRET, algorithms=[JWT_ALG])


def mint_demo_jwt(
    sub: str = "user-123",
    tenant: str = "acme",
    scopes: Optional[list[str]] = None,
) -> str:
    scopes = scopes or ["customer:read", "orders:search"]
    payload = {
        "sub": sub,
        "tenant": tenant,
        "scopes": scopes,
        "exp": int(time.time()) + 3600,
    }
    return jwt.encode(payload, JWT_SECRET, algorithm=JWT_ALG)
# ------------------------------------------------------------------------------
# MCP server (FastMCP) – tools & resources
# ------------------------------------------------------------------------------
mcp = FastMCP("Python MCP Demo")
# Simple registry for HTTP gateway → names → functions
TOOLS: Dict[str, Callable[..., Any]] = {}
@mcp.tool()
@require_scopes("customer:read")
@rate_limit(calls=5, per_seconds=10)
def get_customer(
    customer_id: str,
    tenant_id: str = "public",
    granted_scopes: Optional[list[str]] = None,
) -> dict:
    """Return a sanitized customer profile (read-only)."""
    with tracer.start_as_current_span("tool.get_customer") as span:
        span.set_attribute("tenant.id", tenant_id)
        span.set_attribute("customer.id", customer_id)
        return {"id": customer_id, "name": "Jane Doe", "tier": "gold"}


TOOLS["get_customer"] = get_customer  # register for HTTP


@mcp.tool()
@require_scopes("orders:search")
@rate_limit(calls=10, per_seconds=10)
def find_orders(
    query: str,
    tenant_id: str = "public",
    granted_scopes: Optional[list[str]] = None,
) -> list[dict]:
    """Search orders via a safe path (no raw SQL from the model)."""
    with tracer.start_as_current_span("tool.find_orders") as span:
        span.set_attribute("tenant.id", tenant_id)
        span.set_attribute("query", query[:64])
        return [{"order_id": "o-123", "status": "shipped"}]


TOOLS["find_orders"] = find_orders  # register for HTTP


@mcp.resource("customers://{customer_id}")
def customer_resource(customer_id: str) -> str:
    """Simple resource example that could be expanded to DB/HTTP fetch."""
    return (
        f'{{"doc": "Customer {customer_id} resource placeholder"}}'
    )
# ------------------------------------------------------------------------------
# HTTP gateway (for testing JWT + quotas; separate from MCP transport)
# ------------------------------------------------------------------------------
app = FastAPI(title="MCP Demo Gateway")
def auth(
    Authorization: Optional[str] = Header(default=None),
) -> Dict[str, Any]:
    if not Authorization or not Authorization.startswith("Bearer "):
        raise HTTPException(status_code=401, detail="Missing bearer token")
    token = Authorization.split()[1]
    try:
        return decode_jwt(token)
    except Exception as exc:  # noqa: BLE001
        raise HTTPException(status_code=401, detail=f"Invalid token: {exc}") from exc


@app.get("/mint-demo-jwt")
def mint_token() -> Dict[str, str]:
    """Convenience helper: get a working JWT for local curl tests."""
    return {"token": mint_demo_jwt()}


@app.post("/mcp/tool/{name}")
def call_tool_http(
    name: str,
    payload: dict,
    ctx: Dict[str, Any] = Depends(auth),
):
    """
    HTTP entry point that reuses the same Python functions as MCP tools.
    NOTE:
    - This is NOT MCP-over-HTTP; it's a convenience gateway so you can
      see JWT scopes + rate limiting in action with curl/Postman.
    """
    fn = TOOLS.get(name)
    if fn is None:
        raise HTTPException(status_code=404, detail="Unknown tool")
    tenant_id = ctx.get("tenant", "public")
    scopes = ctx.get("scopes", [])
    # Inject governance context from the verified token. Assign instead of
    # using setdefault so the request body cannot override tenant or scopes.
    payload["tenant_id"] = tenant_id
    payload["granted_scopes"] = scopes
    try:
        result = fn(**payload)
        return {"ok": True, "result": result}
    except PermissionError as exc:  # from require_scopes
        raise HTTPException(status_code=403, detail=str(exc)) from exc
    except RuntimeError as exc:  # from rate_limit
        raise HTTPException(status_code=429, detail=str(exc)) from exc
    except TypeError as exc:
        raise HTTPException(status_code=400, detail=f"Bad args: {exc}") from exc


def run_http(port: int = 8080) -> None:
    uvicorn.run(app, host="0.0.0.0", port=port)
# ------------------------------------------------------------------------------
# Entry points: stdio (MCP) & HTTP (testing)
# ------------------------------------------------------------------------------
def run_stdio():
    """
    Run FastMCP over stdio.
    This is what Claude Desktop / MCP Inspector will use.
    """
    # FastMCP handles stdio transport internally
    mcp.run(transport="stdio")


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--mode",
        choices=["stdio", "http"],
        default="http",
        help="Run as stdio MCP server or HTTP gateway",
    )
    parser.add_argument("--port", type=int, default=8080)
    args = parser.parse_args()
    if args.mode == "stdio":
        tracer = setup_tracing(use_console=False)
        run_stdio()
    else:
        tracer = setup_tracing(use_console=True)
        run_http(args.port)
Running the Application in HTTP Mode (JWT, RBAC, Rate Limit Testing)
To start, run the following command:
python mcp_demo.py --mode http
Happy Path
Use the following commands to obtain a JWT and store it in a shell variable:
curl -s http://localhost:8080/mint-demo-jwt | jq -r .token > token.txt
TKN=$(cat token.txt)
Now, you can make an authorized call using the token:
curl -s -X POST http://localhost:8080/mcp/tool/get_customer \
-H "Authorization: Bearer $TKN" \
-H "Content-Type: application/json" \
-d '{"customer_id":"c-42"}'
You should receive the following JSON response:
{"ok":true,"result":{"id":"c-42","name":"Jane Doe","tier":"gold"}}
In the server console, you will also see OpenTelemetry print each finished span as a JSON object.

RBAC Denial (Missing Scope)
To test the RBAC denial, create a token without customer:read and try again:
# Mint a token with only orders:search and capture it in $WEAK
WEAK=$(python - <<'PY'
import jwt, time
print(jwt.encode(
    {"sub":"user-999","tenant":"acme","scopes":["orders:search"],"exp":int(time.time())+3600},
    "replace-me-in-prod", algorithm="HS256"))
PY
)
curl -i -X POST "http://localhost:8080/mcp/tool/get_customer" \
-H "Authorization: Bearer $WEAK" \
-H "Content-Type: application/json" \
-d '{"customer_id":"c-1"}'
You should receive an HTTP 403 response with the following JSON body:
{"detail":"Missing scopes: ['customer:read']"}
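The denial above only holds if the scopes always come from the verified token and never from the request body. A minimal sketch of that rule follows; build_call_kwargs is a hypothetical helper name, not part of the article's code, and the claim names tenant and scopes are assumed to match the demo tokens.

```python
def build_call_kwargs(payload: dict, claims: dict) -> dict:
    """Merge client arguments with governance context from verified claims.

    Token-derived values are written last, so a request body cannot
    supply its own tenant_id or granted_scopes.
    """
    kwargs = dict(payload)  # copy; leave the caller's dict untouched
    kwargs["tenant_id"] = claims.get("tenant", "public")
    kwargs["granted_scopes"] = list(claims.get("scopes", []))
    return kwargs


# A client holding only orders:search tries to smuggle in a broader scope.
claims = {"sub": "user-999", "tenant": "acme", "scopes": ["orders:search"]}
payload = {"customer_id": "c-1", "granted_scopes": ["customer:read"]}

kwargs = build_call_kwargs(payload, claims)
print(kwargs["granted_scopes"])
```

The smuggled scope is discarded, so a scope check on customer:read would still reject this call.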
Rate-Limit in Action
To test the rate limit, call get_customer more than 5 times within ~10 seconds:
for i in $(seq 1 7); do
curl -s -o /dev/null -w "%{http_code}\n" -X POST "http://localhost:8080/mcp/tool/get_customer" \
-H "Authorization: Bearer $TKN" -H "Content-Type: application/json" \
-d '{"customer_id":"c-rl"}'
done
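Expected Output:
The first five requests should return 200 and the rest 429, because the bucket for get_customer holds five tokens and refills at 0.5 tokens per second (a slow loop can let an extra request through). This confirms that quota enforcement works per (tenant, tool).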
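The arithmetic behind that result can be checked without a running server. The sketch below re-implements the same refill rule in a small class with an injected clock; TokenBucket and its method names are illustrative assumptions, not the article's decorator, and the limits mirror get_customer (5 calls per 10 seconds).

```python
class TokenBucket:
    """Token bucket holding up to `capacity` tokens, refilled continuously."""

    def __init__(self, capacity: int, per_seconds: float, now: float = 0.0):
        self.capacity = capacity
        self.rate = capacity / per_seconds  # tokens added per second
        self.tokens = float(capacity)
        self.ts = now

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then try to spend one token."""
        self.tokens = min(self.capacity, self.tokens + (now - self.ts) * self.rate)
        self.ts = now
        if self.tokens < 1:
            return False
        self.tokens -= 1
        return True


bucket = TokenBucket(capacity=5, per_seconds=10)  # same limits as get_customer

# Seven calls at the same instant: five pass, two are rejected.
burst = [bucket.allow(now=0.0) for _ in range(7)]

# Two seconds later, 2 s * 0.5 tokens/s = 1 token has been refilled.
after_wait = bucket.allow(now=2.0)
# That token is spent immediately, so the next call is rejected again.
after_spend = bucket.allow(now=2.0)

print(burst, after_wait, after_spend)
```

Passing the clock in as an argument, instead of calling time.time() inside the limiter, is what makes the behavior deterministic enough to assert on in a test.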
Running the Application in STDIO Mode (Claude Desktop)
To run the server in stdio mode, execute:
python mcp_demo.py --mode stdio
Configure Claude Desktop to Launch the Same Command:
- Install Claude Desktop
- Open the claude_desktop_config.json file under Settings -> Developer -> Edit Config
- Replace it with the following JSON:
{
  "mcpServers": {
    "my-python-mcp": {
      "command": "python",
      "args": ["/ABSOLUTE/PATH/mcp_demo.py", "--mode=stdio"]
    }
  }
}
- Save the file and restart Claude Desktop
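Hand-edited JSON is easy to get wrong, since a single trailing comma makes the file invalid. As one possible alternative, the configuration can be generated with a short script. This is a sketch under assumptions: build_config is a hypothetical helper, and using sys.executable for "command" is a suggestion so that Claude Desktop launches the same interpreter that has the dependencies installed; it only prints the JSON and does not write to the Claude Desktop config file.

```python
import json
import sys
from pathlib import Path


def build_config(script_path: str, server_name: str = "my-python-mcp") -> str:
    """Return claude_desktop_config.json content for a stdio MCP server."""
    config = {
        "mcpServers": {
            server_name: {
                # sys.executable is the interpreter running this script,
                # typically the virtualenv where the dependencies live.
                "command": sys.executable,
                "args": [str(Path(script_path).resolve()), "--mode=stdio"],
            }
        }
    }
    return json.dumps(config, indent=2)


text = build_config("mcp_demo.py")
print(text)
```

Run it from the directory containing mcp_demo.py and paste the output into the config file.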
Test Inside Claude
Open a chat and request it to use a tool, for example: "Call the get_customer MCP tool with id c-42."

Claude Desktop should now be able to call the MCP tool.
Running the Application in STDIO Mode (MCP Inspector)
If you have uv installed, run uv run mcp dev mcp_demo.py. Alternatively, without uv, launch MCP Inspector through npx (which requires Node.js) and let it start the server with Python:
npx -y @modelcontextprotocol/inspector python mcp_demo.py --mode stdio

What Makes This "Production-Ready"
Security
- JWT authentication
- Scope-based RBAC for each tool
- Least-privilege design principles
- No raw SQL or direct OS access from tools
- Secrets loaded from the environment rather than from code (the demo hard-codes JWT_SECRET for brevity; transition to Vault/KMS in production)
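To make the JWT authentication item concrete, here is a sketch of what HS256 verification involves, using only the standard library. The article's gateway delegates this work to PyJWT's jwt.decode; the helper names sign_hs256 and verify_hs256 are hypothetical, and the sketch checks only the alg header, the signature, and exp. It is meant for understanding the mechanism, and a maintained library remains the better choice in a real deployment.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(raw: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _unb64url(text: str) -> bytes:
    """Decode base64url, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _mac(signing_input: str, secret: str) -> bytes:
    """HMAC-SHA256 over the 'header.payload' string."""
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def sign_hs256(claims: dict, secret: str) -> str:
    """Build header.payload.signature for the given claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_mac(signing_input, secret))}"


def verify_hs256(token: str, secret: str, now: Optional[float] = None) -> dict:
    """Return the claims if the signature and expiry check out, else raise."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    if json.loads(_unb64url(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _mac(f"{header_b64}.{payload_b64}", secret)
    # compare_digest avoids leaking how many bytes matched via timing
    if not hmac.compare_digest(expected, _unb64url(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_unb64url(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and claims["exp"] <= current:
        raise ValueError("token expired")
    return claims


def rejected(tok: str, secret: str, now: float) -> bool:
    """True if verification raises for this token, secret, and time."""
    try:
        verify_hs256(tok, secret, now=now)
        return False
    except ValueError:
        return True


token = sign_hs256(
    {"tenant": "acme", "scopes": ["customer:read"], "exp": 2000}, "demo-secret"
)
claims = verify_hs256(token, "demo-secret", now=1000)

print(claims["tenant"])
print(rejected(token, "wrong-secret", 1000))  # signature mismatch
print(rejected(token, "demo-secret", 3000))   # past the exp claim
```

Pinning the accepted algorithm, as the article does with algorithms=[JWT_ALG], matters because a verifier that trusts the token's own alg header can be talked into skipping the signature check.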
Governance
- Per-tenant, per-tool rate limiting
- Centralized decorators for policy enforcement
- Deterministic error codes (403 for Forbidden, 429 for Too Many Requests)
Observability
- OpenTelemetry spans for:
  - Tool execution
  - Tenant ID
  - Latency
  - Errors
  - Rate-limit events
Reliability
- Structured failure handling
- Reusable shared tool logic (stdio + HTTP)
- Compatibility with Docker, Kubernetes, AWS Lambda
Developer Experience
- Stdio mode for Claude/Desktop workflows
- HTTP gateway for integration tests using Postman or cURL
- Tool logic that can be exercised in a Continuous Integration (CI) suite
Key Takeaways
- MCP becomes powerful when combined with production-grade abstractions such as authentication, RBAC, and observability.
- FastMCP simplifies the exposure of Python tools via stdio for Claude Desktop.
- A custom TOOLS registry enables testing controls through HTTP.
- OpenTelemetry provides distributed tracing across tool interactions.
- Rate limiting and RBAC transform tools into governable and auditable enterprise APIs.
Opinions expressed by DZone contributors are their own.