Cutting P99 Latency From ~3.2s To ~650ms in a Policy‑Driven Authorization API (Python + MongoDB)

We cut tail latency by making queries index-friendly, removing cold-start connection costs, and overlapping third‑party I/O with computation.

By Rohit Muthyala · Feb. 25, 26 · Tutorial

Modern authorization endpoints often do more than approve a request. They evaluate complex policies, compute rolling aggregates, call third‑party risk services, and enforce company/card limits, all under a hard latency budget. If you miss it, the transaction fails, and the failure is customer-visible.

This post walks through a practical approach to taking a Python authorization API from a P99 of roughly 3.2s down to roughly 650ms, using a sequence of changes that compound: query/index correctness, deterministic query planning, connection pooling and warmup, and parallelizing third‑party I/O.

[Figure: Latency timeline]

1. Baseline First: Measure Where The Tail Comes From

Before optimizing, capture:

  • End‑to‑end p50/p95/p99
  • A dependency breakdown (DB vs application vs third‑party)
  • Production-like load and dataset scale

Minimal timing middleware (FastAPI): 

Python
 
import time
from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def timing_middleware(request: Request, call_next):
    start = time.perf_counter()
    resp = await call_next(request)
    elapsed_ms = (time.perf_counter() - start) * 1000
    resp.headers["Server-Timing"] = f"app;dur={elapsed_ms:.2f}"
    return resp
  


Structured spans around dependencies:

Python
 
import time
import logging

log = logging.getLogger(__name__)

class Span:
    def __init__(self, name: str):
        self.name = name
    def __enter__(self):
        self.t0 = time.perf_counter()
        return self
    def __exit__(self, exc_type, exc, tb):
        dt_ms = (time.perf_counter() - self.t0) * 1000
        log.info("span=%s duration_ms=%.2f", self.name, dt_ms)

def authorize(req):
    with Span("db.rollup"):
        rollup = compute_rollup(req)
    with Span("third_party.risk"):
        risk = call_risk(req)
    with Span("policy.eval"):
        decision = eval_policy(req, rollup, risk)
    return decision


This tells you whether P99 is dominated by slow queries, connection setup, third‑party calls, or CPU.
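
Log lines are enough to get started, but a per-dependency percentile summary makes the comparison more direct. The following is a minimal sketch of that idea, assuming span durations are collected in memory; `SpanStats`, `record`, `summary`, and `nearest_rank` are hypothetical names that are not part of the code above, and nearest-rank is only one of several common percentile definitions.

```python
from collections import defaultdict


def nearest_rank(sorted_ms, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    if not sorted_ms:
        raise ValueError("no samples")
    # Integer ceiling division avoids floating-point rounding at the boundary.
    rank = -(-pct * len(sorted_ms) // 100)
    return sorted_ms[max(rank, 1) - 1]


class SpanStats:
    """Hypothetical in-memory collector keyed by span name."""

    def __init__(self):
        self._samples = defaultdict(list)

    def record(self, name, duration_ms):
        self._samples[name].append(duration_ms)

    def summary(self):
        out = {}
        for name, values in self._samples.items():
            ordered = sorted(values)
            out[name] = {
                "count": len(ordered),
                "p50": nearest_rank(ordered, 50),
                "p95": nearest_rank(ordered, 95),
                "p99": nearest_rank(ordered, 99),
            }
        return out


# Synthetic durations for illustration only: 1..100 ms and 10..1000 ms.
stats = SpanStats()
for ms in range(1, 101):
    stats.record("db.rollup", ms)
    stats.record("third_party.risk", ms * 10)
report = stats.summary()
```

With a summary like this per span name, the dependency whose p99 is far above its p50 is usually the first place to look.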

2. Fix Index Usage: Composite Indexes Work Left to Right

A common P99 killer: an index exists, but the query shape prevents using it efficiently.

Assume a transaction document:

JSON
 
{
  "company": "acme",
  "role": "employee_123",
  "card": "card_456",
  "transaction_date": "2026-01-25T12:00:00Z",
  "amount_cents": 1299
}


And a composite index (MongoDB's documentation calls this a compound index) over those fields:

Python
 
from pymongo import ASCENDING

db.transactions.create_index([
    ("company", ASCENDING),
    ("role", ASCENDING),
    ("card", ASCENDING),
    ("transaction_date", ASCENDING),
])


The Pitfall: Missing the Left-Most Field(s)

Bad (omits company, the left-most indexed field, so the query's fields do not form a prefix of the index):

Python
 
query = {
    "role": role_id,
    "card": card_id,
    "transaction_date": {"$gte": start, "$lt": end},
}


Better:

Python
 
query = {
    "company": company_id,
    "role": role_id,
    "card": card_id,
    "transaction_date": {"$gte": start, "$lt": end},
}
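
The left-to-right rule can be sketched as a small check. This is a deliberately simplified model, not how the MongoDB planner actually decides: it only reports which leading index fields a query constrains, and the real planner weighs more than that (it may still choose to scan the whole index, for example). `usable_prefix` and the sample values are hypothetical.

```python
def usable_prefix(index_fields, query):
    """Return the leading index fields that the query constrains, in index order.

    Simplified model: stop at the first index field the query does not mention.
    """
    used = []
    for field in index_fields:
        if field not in query:
            break
        used.append(field)
    return used


INDEX_FIELDS = ["company", "role", "card", "transaction_date"]

# Same shapes as the "Bad" and "Better" queries above, with placeholder values.
bad_query = {
    "role": "employee_123",
    "card": "card_456",
    "transaction_date": {"$gte": "2026-01-01", "$lt": "2026-02-01"},
}
better_query = dict(bad_query, company="acme")

bad_prefix = usable_prefix(INDEX_FIELDS, bad_query)
better_prefix = usable_prefix(INDEX_FIELDS, better_query)
```

Under this model the first query constrains no leading field at all, while the second constrains all four, which matches the intuition behind the two query shapes.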


Verify With explain().

Python
 
plan = db.transactions.find(query).explain()
print(plan["queryPlanner"]["winningPlan"])


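If the winning plan contains a COLLSCAN stage, or an IXSCAN on an index other than the one you intended, the query is probably examining far more documents than it returns, and that is usually a direct line to tail latency.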
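
Reading the printed plan by eye does not scale, so the check can be automated. The sketch below walks a winning plan and reports whether it uses the expected index; it assumes the plan is a nested dict with `stage`, `indexName`, and `inputStage` keys, and the exact shape varies by server version. The two sample plans are hand-written for illustration, not real explain() output, and `plan_stages` and `uses_index` are hypothetical helpers.

```python
def plan_stages(node):
    """Recursively collect (stage, indexName) pairs from an explain() plan."""
    found = []
    if isinstance(node, dict):
        if "stage" in node:
            found.append((node["stage"], node.get("indexName")))
        for value in node.values():
            found.extend(plan_stages(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(plan_stages(item))
    return found


def uses_index(winning_plan, index_name):
    """True if the plan scans the expected index and has no collection scan."""
    stages = plan_stages(winning_plan)
    has_collscan = any(stage == "COLLSCAN" for stage, _ in stages)
    has_index = any(
        stage == "IXSCAN" and name == index_name for stage, name in stages
    )
    return has_index and not has_collscan


IDX = "company_1_role_1_card_1_transaction_date_1"

# Hand-written sample plans (illustrative shapes only).
indexed_plan = {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": IDX, "keyPattern": {"company": 1}},
}
scan_plan = {"stage": "COLLSCAN", "direction": "forward"}

indexed_ok = uses_index(indexed_plan, IDX)
scan_ok = uses_index(scan_plan, IDX)
```

A check like this fits naturally into a test that runs against a production-like dataset, so that an index regression fails the build instead of surfacing as a latency spike.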

3. Make Query Planning Deterministic (When You Must)

Even with correct indexes, intermittent P99 spikes can come from the query planner choosing a different index based on changing stats or data shape.

Detect plan variance.

Python
 
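def plan_and_run(coll, query, projection=None, limit=0):
    """Return (winning_plan, rows) so slow traces can be tied to a plan."""
    # explain() returns the planner's output instead of documents (limit=0 means no limit).
    plan = coll.find(query, projection=projection).limit(limit).explain()
    winning = plan["queryPlanner"]["winningPlan"]
    # Run the query a second time to fetch the rows; this adds DB work,
    # so sample it rather than calling it on every request.
    rows = list(coll.find(query, projection=projection).limit(limit))
    return winning, rows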


If slow traces correlate with a different winning plan, you can:

  • Adjust indexes so the correct plan is always best, or
  • Apply a selective hint for the most latency-sensitive queries
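
To check whether slow traces really do line up with a different plan, one option is to reduce each winning plan to a short signature and group latencies by it. The sketch below assumes a simple linear chain of `inputStage` nodes, which will not cover every plan shape; `plan_signature`, `latency_by_plan`, the index name `role_1`, and the sample traces are all hypothetical.

```python
from collections import defaultdict


def plan_signature(node):
    """Reduce a winning plan to a compact string such as 'FETCH>IXSCAN(idx)'."""
    parts = []
    while isinstance(node, dict) and "stage" in node:
        label = node["stage"]
        if "indexName" in node:
            label += f"({node['indexName']})"
        parts.append(label)
        node = node.get("inputStage")
    return ">".join(parts) or "UNKNOWN"


def latency_by_plan(traces):
    """Group (winning_plan, duration_ms) pairs by signature; report count and max."""
    grouped = defaultdict(list)
    for plan, duration_ms in traces:
        grouped[plan_signature(plan)].append(duration_ms)
    return {
        sig: {"count": len(values), "max_ms": max(values)}
        for sig, values in grouped.items()
    }


# Hand-written sample plans and latencies (illustrative only).
good = {"stage": "FETCH", "inputStage": {"stage": "IXSCAN", "indexName": "company_1_role_1_card_1_transaction_date_1"}}
other = {"stage": "FETCH", "inputStage": {"stage": "IXSCAN", "indexName": "role_1"}}

by_plan = latency_by_plan([(good, 40.0), (good, 55.0), (other, 2900.0)])
```

If the slow samples cluster under one signature, that points to plan variance rather than to load or a slow dependency.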

Hint example (MongoDB):

Python
 
idx_name = "company_1_role_1_card_1_transaction_date_1"
cursor = db.transactions.find(query).hint(idx_name)


Use hints carefully: they can become wrong as the data distribution evolves. If you hint, add monitoring (e.g., periodic explain() in staging with production-like data).

4. Stop Paying Connection Setup on the Request Path: Pool + Warm

Connection setup costs can be surprisingly high (TCP/TLS/auth/discovery), and cold pods can drag P99 up because their first requests pay those costs on the request path.

Use a single global pooled client per process.

Python
 
from pymongo import MongoClient

mongo = MongoClient(
    MONGO_URI,
    maxPoolSize=200,
    minPoolSize=20,
    serverSelectionTimeoutMS=250,
    connectTimeoutMS=250,
    socketTimeoutMS=900,
)

db = mongo["payments"]


Warmup on startup (avoid cold-start penalties).

Python
 
def warmup():
    db.command("ping")
    db.transactions.find_one({"company": "__warmup__"})
    db.limits.find_one({"company": "__warmup__"})

warmup()


In Kubernetes, run warmup before the service reports readiness so traffic only hits warmed instances.
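
One way to wire that up is a small gate that a readiness endpoint consults. This is a sketch under assumptions: `ReadinessGate`, `start`, and `status_code` are hypothetical names, the stub warmup functions stand in for the `warmup()` shown above, and the 200/503 mapping assumes an HTTP readiness probe.

```python
import threading


class ReadinessGate:
    """Hypothetical gate: the instance is ready only after warmup succeeds."""

    def __init__(self, warmup_fn):
        self._warmup_fn = warmup_fn
        self._ready = threading.Event()

    def start(self):
        """Run warmup; mark the instance ready only if it completes."""
        try:
            self._warmup_fn()
        except Exception:
            return False  # stay not-ready so the probe keeps failing
        self._ready.set()
        return True

    def status_code(self):
        """What a readiness endpoint could return: 200 when warmed, 503 otherwise."""
        return 200 if self._ready.is_set() else 503


def ok_warmup():
    pass  # stand-in for pinging the DB and priming queries


def failing_warmup():
    raise ConnectionError("database unreachable")


gate = ReadinessGate(ok_warmup)
before = gate.status_code()
gate.start()
after = gate.status_code()

broken = ReadinessGate(failing_warmup)
broken.start()
broken_status = broken.status_code()
```

The design choice here is to fail closed: if warmup raises, the instance never reports ready, so the orchestrator keeps traffic away from it rather than routing requests to a cold pod.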

5. Parallelize Third‑Party I/O With Local/DB Work

Third‑party risk/fraud calls often dominate the tail. If you call synchronously, you idle while waiting. A better pattern is to start the call first, do independent work, then join with a strict timeout.

Threaded overlap pattern (requests + ThreadPoolExecutor).

Python
 
import requests
from concurrent.futures import ThreadPoolExecutor, TimeoutError

SESSION = requests.Session()
EXEC = ThreadPoolExecutor(max_workers=64)

def risk_call(payload, timeout_s: float = 1.1):
    resp = SESSION.post(
        "https://risk.example.com/score",
        json=payload,
        timeout=(0.2, timeout_s),  # connect, read
    )
    resp.raise_for_status()
    return resp.json()

def authorize(req):
    fut = EXEC.submit(risk_call, {"company": req.company, "amount": req.amount_cents})

    # Independent work while the network call is in flight
    rollup = compute_rollup(req)          # DB aggregation / cached rollups
    base = eval_local_policy(req, rollup) # pure CPU / local logic

    # Join with timeout and safe fallback behavior
    try:
        risk = fut.result(timeout=1.1)
    except TimeoutError:
        risk = {"status": "timeout", "score": None}
    except Exception:
        risk = {"status": "error", "score": None}

    return finalize_decision(base, risk)


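This caps the wait at the join point while reclaiming idle time that a sequential call would spend blocked. Note that the 1.1s join timeout only starts counting after the local work finishes, so the end-to-end bound is the local work time plus the join timeout; for a true overall cap, derive the join timeout from a per-request deadline. Overlapping is one of the most reliable ways to reduce P99 when a dependency is volatile.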
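
A deadline-based variant of the overlap pattern might look like the following. It is a sketch with stand-in functions rather than real network or database calls: `overlap_with_deadline`, `fast_risk`, `slow_risk`, `local_work`, and the budget values are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

EXECUTOR = ThreadPoolExecutor(max_workers=4)


def overlap_with_deadline(remote_call, local_work, budget_s):
    """Start remote_call, run local_work, then wait only for the budget that is left."""
    deadline = time.monotonic() + budget_s
    future = EXECUTOR.submit(remote_call)
    local_result = local_work()  # runs while the remote call is in flight
    remaining = max(0.0, deadline - time.monotonic())
    try:
        remote_result = future.result(timeout=remaining)
    except FutureTimeout:
        remote_result = {"status": "timeout", "score": None}
    except Exception:
        remote_result = {"status": "error", "score": None}
    return local_result, remote_result


def fast_risk():
    return {"status": "ok", "score": 0.1}


def slow_risk():
    time.sleep(0.3)  # simulates a slow third-party response
    return {"status": "ok", "score": 0.9}


def local_work():
    time.sleep(0.01)  # simulates DB rollup plus local policy evaluation
    return "base"


_, fast_result = overlap_with_deadline(fast_risk, local_work, budget_s=0.2)
_, slow_result = overlap_with_deadline(slow_risk, local_work, budget_s=0.05)
```

Two caveats apply to this sketch: the worker thread keeps running after a timeout (the result is simply ignored), and local work that itself overruns the budget is not interrupted, it only leaves zero time for the join.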

6. Prevent Regression: Make Performance A Testable Contract

Once you’ve improved P99, keep it. Add a repeatable performance test that:

  • Runs the hot path thousands of times
  • Asserts p95/p99
  • Uses a realistic dataset scale and index configuration

Python
 
import time
import statistics

def run_trials(n=3000):
    times = []
    for _ in range(n):
        t0 = time.perf_counter()
        authorize(sample_request())
        times.append((time.perf_counter() - t0) * 1000)
    times.sort()
    n = len(times)
    # Nearest-rank percentiles; integer math avoids float rounding at the boundary.
    return {
        "p50": statistics.median(times),
        "p95": times[-(-95 * n // 100) - 1],
        "p99": times[-(-99 * n // 100) - 1],
        "max": times[-1],
    }

print(run_trials())
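
Printing the numbers is a start, but a contract needs an assertion. The sketch below checks measured samples against per-percentile budgets; `check_budgets`, `percentile`, the budget values, and the synthetic samples are hypothetical and should be replaced with limits that match your own latency target.

```python
def percentile(sorted_ms, pct):
    """Nearest-rank percentile over an already sorted, non-empty list."""
    rank = -(-pct * len(sorted_ms) // 100)  # integer ceiling division
    return sorted_ms[max(rank, 1) - 1]


def check_budgets(samples_ms, budgets_ms):
    """Return a list of human-readable violations; empty means the contract holds."""
    ordered = sorted(samples_ms)
    violations = []
    for pct, limit in sorted(budgets_ms.items()):
        observed = percentile(ordered, pct)
        if observed > limit:
            violations.append(f"p{pct}={observed:.1f}ms exceeds {limit:.1f}ms")
    return violations


# Synthetic samples: mostly fast, with two slow outliers in the tail.
samples = [100.0] * 98 + [700.0, 900.0]

violations = check_budgets(samples, {95: 400.0, 99: 650.0})
relaxed = check_budgets(samples, {95: 400.0, 99: 800.0})
```

In a test suite, `assert not violations, violations` turns the budget into a failing check whose message states which percentile regressed and by how much.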


Takeaways

If you’re trying to move tail latency (not just average latency), these tend to be the biggest levers:

  1. Query shape + index correctness (especially composite indexes)
  2. Stabilizing query planning (index tuning or selective hinting)
  3. Connection pooling and warmup (remove cold-start penalties)
  4. Overlapping third‑party I/O with useful work (reduce idle wait)

Apply them in that order, validate each change with P99 measurements, and you’ll get repeatable gains instead of “optimizations” that don’t show up where it matters.
