Classify requests (dashboards vs exploration/jobs), cap and prioritize concurrency, and fall back to cache/rollups so critical dashboards stay responsive during spikes.
Vector search is not "just OpenSearch." It just needs to be run as a platform with SLAs, governance, and quotas to control drift, leaks, and out-of-control costs.
Permission-aware retrieval ensures that the assistant uses only allowed information. A context graph enforces access control to prevent cross-team leakage.
Use a query router for LLM analytics — Redshift (KPIs), OpenSearch (definition), Neptune (lineage), and Cache (repeats) — to improve accuracy, latency, and costs.
Analytics assistants/chatbots should trust the semantic layer — not documents. Retrieve metric definitions, run governed SQL, and attach an audit bundle to every KPI.