Embedding Store as a Platform on AWS: OpenSearch + Bedrock + S3 Needs SLAs, Governance, and Quotas

Vector search is not "just OpenSearch." It needs to be run as a platform with SLAs, governance, and quotas to control drift, leaks, and runaway costs.

By 
Anusha Kovi
Feb. 19, 26 · Analysis

When teams say they are building RAG, they often mean they are adding a vector database. On AWS, this typically looks like using S3 for documents, Amazon Bedrock to generate embeddings, and Amazon OpenSearch for vector search.

You set it up, embed a few thousand chunks, run a similarity search, and it works. Everyone is happy!

Then it scales, and OpenSearch being down is not the primary issue. The problems are subtler and more expensive:

  • Retrieval quality deteriorates after a model or chunking change, and it often goes unnoticed until users complain.
  • Permissions become 'we will filter later' as teams begin to dump all datasets into the same index.
  • Embedding pipelines are rerun accidentally, and your embedding bill spikes.
  • Stale vectors never go away, and storage expands without a lifecycle.
  • You are unable to respond to simple questions like 'where did this vector come from?' or 'who was allowed to retrieve it?'

At that point, you realize that selecting OpenSearch over another vector database isn't the difficult part. The challenging aspect is that vector data behaves more like a product than a table. It requires economic guardrails, safety controls, and operational guarantees.

Thus, the thesis is as follows:

Instead of viewing your embedding store as 'just a vector database,' consider it a platform with SLAs, governance, and quotas.

This article walks through what that platform looks like, with a focus on OpenSearch, Bedrock embeddings, and S3 documents.

Why Vector Data Needs 'Platform Thinking'

Vector data systems have properties that ordinary data systems lack. Some of them are:

1. Embeddings Are Model Artifacts and Not Facts

Unlike price = 12.99, a vector is not a constant value. It is a result of:

  • Embedding model and version
  • Chunking strategy
  • Preprocessing steps (boilerplate removal, PII redaction, and HTML stripping)
  • Assumptions about distance metrics
  • Metadata filters used during the query

Your retrieval behavior may change significantly if you alter any of those, even if the document remains unchanged.
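One way to make that dependency explicit is to fingerprint the configuration that produced a vector. The sketch below is only an illustration: the `EmbeddingConfig` class, its field names, and the 16-character truncation are assumptions, not part of any AWS API.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EmbeddingConfig:
    """Hypothetical record of everything that shapes a vector besides the document."""
    embedding_model_id: str   # e.g. a Bedrock model identifier
    chunking_version: str     # internal label for the chunking configuration
    preprocess_flags: tuple   # e.g. ("html_stripped", "pii_redacted")
    distance_metric: str      # e.g. "cosine" or "l2"

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys, sorted flags) so equal configs hash identically.
        payload = asdict(self)
        payload["preprocess_flags"] = sorted(self.preprocess_flags)
        canonical = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
```

Two vectors are comparable only if their fingerprints match. Storing the fingerprint next to each vector makes a silent configuration change visible.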

2. Quality Can Deteriorate Silently

Results are still returned by the system. The assistant continues to respond. However, the responses are worse. This is the most dangerous failure mode: it's not an outage but a lack of reliability.

3. Multiple Tenants Just Happen

One team starts. A different team adds a dataset. Soon, you have competing latency requirements, different privacy needs, different retention expectations, widely varying query volumes, and distinct criteria for 'how accurate is accurate enough.'

Without clear platform controls, the embedding store turns into a shared resource with no rules.

4. Security Is a Must

If you index content that users shouldn't see, you are one bad filter away from a leak. And if you cannot establish provenance, you cannot investigate incidents.

The Three Promises Of An Embedding Store Platform

  1. Reliability guarantees (SLAs/SLOs)
  2. Safety guarantees (governance, access control, auditability)
  3. Economics guarantees (quotas, cost attribution, guardrails)

Let's map each to OpenSearch design choices and AWS primitives.

Service Level Agreements (SLAs)

Most teams track only OpenSearch uptime, which is not enough.

There are two critical paths for embeddings:

  • Ingest path: S3 doc changes -> chunk -> embed (Bedrock) -> index (OpenSearch) -> retrievable
  • Query path: user query -> embed -> retrieve (OpenSearch) -> (optional rerank) -> return context

Service level objectives (SLOs) for your platform should cover three categories:

1. Tail Behavior and Query Latency

  • P95 and P99 latency for requests involving vector searches
  • P99 latency during periods of high traffic (typically where the problem is)

OpenSearch vector search performance frequently depends on index configuration, shard strategy, vector dimensions, filters, top-K, and approximate search parameters. A platform approach standardizes these, so each team does not have to reinvent risky settings.

2. Freshness (The Real Business SLA)

Freshness is the real SLA if your RAG promises to reflect new docs quickly. Describe a freshness SLO as follows:

"Within 15 minutes, 95% of modified documents can be retrieved."

To measure it, keep track of the S3 event arrival time, the chunking start time, the embedding finish time, the indexing finish time, and the time of the first successful search.
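Given those timestamps, SLO attainment is a simple ratio. Here is a minimal sketch, assuming you have already joined each S3 event time with the time the document first became searchable. The function name and the input shape are hypothetical.

```python
from datetime import datetime, timedelta


def freshness_slo_attainment(events, target=timedelta(minutes=15)):
    """Fraction of modified documents that became searchable within `target`.

    `events` is a list of (s3_event_time, first_searchable_time) pairs.
    A document that is not searchable yet has None as its second element
    and counts as a miss.
    """
    if not events:
        return 1.0  # nothing changed, so nothing is stale
    on_time = sum(
        1 for arrived, searchable in events
        if searchable is not None and searchable - arrived <= target
    )
    return on_time / len(events)
```

Comparing the result against 0.95 tells you whether the example SLO above was met for the window you measured.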

3. A Minimal Quality SLO (Yup, You Require One)

A perfect research-grade evaluation is not necessary. You require a steady signal.

For instance:

  • Recall@K on a small golden set, such as 100–300 carefully chosen query-to-document pairs
  • Empty retrieval rate (percentage of queries that return only irrelevant or low-similarity chunks)
  • Top-1 stability across releases (the frequency of unexpected changes in the top result)

Your goal is to keep 'we shipped a change and retrieval got worse' from being discovered in production.
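Two of these signals fit in a few lines. The sketch below assumes the golden set is a mapping from query to a non-empty set of expected document IDs, and that search results are ranked lists of document IDs. Both shapes are illustrative choices.

```python
def recall_at_k(golden, retrieved, k):
    """Mean fraction of each query's expected docs found in its top-k results.

    golden:    {query: non-empty set of expected doc IDs}
    retrieved: {query: ranked list of doc IDs returned by search}
    """
    scores = []
    for query, expected in golden.items():
        top_k = set(retrieved.get(query, [])[:k])
        scores.append(len(expected & top_k) / len(expected))
    return sum(scores) / len(scores)


def top1_stability(before, after):
    """Share of queries whose top result is unchanged between two releases."""
    queries = before.keys() & after.keys()
    same = sum(1 for q in queries if before[q][:1] == after[q][:1])
    return same / len(queries)
```

Run both on every model, chunking, or index-setting change, and track the numbers over time rather than treating any single value as absolute truth.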

Governance: Provenance, Policy Enforcement, And Auditability

The most common governance error with S3 and OpenSearch is:

'We'll implement access control within the application.'

At scale, with multiple teams and changing regulations, that approach often fails. A platform should enforce governance at two points:

  • Before indexing: What is permitted to enter the embedding store?
  • At retrieval time: Who can retrieve what?

1. Provenance: A Birth Certificate Is Required for Each Vector

Store metadata like the following for every chunk indexed in OpenSearch:

  • s3_uri (or key/bucket)
  • doc_id (stable ID)
  • doc_version (content hash, last_modified, and ETag)
  • chunk_id, chunk_start, and chunk_end
  • embedding_model_id (Bedrock model identifier)
  • embedding_version (the label for your internal version)
  • chunking_version (the version label for your chunking configuration)
  • preprocess_flags (html_stripped, boilerplate_removed, pii_redacted)
  • domain, tenant_id
  • policy_tags (confidential, export-controlled, internal, etc.)
  • pipeline_run_id, created_at

This metadata is not 'nice to have.' It enables selective rebuilds (re-embedding only the impacted documents), incident response, explanations of why results shifted, and retention enforcement (TTL, or time to live, and deletion correctness).
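As an illustration of the selective-rebuild case, the sketch below models a subset of the fields listed above as a dataclass and picks out the documents whose chunks were built with an outdated configuration. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkProvenance:
    """A subset of the provenance fields listed above."""
    s3_uri: str
    doc_id: str
    doc_version: str         # e.g. a content hash
    chunk_id: str
    embedding_model_id: str
    embedding_version: str
    chunking_version: str
    tenant_id: str
    pipeline_run_id: str


def docs_needing_reembed(chunks, current_embedding_version, current_chunking_version):
    """Return doc IDs with at least one chunk built with an outdated config."""
    return sorted({
        c.doc_id for c in chunks
        if c.embedding_version != current_embedding_version
        or c.chunking_version != current_chunking_version
    })
```

Without the version labels on each chunk, the only safe rebuild is a full one.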

2. Access Control: Apply Rules at the Time of Retrieval Rather Than Later

On AWS, there are often two layers:

  1. Access to S3 (source of truth)
    • IAM, KMS encryption, bucket policies, and ideally ABAC (attribute-based access control) tags
    • guarantees the protection of raw documents
  2. Access to OpenSearch retrieval (serving layer)

You want to make sure that a query cannot return chunks that are not authorized for the user.

Practical strategies for OpenSearch:

  • Hard partition per tenant
    • Separate indices for each tenant, or at the very least, separate index patterns
    • Easiest and most secure for strict isolation
  • Enforced metadata filters in a soft multi-tenant system
    • Single index, but tenant_id, clearance, etc., are present in each document
    • Mandatory filters are injected server-side by your platform query service
    • Don't rely on filters provided by clients
  • Fine-grained access control (where possible)
    • Enforce role-based access and index permissions
    • If at all possible, refrain from granting downstream apps raw OpenSearch access

A solid rule: If a user can never see a document, don’t index it into a shared space where it could be retrieved accidentally.
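For the soft multi-tenant option, server-side filter injection might look like the sketch below. It builds an OpenSearch k-NN query body with a filter inside the `knn` clause. Whether that filter form is available depends on your OpenSearch version and k-NN engine, so treat it as an assumption to verify. The `principal` shape, the `allowed_clearances` key, and a single-valued `clearance` field on each document are also assumptions.

```python
def build_knn_query(query_vector, k, principal, vector_field="vector"):
    """Build a k-NN search body with mandatory filters injected server-side.

    `principal` must come from the authenticated session, never from the
    request body, so a client cannot widen its own access.
    """
    mandatory_filters = [
        {"term": {"tenant_id": principal["tenant_id"]}},
        # Allow-list: only documents whose clearance the caller holds.
        {"terms": {"clearance": sorted(principal["allowed_clearances"])}},
    ]
    return {
        "size": k,
        "query": {
            "knn": {
                vector_field: {
                    "vector": list(query_vector),
                    "k": k,
                    "filter": {"bool": {"filter": mandatory_filters}},
                }
            }
        },
    }
```

The important design choice is that callers have no parameter through which to pass or override filters. Anything a client wants to add should be combined with the mandatory filters, never substituted for them.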

3. Auditability: Treat Retrieval as a Regulated Action

Your audit record should capture:

  • Principal identity (tenant, role, user/service)
  • Query ID and timestamp
  • Model/version of embedding used
  • OpenSearch index and search parameters
  • Required filters applied
  • Returned top-K chunk IDs and similarity scores
  • Associated S3 URIs and document IDs
  • Downstream 'answer ID' (optional, if end-to-end traceability is desired)

With this, 'RAG said something weird' can be debugged.
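A structured record along those lines could be assembled as in the sketch below. The field names and the shape of `hits` are illustrative. Where the record is shipped (a log stream, a table, an S3 prefix) is left open.

```python
import json
import uuid
from datetime import datetime, timezone


def build_audit_record(principal, index, search_params, filters, hits, embedding_version):
    """Assemble one structured audit entry for a single retrieval call."""
    return {
        "query_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "principal": principal,
        "embedding_version": embedding_version,
        "index": index,
        "search_params": search_params,
        "filters_applied": filters,
        # Keep identifiers and scores only; chunk text stays out of the audit log.
        "results": [
            {"chunk_id": h["chunk_id"], "score": h["score"],
             "doc_id": h["doc_id"], "s3_uri": h["s3_uri"]}
            for h in hits
        ],
    }


def to_log_line(record):
    """Serialize the record as one JSON line for a log pipeline."""
    return json.dumps(record, sort_keys=True)
```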

Quotas: Keeping Embedding Costs And Shared Infra Sane

If you don't enforce quotas, the embedding store becomes a shared credit card.

The biggest cost drivers in this stack are Bedrock embedding calls (or compute if self-hosted), OpenSearch storage growth and index maintenance, heavy top-K + filtering + high QPS retrieval, and optional reranking. Both the ingest and query paths should have platform quotas.

1. Ingest Quotas (Stop Re-Embedding Storms)

For instance:

  • Maximum documents per tenant per day
  • Maximum tokens per tenant per day
  • Maximum embeddings per tenant per day
  • Maximum frequency of re-embedding (e.g., once daily unless authorized)
  • Maximum number of chunks per document (controls runaway chunking)
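A per-tenant daily token quota can be sketched in a few lines. This version keeps counters in memory, which is an assumption for illustration only: a real pipeline would need a shared store so that every worker sees the same counts. The class name is hypothetical.

```python
from collections import defaultdict
from datetime import date


class DailyTokenQuota:
    """Tracks embedded tokens per tenant per day and rejects overruns."""

    def __init__(self, max_tokens_per_day):
        self.max_tokens_per_day = max_tokens_per_day
        self._used = defaultdict(int)  # (tenant_id, day) -> tokens consumed

    def try_consume(self, tenant_id, tokens, day=None):
        key = (tenant_id, day or date.today())
        if self._used[key] + tokens > self.max_tokens_per_day:
            return False  # caller should queue the batch, pause the tenant, or alert
        self._used[key] += tokens
        return True
```

Checking the quota before calling Bedrock, not after, is what turns a pipeline bug into a paused tenant instead of a surprise bill.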

2. Query Quotas (Manage Serving Expenses)

For instance:

  • Maximum QPS for each tenant
  • Maximum K (top-K) for each request
  • Maximum number of filtered facets
  • Graceful backpressure and burst limits
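Burst limits and a top-K ceiling can be illustrated with a token bucket, which is one common rate-limiting technique, and a clamp. The class, the default ceiling of 50, and the injectable clock are all illustrative choices.

```python
import time


class TokenBucket:
    """Per-tenant rate limiter: `rate` requests/second with bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.tokens = float(burst)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller can respond with HTTP 429 and a retry hint


def clamp_top_k(requested_k, max_k=50):
    """Enforce the platform-wide ceiling on top-K."""
    return max(1, min(requested_k, max_k))
```

Keeping one bucket per tenant means a noisy tenant exhausts only its own budget.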

3. Cost Attribution (Showback)

You should be able to report even if you never "charge" teams:

  • Embeddings generated daily for each tenant
  • Storage growth per tenant
  • Queries per tenant plus the average top-K
  • Estimated monthly cost for each tenant

Teams behave more rationally when they are able to see their footprint.
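A showback report can start as a simple roll-up of usage events. In the sketch below, the event shape is hypothetical and the unit prices are inputs you supply. They are not real AWS list prices.

```python
from collections import defaultdict


def showback(usage_events, price_per_1k_tokens, price_per_gb_month):
    """Roll raw usage events up into an estimated monthly cost per tenant.

    Each event is a dict with tenant_id and, optionally, embedded_tokens
    and storage_gb_delta.
    """
    totals = defaultdict(lambda: {"embedded_tokens": 0, "storage_gb": 0.0})
    for event in usage_events:
        tenant = totals[event["tenant_id"]]
        tenant["embedded_tokens"] += event.get("embedded_tokens", 0)
        tenant["storage_gb"] += event.get("storage_gb_delta", 0.0)

    report = {}
    for tenant_id, t in totals.items():
        cost = (t["embedded_tokens"] / 1000) * price_per_1k_tokens
        cost += t["storage_gb"] * price_per_gb_month
        report[tenant_id] = {**t, "estimated_monthly_cost": round(cost, 2)}
    return report
```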

Reference Architecture: Embedding Store Platform On AWS

The simplest way to stop 'everyone talks to OpenSearch directly' is to introduce a platform boundary. The core components are:

1. Document Intake

  • S3 is the system of record.
  • Changes are detected by S3 events (or scheduled scans).
  • A pipeline extracts text, chunks it, and prepares metadata.

2. Pipeline for Embedding and Indexing

  • Chunk -> Bedrock embeddings -> OpenSearch indexing.
  • Store provenance information alongside vectors.
  • For traceability, write pipeline run logs.
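One guard worth building into this pipeline is idempotency: skip the embedding call when a document's content has not changed. The sketch below compares a content hash with the `doc_version` stored alongside the indexed chunks. Using SHA-256 of the extracted text as `doc_version` is an assumption, as are the function names.

```python
import hashlib


def content_hash(text):
    """Stable version label for a document's extracted text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_ingest(documents, indexed_versions):
    """Decide which documents actually need (re-)embedding.

    documents:        {doc_id: current extracted text}
    indexed_versions: {doc_id: content hash stored with the indexed chunks}
    """
    to_embed, unchanged = [], []
    for doc_id, text in documents.items():
        if indexed_versions.get(doc_id) == content_hash(text):
            unchanged.append(doc_id)   # same content: skip the embedding call
        else:
            to_embed.append(doc_id)    # new or modified: embed and index
    return to_embed, unchanged
```

With this check in place, an accidental rerun over an unchanged bucket costs a few hash computations rather than a full round of embedding calls.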

3. Query Service

  • Accepts user query.
  • Calls Bedrock to embed the query.
  • Injects mandatory policy filters.
  • Queries OpenSearch.
  • Returns chunks, citations, and metadata.

4. Platform Controls

  • Authentication (Cognito, IAM, etc.).
  • Authorization and entitlements (tenant mapping, ABAC tags).
  • Rate caps and quotas (per tenant).
  • Observability and audit records.
  • Jobs for quality assessment and release gates.

A helpful way to put it: 'OpenSearch access' should not be granted to teams. They should get Embedding Platform access.

How This Appears In OpenSearch: Indexing Strategy

A typical chunk document stored in OpenSearch looks like this:

  • vector: the embedding
  • text: the chunk text (optional; often stored elsewhere)
  • chunk_id, doc_id
  • doc_version, s3_uri
  • department, tenant_id, policy_tags, and region
  • created_at, chunking_version, embedding_version

The key design patterns:

  • Make metadata fields filterable so that OpenSearch can filter on them efficiently.
  • To implement platform-wide policies, maintain a uniform schema across domains.
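A uniform schema could be expressed as a single function that every tenant index is created from. The sketch below builds an index body with a `knn_vector` field and keyword-typed metadata. The vector dimension must match the output size of your embedding model. The k-NN engine and method are left at OpenSearch defaults, an assumption you should revisit for production.

```python
KEYWORD_FIELDS = (
    "chunk_id", "doc_id", "doc_version", "s3_uri", "department",
    "tenant_id", "policy_tags", "region", "chunking_version", "embedding_version",
)


def chunk_index_body(dimension):
    """Index settings and mappings for chunk documents with a k-NN vector field."""
    properties = {
        "vector": {"type": "knn_vector", "dimension": dimension},
        "text": {"type": "text"},
        "created_at": {"type": "date"},
    }
    # Keyword fields support exact-match term filters on metadata.
    properties.update({name: {"type": "keyword"} for name in KEYWORD_FIELDS})
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {"properties": properties},
    }
```

Generating every index from the same function keeps the metadata fields that platform-wide filters depend on present everywhere.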

Failure Modes This Platform Prevents 

1. 'Search Got Worse After We Upgraded Embeddings'

Without versioning and evaluation gates, you discover this in production.

Using this platform:

  • A canary index receives the updated embedding version.
  • Golden set tests run against it.
  • Stability and Recall@K are compared with the current index.
  • The new version is promoted only if it passes.
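The promotion decision itself can be a small, explicit function. In the sketch below, the metric names and both thresholds are placeholders to tune against your own golden set, not recommended values.

```python
def should_promote(baseline, canary, max_recall_drop=0.02, min_top1_stability=0.90):
    """Gate a canary index on golden-set metrics.

    baseline: {"recall_at_k": float}
    canary:   {"recall_at_k": float, "top1_stability": float}, where stability
              is measured against the baseline index.
    Returns (decision, reasons) so the release log explains any rejection.
    """
    reasons = []
    drop = baseline["recall_at_k"] - canary["recall_at_k"]
    if drop > max_recall_drop:
        reasons.append(f"recall@k dropped by {drop:.3f}")
    if canary["top1_stability"] < min_top1_stability:
        reasons.append(
            f"top-1 stability {canary['top1_stability']:.2f} below threshold"
        )
    return (not reasons, reasons)
```

Returning the reasons alongside the decision makes a rejected promotion explainable to the team that requested it.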

2. 'Everything Was Re-embedded Due to a Pipeline Bug'

Without quotas, this melts your budget and throughput.

Using this platform:

  • Tokens-per-day ingest quotas throttle the runaway job.
  • An explicit admin workflow is necessary for massive rebuilds.
  • Tenants can be paused without pausing everyone.

3. 'We Leaked Information Among Teams'

Without enforced filters, a leak is just one missing parameter away.

Using this platform:

  • The query service injects policy filters on the server side.
  • Tenant isolation is structural (separate namespaces or indices).
  • Audit logs reveal exactly what took place.

4. 'Storage Never Stops Expanding'

Using this platform:

  • TTL policies and lifecycle rules are in place.
  • Stale documents are found and removed.
  • You can set limits and report storage growth for each tenant.
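Finding stale vectors relies on the provenance fields stored with each chunk. The sketch below flags chunks that have outlived a TTL or whose source document has changed or disappeared. The input shapes are hypothetical, and the actual deletion from OpenSearch is left out.

```python
from datetime import datetime, timedelta, timezone


def find_stale_chunks(chunks, live_doc_versions, ttl, now=None):
    """Return chunk IDs that should be deleted from the index.

    chunks:            iterable of dicts with chunk_id, doc_id, doc_version, created_at
    live_doc_versions: {doc_id: current doc_version in S3}; a missing doc_id
                       means the source document was deleted
    ttl:               maximum allowed chunk age as a timedelta
    """
    now = now or datetime.now(timezone.utc)
    stale = []
    for c in chunks:
        expired = now - c["created_at"] > ttl
        superseded = live_doc_versions.get(c["doc_id"]) != c["doc_version"]
        if expired or superseded:
            stale.append(c["chunk_id"])
    return stale
```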

Conclusion

OpenSearch has the potential to be a great vector engine. Bedrock is capable of generating embeddings reliably. S3 can hold your source of truth. However, those elements by themselves will not determine the effectiveness of your RAG system. That will be decided by whether you built the missing layer: SLAs for freshness and reliability, governance to enable forensics and prevent leaks, and quotas to keep the system sustainable.

Once you approach the embedding store as a platform, you get more than just "RAG that works." You get a shared foundation where multiple teams can build safely without turning embeddings into your next ungoverned data swamp.
