Embedding Store as a Platform on AWS: OpenSearch + Bedrock + S3 Needs SLAs, Governance, and Quotas

Vector search is not "just OpenSearch." It needs to be run as a platform with SLAs, governance, and quotas to control drift, leaks, and runaway costs.

Anusha Kovi


Feb. 19, 26 · Analysis


When teams say they are building RAG, they often mean they are adding a vector database. On AWS, this typically looks like using S3 for documents, Amazon Bedrock to generate embeddings, and Amazon OpenSearch for vector search.

You set it up, embed a few thousand chunks, run a similarity search, and it works. Everyone is happy!

Then the system scales, and OpenSearch downtime is not the primary issue. The problems are more subtle and more expensive:

Retrieval quality deteriorates after a model or chunking change, and it often goes unnoticed until users complain.
Permissions become 'we will filter later' as teams begin to dump all datasets into the same index.
Embedding pipelines are rerun accidentally, and your embedding bill spikes.
Stale vectors never go away, and storage expands without a lifecycle.
You cannot answer simple questions like 'where did this vector come from?' or 'who was allowed to retrieve it?'

At that point, you realize that selecting OpenSearch over another vector database isn't the difficult part. The challenging aspect is that vector data behaves more like a product than a table. It requires economic guardrails, safety controls, and operational guarantees.

Thus, the thesis is as follows:

Instead of viewing your embedding store as 'just a vector database,' consider it a platform with SLAs, governance, and quotas.

This article walks through what that platform looks like, with a focus on OpenSearch, Bedrock embeddings, and S3 documents.

Why Vector Data Needs 'Platform Thinking'

Vector data systems have properties that conventional data systems lack. Some of them are:

1. Embeddings Are Model Artifacts and Not Facts

Unlike price = 12.99, a vector is not a constant value. It is a result of:

Embedding model and version
Chunking strategy
Preprocessing steps (boilerplate removal, PII redaction, and HTML stripping)
Assumptions about distance metrics
Metadata filters used during the query

Your retrieval behavior may change significantly if you alter any of those, even if the document remains unchanged.
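One way to make that dependency visible is to fingerprint the configuration that produced a vector and store the fingerprint next to it. The following is a minimal sketch, not a prescribed implementation; the `EmbeddingConfig` class, its field values, and the `config_fingerprint` helper are all hypothetical names introduced for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class EmbeddingConfig:
    """Everything that shapes a vector besides the document itself."""
    embedding_model_id: str   # placeholder value below, not a real model ID
    chunking_version: str
    preprocess_flags: tuple   # e.g. ("html_stripped", "pii_redacted")
    distance_metric: str


def config_fingerprint(cfg: EmbeddingConfig) -> str:
    """Return a short, stable hash of the config for tagging vectors."""
    payload = asdict(cfg)
    # Sort flags so that flag order does not change the fingerprint.
    payload["preprocess_flags"] = sorted(payload["preprocess_flags"])
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


v1 = EmbeddingConfig("example-embed-model-v1", "chunk-512-overlap-64",
                     ("html_stripped", "pii_redacted"), "cosine")
v2 = EmbeddingConfig("example-embed-model-v1", "chunk-256-overlap-32",
                     ("html_stripped", "pii_redacted"), "cosine")

# Same document, different chunking: the two sets of vectors should not
# be mixed in one index without knowing which config produced each.
print(config_fingerprint(v1), config_fingerprint(v2))
```

Vectors that carry different fingerprints can then be treated as belonging to different generations, even when the source document is byte-for-byte identical.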

2. Quality Can Deteriorate Silently

Results are still returned by the system. The assistant continues to respond. However, the responses are worse. This is the most dangerous failure mode: it's not an outage but a lack of reliability.

3. Multiple Tenants Just Happen

One team starts. A different team adds a dataset. Soon, you have competing latency requirements, different privacy needs, different retention expectations, widely varying query volumes, and distinct criteria for 'how accurate is accurate enough.'

The embedding store turns into a shared resource with no rules if there are no clear platform controls.

4. Security Is a Must

If you index content that users shouldn't see, you are one bad filter away from a leak. And if you cannot establish provenance, you cannot investigate incidents.

The Three Promises of an Embedding Store Platform

Reliability guarantees (SLAs/SLOs)
Safety guarantees (governance, access control, auditability)
Economics guarantees (quotas, cost attribution, guardrails)

Let's map each to OpenSearch design choices and AWS primitives.

Service Level Agreements (SLAs)

Most teams track only OpenSearch uptime, which is not enough.

There are two critical paths for embeddings:

Ingest path: S3 doc changes -> chunk -> embed (Bedrock) -> index (OpenSearch) -> retrievable
Query path: user query -> embed -> retrieve (OpenSearch) -> (optional rerank) -> return context

Service level objectives (SLOs) for your platform should cover three categories:

1. Tail Behavior and Query Latency

P95 and P99 latency for requests involving vector searches
P99 latency during periods of high traffic (typically where the problems show up)

OpenSearch vector search performance frequently depends on index configuration, shard strategy, vector dimensions, filters, top-K, and approximate search parameters. A platform approach standardizes these, so each team does not reinvent risky settings.

2. Freshness (The Real Business SLA)

Freshness is the real SLA if your RAG promises to reflect new docs quickly. Define a freshness SLO along these lines:

"Within 15 minutes, 95% of modified documents can be retrieved."

To measure it, track the S3 event arrival time, the chunking start time, the embedding finish time, the indexing finish time, and the first time the document is successfully searchable.
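A freshness SLO of this shape reduces to a percentile over per-document lag. Here is a minimal sketch, assuming you already record the S3 event time and the first-searchable time per document; the function names and the nearest-rank percentile method are illustrative choices, not part of any AWS API.

```python
from datetime import datetime, timedelta


def freshness_seconds(s3_event_at: datetime, searchable_at: datetime) -> float:
    """Lag between the S3 change event and the first successful search hit."""
    return (searchable_at - s3_event_at).total_seconds()


def percentile(values, pct: int):
    """Nearest-rank percentile; pct is an integer between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def meets_freshness_slo(lags_seconds, target_seconds=15 * 60, pct=95) -> bool:
    """True if pct% of documents became searchable within the target."""
    return percentile(lags_seconds, pct) <= target_seconds


t0 = datetime(2026, 2, 19, 12, 0, 0)
# 19 documents searchable after 5 minutes, one straggler after 40 minutes.
lags = [freshness_seconds(t0, t0 + timedelta(minutes=5))] * 19
lags.append(freshness_seconds(t0, t0 + timedelta(minutes=40)))
print(meets_freshness_slo(lags))
```

With one straggler in twenty, the 95th percentile is still five minutes, so the SLO holds; a second straggler would break it.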

3. A Minimal Quality SLO (Yup, You Require One)

A perfect research-grade evaluation is not necessary. You require a steady signal.

For instance:

Recall@K on a small golden set, such as 100–300 carefully chosen query-to-document pairs
Empty retrieval rate (percentage of queries that return nothing, or only irrelevant or low-similarity chunks)
Top-1 stability across releases (the frequency of unexpected changes in the top result)

Your goal is to catch 'we shipped a change and retrieval got worse' before it is discovered in production.
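The three signals above are cheap to compute once you have a golden set. The sketch below assumes plain dictionaries keyed by query string; the function names, the data shapes, and the similarity threshold are illustrative assumptions.

```python
def recall_at_k(golden, retrieved, k):
    """golden: {query: set of relevant doc_ids}; retrieved: {query: ranked doc_ids}."""
    scores = []
    for query, relevant in golden.items():
        top_k = set(retrieved.get(query, [])[:k])
        scores.append(len(relevant & top_k) / len(relevant))
    return sum(scores) / len(scores)


def empty_retrieval_rate(similarities, min_similarity):
    """similarities: {query: list of scores}. Counts queries with no usable hit."""
    empty = sum(1 for s in similarities.values()
                if not s or max(s) < min_similarity)
    return empty / len(similarities)


def top1_stability(before, after):
    """Fraction of queries whose top result is unchanged between two releases."""
    same = sum(1 for q in before if before[q][:1] == after.get(q, [])[:1])
    return same / len(before)


golden = {"q1": {"d1"}, "q2": {"d2", "d3"}}
release_a = {"q1": ["d1", "d9"], "q2": ["d2", "d8"]}
release_b = {"q1": ["d1", "d9"], "q2": ["d8", "d2"]}

print(recall_at_k(golden, release_a, k=2))    # 0.75
print(top1_stability(release_a, release_b))   # 0.5
```

None of this replaces a proper evaluation, but running it on every release gives you the steady signal that a dashboard of uptime metrics cannot.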

Governance: Provenance, Policy Enforcement, And Auditability

The most common governance error with S3 and OpenSearch is:

'We'll implement access control within the application.'

That approach often fails at scale, across multiple teams, and under changing regulations. A platform should enforce governance at two points:

Before indexing: What is permitted to enter the embedding store?
At retrieval time: Who can retrieve what?

1. Provenance: A Birth Certificate Is Required for Each Vector

Store metadata like the following for every chunk indexed in OpenSearch:

s3_uri (or key/bucket)
doc_id (stable ID)
doc_version (content hash, last_modified, or ETag)
chunk_id, chunk_start, and chunk_end
embedding_model_id (Bedrock model identifier)
embedding_version (the label for your internal version)
chunking_version (the version label for your chunking configuration)
preprocess_flags (html_stripped, boilerplate_removed, pii_redacted)
domain, tenant_id
policy_tags (confidential, export-controlled, internal, etc.)
pipeline_run_id, created_at

This metadata is not 'nice to have.' It enables selective rebuilds (re-embed only the impacted documents), incident response, explanations of why results shifted, and retention enforcement (TTL, or time to live, and deletion correctness).
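As a sketch of what that birth certificate could look like in code, the record below carries a subset of the fields listed above and derives a deterministic `chunk_id` from them. The class, the ID scheme, and the `needs_reembed` helper are assumptions made for illustration; the sample values are placeholders.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkProvenance:
    s3_uri: str
    doc_id: str
    doc_version: str          # e.g. a content hash or ETag
    chunk_start: int
    chunk_end: int
    embedding_model_id: str
    embedding_version: str
    chunking_version: str
    tenant_id: str
    policy_tags: tuple
    pipeline_run_id: str
    created_at: str           # ISO-8601 timestamp

    @property
    def chunk_id(self) -> str:
        """Deterministic ID: the same inputs always yield the same chunk_id."""
        key = (f"{self.doc_id}:{self.doc_version}:{self.chunking_version}:"
               f"{self.chunk_start}-{self.chunk_end}")
        return hashlib.sha256(key.encode("utf-8")).hexdigest()[:24]


def needs_reembed(record, current_embedding_version, current_chunking_version):
    """Selective rebuild: only chunks made with an outdated config qualify."""
    return (record.embedding_version != current_embedding_version
            or record.chunking_version != current_chunking_version)


record = ChunkProvenance(
    s3_uri="s3://example-bucket/handbook.pdf", doc_id="handbook",
    doc_version="etag-abc123", chunk_start=0, chunk_end=512,
    embedding_model_id="example-embed-model-v1", embedding_version="emb-v1",
    chunking_version="chunk-v3", tenant_id="acme",
    policy_tags=("internal",), pipeline_run_id="run-0001",
    created_at="2026-02-19T12:00:00+00:00",
)
print(record.chunk_id, needs_reembed(record, "emb-v2", "chunk-v3"))
```

Because the ID is derived rather than random, re-running the pipeline on unchanged input overwrites the same chunk instead of creating a duplicate.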

2. Access Control: Apply Rules at the Time of Retrieval Rather Than Later

In terms of AWS, there are often two layers:

Access to S3 (source of truth)
- IAM, KMS encryption, bucket policies, and ideally ABAC (attribute-based access control) tags
- guarantees the protection of raw documents
Access to OpenSearch retrieval (serving layer)

You want to make sure that a query cannot return chunks that are not authorized for the user.

Practical strategies for OpenSearch:

Hard partition per tenant
- separate indices for each tenant, or at the very least, separate index patterns
- easiest and most secure for strict isolation
Enforced metadata filters in a soft multi-tenant system
- single index, but tenant_id, clearance, etc., are present in each document
- mandatory filters are injected server-side by your platform query service
- don't rely on filters provided by clients
Fine-grained access control (where possible)
- enforce role-based access and index permissions
- if at all possible, avoid granting downstream apps raw OpenSearch access

A solid rule: If a user can never see a document, don’t index it into a shared space where it could be retrieved accidentally.
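For the soft multi-tenant case, server-side filter injection might look like the sketch below. It only builds the search body and does not call OpenSearch. The `filter` clause inside the `knn` query follows the OpenSearch k-NN filtering syntax, whose availability depends on engine and version; the field names `vector`, `tenant_id`, and `clearance` follow this article, while the `principal` shape, the single-valued `clearance` field, and the client-field allowlist are assumptions.

```python
# Fields a client is allowed to narrow results by (assumed allowlist).
ALLOWED_CLIENT_FIELDS = {"department", "region"}


def build_knn_query(query_vector, k, principal, client_filters=None):
    """Build a k-NN search body with mandatory, server-injected filters."""
    mandatory = [
        {"term": {"tenant_id": principal["tenant_id"]}},
        {"terms": {"clearance": sorted(principal["clearances"])}},
    ]
    optional = []
    for field, value in (client_filters or {}).items():
        # Clients may narrow results, but never touch policy fields.
        if field in ALLOWED_CLIENT_FIELDS:
            optional.append({"term": {field: value}})
    return {
        "size": k,
        "query": {"knn": {"vector": {
            "vector": list(query_vector),
            "k": k,
            "filter": {"bool": {"filter": mandatory + optional}},
        }}},
    }


principal = {"tenant_id": "acme", "clearances": {"internal"}}
# The client tries to override tenant_id; the override is silently dropped.
body = build_knn_query([0.1, 0.2, 0.3], 5, principal,
                       {"tenant_id": "other", "department": "legal"})
print(body["query"]["knn"]["vector"]["filter"])
```

The important property is that the tenant and clearance come from the authenticated principal, never from the request payload.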

3. Auditability: Treat Retrieval as a Regulated Action

Your audit record should capture:

Principal identity (tenant, role, user/service)
Query ID and timestamp
Model/version of embedding used
OpenSearch index and search parameters
Required filters applied
Returned top-K chunk IDs and similarity scores
Associated S3 URIs and document IDs
Downstream 'answer ID' (optional, if end-to-end traceability is desired)

With this, 'RAG said something weird' can be debugged.
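A sketch of such an audit record as one JSON line per retrieval follows. The field names mirror the list above; the function name, the shape of `hits`, and the index name are hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone


def build_audit_record(principal, index, search_params, filters, hits,
                       embedding_version, query_id=None, now=None):
    """Assemble one audit entry for a single retrieval call."""
    now = now or datetime.now(timezone.utc)
    return {
        "query_id": query_id or str(uuid.uuid4()),
        "timestamp": now.isoformat(),
        "principal": principal,
        "embedding_version": embedding_version,
        "index": index,
        "search_params": search_params,
        "filters_applied": filters,
        "results": [
            {"chunk_id": h["chunk_id"], "score": h["score"],
             "doc_id": h["doc_id"], "s3_uri": h["s3_uri"]}
            for h in hits
        ],
    }


hits = [{"chunk_id": "c1", "score": 0.83, "doc_id": "handbook",
         "s3_uri": "s3://example-bucket/handbook.pdf", "text": "not logged"}]
audit = build_audit_record(
    principal={"tenant_id": "acme", "role": "analyst", "user": "u-42"},
    index="chunks-acme", search_params={"k": 5},
    filters=[{"term": {"tenant_id": "acme"}}], hits=hits,
    embedding_version="emb-v1",
)
# One JSON object per line is easy to ship to a log pipeline.
print(json.dumps(audit, sort_keys=True))
```

Note that the chunk text itself is deliberately left out of the record, so the audit log does not become a second copy of the protected content.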

Quotas: Keeping Embedding Costs And Shared Infra Sane

If you don't enforce quotas, the embedding store becomes a shared credit card.

The biggest cost drivers in this stack are Bedrock embedding calls (or compute if self-hosted), OpenSearch storage growth and index maintenance, retrieval with heavy top-K, filtering, and high QPS, and optional reranking. Both the ingest and query paths should have platform quotas.

1. Ingest Quotas (Stop Storms From Re-Embedding)

For instance:

Maximum documents per tenant per day
Maximum tokens per tenant per day
Maximum embeddings per tenant per day
Maximum frequency of re-embedding (e.g., once daily unless authorized)
Maximum number of chunks per document (controls runaway chunking)
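An ingest quota check can be a small admission step in front of the embedding call. The sketch below keeps counters in memory for clarity; a real platform would presumably persist them in a shared store. The class name, the limits, and the reason strings are illustrative assumptions.

```python
from collections import defaultdict


class IngestQuota:
    """Per-tenant daily token budget plus a per-document chunk cap."""

    def __init__(self, max_tokens_per_day, max_chunks_per_doc):
        self.max_tokens_per_day = max_tokens_per_day
        self.max_chunks_per_doc = max_chunks_per_doc
        self._used = defaultdict(int)  # (tenant_id, day) -> tokens consumed

    def admit(self, tenant_id, day, doc_tokens, doc_chunks):
        """Return (allowed, reason) and record usage when allowed."""
        if doc_chunks > self.max_chunks_per_doc:
            return False, "too_many_chunks"
        key = (tenant_id, day)
        if self._used[key] + doc_tokens > self.max_tokens_per_day:
            return False, "daily_token_quota_exceeded"
        self._used[key] += doc_tokens
        return True, "ok"


quota = IngestQuota(max_tokens_per_day=1000, max_chunks_per_doc=50)
print(quota.admit("acme", "2026-02-19", doc_tokens=600, doc_chunks=10))
print(quota.admit("acme", "2026-02-19", doc_tokens=600, doc_chunks=10))
```

The second call is rejected because it would push the tenant past its daily budget, which is exactly the behavior you want when a pipeline bug starts re-embedding everything.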

2. Query Quotas (Manage Serving Expenses)

For instance:

Maximum QPS for each tenant
Maximum K (top-K) for each request
Maximum number of filtered facets
Graceful backpressure and burst limits
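On the query side, a token bucket per tenant is one common way to combine a QPS cap with a burst allowance, and top-K can simply be clamped. This is a sketch under those assumptions; the class and function names are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Allows `rate_per_sec` sustained requests with bursts up to `burst`."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.updated = None  # timestamp of the previous call, in seconds

    def allow(self, now):
        if self.updated is not None:
            elapsed = max(0.0, now - self.updated)
            # Refill proportionally to elapsed time, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should answer with a retry-later response


def clamp_top_k(requested_k, max_k):
    """Never serve more than max_k results, and never fewer than one."""
    return max(1, min(requested_k, max_k))


bucket = TokenBucket(rate_per_sec=1, burst=2)
decisions = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0),
             bucket.allow(1.0)]
print(decisions)            # [True, True, False, True]
print(clamp_top_k(500, 50))  # 50
```

In practice you would keep one bucket per tenant and read the clock from `time.monotonic()`.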

3. Cost Attribution (Showback)

Even if you never "charge" teams, you should be able to report:

Embeddings generated daily for each tenant
Storage growth per tenant
Queries per tenant plus the average top-K
Estimated monthly cost for each tenant

Teams behave more rationally when they are able to see their footprint.
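A showback report can start as a simple aggregation over usage events. In the sketch below, the event shapes and the `showback` function are hypothetical, and the unit prices are placeholders rather than real AWS pricing; the cost figure is a rough estimate only.

```python
from collections import defaultdict

# Placeholder unit prices for illustration only, NOT real AWS pricing.
PRICE_PER_1K_EMBED_TOKENS = 0.0001
PRICE_PER_GB_MONTH = 0.10


def showback(events):
    """Aggregate per-tenant usage events into a small report."""
    totals = defaultdict(lambda: {"embed_tokens": 0, "storage_gb": 0.0,
                                  "queries": 0, "top_k_sum": 0})
    for e in events:
        t = totals[e["tenant_id"]]
        if e["type"] == "embed":
            t["embed_tokens"] += e["tokens"]
        elif e["type"] == "storage":
            t["storage_gb"] += e["gb_added"]
        elif e["type"] == "query":
            t["queries"] += 1
            t["top_k_sum"] += e["top_k"]
    report = {}
    for tenant, t in totals.items():
        cost = (t["embed_tokens"] / 1000 * PRICE_PER_1K_EMBED_TOKENS
                + t["storage_gb"] * PRICE_PER_GB_MONTH)
        report[tenant] = {
            "embed_tokens": t["embed_tokens"],
            "storage_gb": t["storage_gb"],
            "queries": t["queries"],
            "avg_top_k": t["top_k_sum"] / t["queries"] if t["queries"] else 0.0,
            "est_cost": round(cost, 4),
        }
    return report


events = [
    {"tenant_id": "acme", "type": "embed", "tokens": 200_000},
    {"tenant_id": "acme", "type": "storage", "gb_added": 2.0},
    {"tenant_id": "acme", "type": "query", "top_k": 10},
    {"tenant_id": "acme", "type": "query", "top_k": 20},
    {"tenant_id": "beta", "type": "query", "top_k": 5},
]
report = showback(events)
print(report["acme"])
```

Even a rough table like this is usually enough to start the conversation about who is driving growth.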

Reference Architecture: Embedding Store Platform On AWS

The simplest way to stop 'everyone talks to OpenSearch directly' is to introduce a platform boundary. The core components are:

1. Document Intake

S3 is the system of record.
Changes are detected by S3 events (or scheduled scans).
A pipeline extracts text, chunks it, and prepares metadata.

2. Pipeline for Embedding and Indexing

Chunk -> Bedrock embeddings -> OpenSearch indexing.
Store provenance information alongside vectors.
For traceability, write pipeline run logs.
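One guard worth building into this pipeline is idempotency: if neither the content nor the embedding version has changed, do not pay to embed again. The sketch below assumes the platform can look up what is already indexed per document; `plan_ingest` and the shape of `indexed` are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable hash of the extracted document text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_ingest(doc_id, text, embedding_version, indexed):
    """Decide whether a document needs embedding.

    indexed: {doc_id: (content_hash, embedding_version)} describing what
    the store already holds (assumed to come from provenance metadata).
    """
    current = (content_hash(text), embedding_version)
    if indexed.get(doc_id) == current:
        return "skip"   # same content, same model version: nothing to do
    return "embed"


indexed = {"handbook": (content_hash("Refunds take 5 days."), "emb-v1")}

print(plan_ingest("handbook", "Refunds take 5 days.", "emb-v1", indexed))
print(plan_ingest("handbook", "Refunds take 7 days.", "emb-v1", indexed))
print(plan_ingest("handbook", "Refunds take 5 days.", "emb-v2", indexed))
```

An accidental rerun of the whole pipeline then degrades into a cheap hash comparison instead of a full re-embedding bill.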

3. Query service

Accepts user query.
Calls Bedrock to embed the query.
Injects mandatory policy filters.
Queries OpenSearch.
Returns chunks, citations, and metadata.
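The steps above can be sketched as a thin service that owns the whole query path. To keep the example self-contained, the Bedrock and OpenSearch calls are replaced by injected stand-in functions; `QueryService`, `fake_embed`, `fake_search`, and the tiny corpus are all hypothetical.

```python
class QueryService:
    """Owns the query path so callers never talk to the index directly."""

    def __init__(self, embed, search, max_k=20):
        self.embed = embed      # callable: text -> vector
        self.search = search    # callable: (vector, k, filters) -> hits
        self.max_k = max_k

    def retrieve(self, principal, query_text, k=5):
        k = max(1, min(k, self.max_k))           # clamp top-K
        vector = self.embed(query_text)           # embed the query
        # Mandatory filter derived from the authenticated principal.
        filters = [{"term": {"tenant_id": principal["tenant_id"]}}]
        hits = self.search(vector, k, filters)    # vector search
        return [{"chunk_id": h["chunk_id"], "text": h["text"],
                 "citation": h["s3_uri"]} for h in hits]


# Stand-ins for the real embedding and search backends.
CORPUS = [
    {"chunk_id": "c1", "tenant_id": "acme", "text": "Refund policy...",
     "s3_uri": "s3://example-bucket/acme/policy.txt"},
    {"chunk_id": "c2", "tenant_id": "beta", "text": "Beta roadmap...",
     "s3_uri": "s3://example-bucket/beta/roadmap.txt"},
]


def fake_embed(text):
    return [float(len(text))]


def fake_search(vector, k, filters):
    tenant = filters[0]["term"]["tenant_id"]
    return [c for c in CORPUS if c["tenant_id"] == tenant][:k]


service = QueryService(fake_embed, fake_search)
results = service.retrieve({"tenant_id": "acme"}, "refund policy", k=100)
print(results)
```

Because the backends are injected, the same service can be exercised in tests without any AWS dependency.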

4. Platform Controls

Authentication (Cognito, IAM, etc.).
Authorization and entitlements (tenant mapping, ABAC tags).
Rate caps and quotas (per tenant).
Observability and audit records.
Jobs for quality assessment and release gates.

A helpful way to put it: 'OpenSearch access' should not be granted to teams. They should get Embedding Platform access.

How This Appears In OpenSearch: Indexing Strategy

A typical chunk document stored in OpenSearch looks like this:

vector: the embedding
text: the chunk text (optional; sometimes stored elsewhere)
chunk_id, doc_id
doc_version, s3_uri
department, tenant_id, policy_tags, and region
created_at, chunking_version, embedding_version

The crucial patterns of design:

Map metadata fields as filterable fields (for example, keyword) so that OpenSearch can filter on them efficiently.
To implement platform-wide policies, maintain a uniform schema across domains.
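As an illustration, an index body following those two patterns might look like the sketch below, written as a Python dictionary that could be passed to an index-creation call. The `knn_vector` type, the `index.knn` setting, and `dynamic: strict` are OpenSearch features; the dimension of 1024, the HNSW/faiss/L2 method choice, and the exact field list are assumptions to adapt to your embedding model and workload.

```python
# Metadata fields used for exact-match filtering are mapped as keyword.
KEYWORD_FIELDS = [
    "chunk_id", "doc_id", "doc_version", "s3_uri", "tenant_id",
    "department", "region", "policy_tags",
    "embedding_version", "chunking_version",
]

CHUNK_INDEX_BODY = {
    "settings": {"index": {"knn": True}},   # enable k-NN on this index
    "mappings": {
        # Reject documents with unknown fields to keep the schema uniform.
        "dynamic": "strict",
        "properties": {
            "vector": {
                "type": "knn_vector",
                "dimension": 1024,           # assumed; must match the model
                "method": {"name": "hnsw", "engine": "faiss",
                           "space_type": "l2"},
            },
            "text": {"type": "text"},
            "created_at": {"type": "date"},
            **{name: {"type": "keyword"} for name in KEYWORD_FIELDS},
        },
    },
}

print(sorted(CHUNK_INDEX_BODY["mappings"]["properties"]))
```

Keeping this body in one place, owned by the platform, is what makes it possible to apply the same mandatory filters to every tenant's index.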

Failure Modes This Platform Prevents

1. 'Search Got Worse After We Upgraded Embeddings'

You discover this in production if versioning and evaluation gates are not used.

Using this platform:

A canary index receives the updated embedding version.
Golden set tests run against it.
Stability and Recall@K are compared against the current index.
The new version is promoted only if it passes.
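The promotion decision itself can be a small, explicit gate. In this sketch the thresholds, the function name, and the reason strings are illustrative assumptions; the inputs are the Recall@K and top-1 stability numbers measured on the golden set.

```python
def should_promote(baseline_recall, canary_recall, top1_stability,
                   max_recall_drop=0.02, min_stability=0.8):
    """Return (promote, reasons). An empty reasons list means the gate passed."""
    reasons = []
    if canary_recall < baseline_recall - max_recall_drop:
        reasons.append("recall_regressed")
    if top1_stability < min_stability:
        reasons.append("top1_unstable")
    return (not reasons, reasons)


print(should_promote(baseline_recall=0.82, canary_recall=0.85,
                     top1_stability=0.9))
print(should_promote(baseline_recall=0.82, canary_recall=0.70,
                     top1_stability=0.5))
```

Returning the reasons alongside the decision makes a blocked release self-explanatory in the pipeline logs.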

2. 'Everything Was Re-embedded Due to a Pipeline Bug'

Without quotas, this melts your budget and your throughput.

Using this platform:

Tokens-per-day ingest quotas throttle the damage.
An explicit admin workflow is necessary for massive rebuilds.
Tenants can be paused without pausing everyone.

3. 'We Leaked Information Among Teams'

Without enforced filters, a leak is just one missing parameter away.

Using this platform:

The query service injects policy filters on the server side.
Tenant isolation is structural (separate namespaces or indices).
Audit logs reveal exactly what took place.

4. 'Storage Never Stops Expanding'

Using this platform:

TTL policies and lifecycle rules are in place.
Stale documents are found and removed.
You can set limits and report storage growth for each tenant.
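Both cleanups can be expressed as query bodies for a delete-by-query request. The sketch below only builds the bodies, using standard bool, term, and range clauses with date math; the function names are hypothetical, and the field names follow the provenance metadata described earlier in this article.

```python
def stale_versions_query(doc_id, current_doc_version):
    """Match chunks of a document that belong to any superseded version."""
    return {"query": {"bool": {
        "filter": [{"term": {"doc_id": doc_id}}],
        "must_not": [{"term": {"doc_version": current_doc_version}}],
    }}}


def expired_query(tenant_id, ttl_days):
    """Match a tenant's chunks that are older than its retention period."""
    return {"query": {"bool": {"filter": [
        {"term": {"tenant_id": tenant_id}},
        {"range": {"created_at": {"lt": f"now-{ttl_days}d"}}},
    ]}}}


print(stale_versions_query("handbook", "etag-abc123"))
print(expired_query("acme", 90))
```

Scoping every deletion by `doc_id` or `tenant_id` keeps a cleanup job for one tenant from touching another tenant's data.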

Conclusion

OpenSearch can be a great vector engine. Bedrock can generate embeddings reliably. S3 can hold your source of truth. However, those elements by themselves will not determine the effectiveness of your RAG system. That will be decided by whether you built the missing layer: SLAs for freshness and reliability, governance to enable forensics and prevent leaks, and quotas to keep the system sustainable.

Once you approach the embedding store as a platform, you get more than just "RAG that works." You get a shared foundation where multiple teams can build safely without turning embeddings into your next ungoverned data swamp.

