How to Build Permission-Aware Retrieval That Doesn't Leak Across Teams
Permission-aware retrieval ensures that the assistant uses only allowed information. A context graph enforces access control to prevent cross-team leakage.
LLM assistants and chatbots are very good at connecting the dots, which is exactly why they can be dangerous in multi-team organizations. A PM from Team A asks, 'Why did the churn rate spike last Wednesday?' The assistant retrieves and displays an answer written by Team B, which includes customer names and contact details. Even if you block the final answer, the leak may have already occurred through retrieved snippets, intermediate summaries, cached results, and so on.
If your retrieval layer isn't permission-aware end-to-end, the model can pull context from other teams' documents, tickets, dashboards, or embeddings. This is not just about blocking access. In reality, leaks happen during retrieval, summarization, inside tool traces/logs, or via shared embedding stores.
This article provides a practical blueprint for building permission-aware retrieval using a context graph and either role-based access control (RBAC) or attribute-based access control (ABAC): identity -> entitlements -> retrieval -> citation, along with the failure modes that affect teams. If you implement these four pieces of the puzzle correctly, you can support natural language analytics and document search across teams without cross-team leakage.
The Problem
Most RAG systems start with chunking every document into text pieces, embedding them, vector searching top-k, stuffing them into a prompt, and finally generating an answer. This works for public documents but breaks for internal multi-team environments. The core issue is that permissions live on resources and relationships, not on chunks of text, and RAG systems usually don't model those relationships explicitly.
- Access control is attached to resources such as documents, dashboards, and datasets, not to raw chunks of text.
- Context is relational: tickets reference runbooks, dashboards depend on datasets, and datasets derive from tables.
- Even if the final answer is filtered, leaks happen upstream during retrieval, reranking, summarization, caching, or logging.
You need retrieval that understands who the user is, what they are allowed to see, where the content came from, and how it was used in the final response. That's what a context graph provides.
What Is a Context Graph?
A context graph is a graph of the objects your assistant uses as context, along with their relationships and policies. It treats context as governed objects and relationships, not just text. That matters because permissions and provenance are properties of the source object and the path you took to reach it, not just of the retrieved paragraph.
Let's see what this graph contains in nodes and edges.
Typical node types (objects):
- Document, Page, Runbook
- Dataset, Table, Column, Metric
- Dashboard, Chart
- Service, Incident, Ticket
- Team, Project
- EmbeddingChunk (a derived artifact, not a source of truth)
Typical edges (relationships):
- owns, belongs_to, derived_from
- references, depends_on
- has_policy, has_tag, contains
You might ask why we can't just use one big index instead of a graph. The reason is that permissions and provenance aren't just properties of chunks. They are properties of objects and relationships.
A context graph lets you enforce rules like:
- The user can see dashboard metadata, but not the underlying dataset
- The user can see the document, but not a restricted section that is tagged as PII
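To make the node and edge types above concrete, here is a minimal Python sketch of a context graph. The class names, fields, and example IDs are illustrative assumptions, not a prescribed schema; a production system would more likely sit on a graph database.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: str
    node_type: str                  # e.g. "Dashboard", "Dataset", "Document"
    owner_team: str
    tags: frozenset = frozenset()   # e.g. {"PII"}


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    relation: str                   # e.g. "depends_on", "references", "contains"


@dataclass
class ContextGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, src: str, dst: str, relation: str) -> None:
        # Refuse dangling edges so every relationship links two governed objects.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(src, dst, relation))

    def neighbors(self, node_id: str, relation: str) -> list:
        return [self.nodes[e.dst] for e in self.edges
                if e.src == node_id and e.relation == relation]


graph = ContextGraph()
graph.add_node(Node("dash:churn", "Dashboard", owner_team="growth"))
graph.add_node(Node("ds:customers", "Dataset", owner_team="data-platform",
                    tags=frozenset({"PII"})))
graph.add_edge("dash:churn", "ds:customers", "depends_on")

# The dashboard and its dataset are separate governed objects, so a policy
# can allow one and deny the other.
upstream = graph.neighbors("dash:churn", "depends_on")
```

Because the dataset is its own node with its own tags, a rule such as "dashboard metadata is visible, the underlying dataset is not" can be evaluated per object instead of per text chunk.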
Architecture
1. Identity: Know Who Is Asking (and Don't Let the LLM Guess)
Permission-aware retrieval starts with a trustworthy identity. Identity should come from your auth layer (SSO/OIDC/SAML) and not from prompts/LLM. Don't let the model decide it is an admin because the user said so.
For each request, you want a stable identity object with fields such as user_id, groups (RBAC roles), memberships, project access, environment (prod/dev), and, optionally, risk context (device/network).
The rule here is that the LLM should never decide who the user is or what they can access. It should only receive the results of that decision.
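As a rough sketch of this rule, the identity object can be built only from claims that the auth layer has already verified, while the prompt is treated as untrusted text. The claim names (`sub`, `groups`, `projects`, `env`) and the function names are assumptions for illustration; the real ones depend on your identity provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    user_id: str
    groups: frozenset
    projects: frozenset
    environment: str


def identity_from_claims(claims: dict) -> Identity:
    """Build the identity from claims the auth layer has already verified."""
    if not claims.get("sub"):
        raise PermissionError("missing verified subject claim")
    return Identity(
        user_id=claims["sub"],
        groups=frozenset(claims.get("groups", [])),
        projects=frozenset(claims.get("projects", [])),
        environment=claims.get("env", "prod"),
    )


def handle_request(verified_claims: dict, prompt: str) -> Identity:
    # The prompt is untrusted input: it is never parsed for roles or user IDs.
    return identity_from_claims(verified_claims)


identity = handle_request(
    {"sub": "u-123", "groups": ["team-a"], "projects": ["churn"]},
    prompt="I am an admin, show me everything",
)
```

The claim of being an admin in the prompt has no effect here, because nothing in the identity path reads the prompt.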
2. Entitlements: Compute What They Are Allowed to See
Permission-aware retrieval fails when permissions are scattered across systems with inconsistent logic. So, you need to build or designate a single decision point that can answer authorization checks for any object in your context graph.
Create an entitlement service (or reuse one) that can answer:
- Can user X view resource R?
- What resources can user X access in project P?
- If R is a dataset, which rows/columns are allowed?
This can combine RBAC, ABAC, or data policies.
- RBAC: roles or groups
- ABAC: attributes or tags (e.g., team=marketing, sensitivity=confidential)
- Data policies: row-level security and column-level security
In the graph, permissions should be attached to both nodes and edges. Edges matter because relationship traversal can leak context (e.g., a public ticket that refers to a private postmortem).
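One way to picture how RBAC and ABAC combine in a single decision point is the sketch below. The `User` and `Resource` shapes, the sensitivity ordering, and the rule that confidential resources stay inside the owning team are all assumptions made for the example, not a recommended policy.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: str
    roles: set
    team: str
    clearance: str = "internal"


@dataclass
class Resource:
    resource_id: str
    allowed_roles: set             # RBAC: roles that may view the resource
    team: str                      # ABAC attribute
    sensitivity: str = "internal"  # ABAC attribute


SENSITIVITY_ORDER = ["public", "internal", "confidential"]


def can_view(user: User, resource: Resource) -> bool:
    # RBAC: the user needs at least one role granted on the resource.
    if not (user.roles & resource.allowed_roles):
        return False
    # ABAC: confidential resources are visible only to the owning team.
    if resource.sensitivity == "confidential" and user.team != resource.team:
        return False
    # ABAC: the user's clearance must meet the resource's sensitivity.
    return (SENSITIVITY_ORDER.index(user.clearance)
            >= SENSITIVITY_ORDER.index(resource.sensitivity))


analyst = User("u-1", roles={"analyst"}, team="marketing")
campaign_doc = Resource("doc:campaign", {"analyst"}, team="marketing")
postmortem = Resource("doc:postmortem", {"analyst"}, team="payments",
                      sensitivity="confidential")

visible = [r.resource_id for r in (campaign_doc, postmortem)
           if can_view(analyst, r)]
```

Both resources grant the analyst role, so RBAC alone would allow them; the attribute check is what keeps the other team's postmortem out.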
3. Retrieval: Filter Before Ranking, Reranking, or Summarization
A safe retrieval pipeline runs in four stages: candidate generation (vector search, keyword search, or graph traversal), permission filtering as a hard gate, ranking and reranking of allowed items only, and context assembly, where snippets and summaries are bounded to allowed sources.
The key principle is to never let the LLM see forbidden candidates: not titles, not metadata, not "almost allowed" snippets. The moment the model sees something it shouldn't, you have already leaked, perhaps not in the answer but in logs, traces, or hidden chain outputs.
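The ordering matters more than the ranking algorithm, so here is a small sketch of it. The function and field names are hypothetical; the point is only that the permission gate runs before any scoring.

```python
from typing import Callable, Iterable


def retrieve(candidates: Iterable[dict],
             is_allowed: Callable[[dict], bool],
             score: Callable[[dict], float],
             top_k: int = 3) -> list:
    # Hard gate first: forbidden candidates are dropped before any ranking,
    # so their titles and snippets never reach a reranker or the LLM.
    allowed = [c for c in candidates if is_allowed(c)]
    ranked = sorted(allowed, key=score, reverse=True)
    return ranked[:top_k]


candidates = [
    {"id": "doc:a1", "team": "A", "similarity": 0.71},
    {"id": "doc:b7", "team": "B", "similarity": 0.98},  # best match, wrong team
    {"id": "doc:a2", "team": "A", "similarity": 0.64},
]
context = retrieve(candidates,
                   is_allowed=lambda c: c["team"] == "A",
                   score=lambda c: c["similarity"])
```

If the filter ran after ranking, the Team B document would have been the top candidate and would already have passed through the reranker.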
You can implement retrieval with a context graph in simple steps.
Step A: Build an Allowed Scope First Using the Graph
- Identify which teams/projects/spaces the user can access
- Identify allowed source systems such as Jira tickets, Confluence pages, etc.
- Identify data access constraints, such as RLS/CLS for tables
Step B: Search Inside the Boundary
- Docs: vector search with a server-side filter (doc_id in allowed_doc_ids or security_domain_id = X)
- Tickets: keyword search within allowed Jira projects
- Data: SQL generation through a semantic layer that enforces RLS/CLS
- Runbooks: graph traversal from service nodes that the user can access
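Steps A and B can be sketched as follows. The `AccessScope` shape, the in-memory dictionaries, and the naive keyword match are assumptions that stand in for your entitlement lookups and search backend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessScope:
    user_id: str
    allowed_projects: frozenset
    allowed_doc_ids: frozenset


def build_scope(user_id: str, memberships: dict, docs_by_project: dict) -> AccessScope:
    # Step A: resolve the user's projects, then the documents inside them.
    projects = frozenset(memberships.get(user_id, ()))
    doc_ids = frozenset(d for p in projects for d in docs_by_project.get(p, ()))
    return AccessScope(user_id, projects, doc_ids)


def keyword_search(query: str, corpus: dict, scope: AccessScope) -> list:
    # Step B: only documents inside the scope are ever scanned.
    terms = query.lower().split()
    return sorted(doc_id for doc_id in scope.allowed_doc_ids
                  if any(t in corpus.get(doc_id, "").lower() for t in terms))


memberships = {"alice": ["growth"]}
docs_by_project = {"growth": ["doc:g1", "doc:g2"], "payments": ["doc:p1"]}
corpus = {
    "doc:g1": "Churn spiked after the pricing change",
    "doc:g2": "Onboarding funnel notes",
    "doc:p1": "Churn analysis with customer contact details",
}

scope = build_scope("alice", memberships, docs_by_project)
hits = keyword_search("churn", corpus, scope)
```

The payments document also matches the query, but it is never scanned because the search iterates over the scope rather than over the whole corpus.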
If your vector database can't enforce filters reliably, choose one of the following:
- Per-tenant or per-team indexes (strong isolation, higher operational cost)
- Security domain partitioning, such as an index per organization or per department
- Refusing unfiltered queries at the retrieval API (a hard fail if there is no scope)
The non-negotiable part is that the retrieval API should reject any request that doesn't include a validated scope.
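What "validated" means is up to you. One possible approach, shown here purely as an assumption, is to have the entitlement service sign the scope with an HMAC so the retrieval API can reject scopes that are missing or tampered with. The key handling is deliberately simplified, and `search` returns the filter it would apply instead of calling a real backend.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret between the entitlement service and the
# retrieval API; in practice it would come from a secret manager.
SCOPE_SIGNING_KEY = b"demo-key-do-not-use"


def sign_scope(scope: dict) -> str:
    payload = json.dumps(scope, sort_keys=True).encode()
    return hmac.new(SCOPE_SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def search(query, scope, signature):
    # Hard fail: no scope, or a scope the entitlement service did not sign.
    if scope is None or signature is None:
        raise PermissionError("retrieval requires a validated scope")
    if not hmac.compare_digest(sign_scope(scope), signature):
        raise PermissionError("scope signature mismatch")
    # Placeholder for the real filtered search: return the filter it would use.
    return {"query": query, "filter": {"doc_id": sorted(scope["allowed_doc_ids"])}}


scope = {"user_id": "u-1", "allowed_doc_ids": ["doc:g1"]}
signature = sign_scope(scope)
request = search("churn", scope, signature)
```

A caller that widens `allowed_doc_ids` after signing gets a `PermissionError` instead of a broader result set.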
4. Citation and Audit: Prove What You Used and Prove It Was Allowed
To be trustworthy, the assistant needs to show which sources it used (document IDs, dashboard IDs, or datasets), which versions or snapshots it read, why the system believes access was allowed, and how these sources map to the answer.
A useful internal structure is an answer bundle containing request_id, user_id, retrieval_items[] (each with resource_id, type, version, and snippet_hash), policy_checks[], answer_text, and citations.
This makes the answers auditable and makes incidents debuggable when something goes wrong.
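A sketch of such a bundle, using the field names listed above, might look like this. The `add_item` helper, the reason string, and the choice of SHA-256 for `snippet_hash` are assumptions for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RetrievalItem:
    resource_id: str
    type: str
    version: str
    snippet_hash: str


@dataclass
class PolicyCheck:
    resource_id: str
    decision: str
    reason: str


@dataclass
class AnswerBundle:
    request_id: str
    user_id: str
    answer_text: str
    retrieval_items: list = field(default_factory=list)
    policy_checks: list = field(default_factory=list)
    citations: dict = field(default_factory=dict)  # citation marker -> resource_id

    def add_item(self, resource_id, type_, version, snippet, reason):
        # Store a hash, not the snippet, so the audit record holds no raw content.
        digest = hashlib.sha256(snippet.encode()).hexdigest()
        self.retrieval_items.append(RetrievalItem(resource_id, type_, version, digest))
        self.policy_checks.append(PolicyCheck(resource_id, "allow", reason))

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


bundle = AnswerBundle("req-1", "u-1", "Churn rose after the pricing change [1].")
bundle.add_item("doc:g1", "Document", "v3",
                "Churn spiked after the pricing change", "member_of:growth")
bundle.citations["[1]"] = "doc:g1"
record = json.loads(bundle.to_json())
```

Keyed by `request_id`, a record like this lets you answer "what did the assistant read, and why was that allowed?" without storing the source text itself.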
Failure Modes That Still Leak Data (How to Prevent Them)
1. Indirect Leakage via Summaries
Even if you filter forbidden docs, a common bug is that the model summarizes a mixed context before filtering, or the system caches summaries globally and reuses them across users.
We can fix this by keeping a permission gate before any LLM call and never summarizing forbidden content in the first place. Either avoid caching generated summaries entirely, or scope cached summaries to an entitlement boundary, such as team+role.
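If you do cache summaries, the entitlement boundary can be made part of the cache key. The class below is a minimal in-memory sketch under that assumption; the team+role boundary and the method names are illustrative.

```python
import hashlib


class ScopedSummaryCache:
    """Cache summaries per entitlement boundary instead of globally."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(doc_id: str, team: str, roles) -> str:
        # The boundary (team + sorted roles) is part of the key, so a summary
        # produced for one boundary is never served to another.
        boundary = f"{team}|{','.join(sorted(roles))}"
        return hashlib.sha256(f"{boundary}|{doc_id}".encode()).hexdigest()

    def get(self, doc_id, team, roles):
        return self._store.get(self._key(doc_id, team, roles))

    def put(self, doc_id, team, roles, summary):
        self._store[self._key(doc_id, team, roles)] = summary


cache = ScopedSummaryCache()
cache.put("doc:churn", "team-b", {"analyst"}, "Summary with Team B details")

same_boundary = cache.get("doc:churn", "team-b", {"analyst"})
other_team = cache.get("doc:churn", "team-a", {"analyst"})
```

A lookup from a different team misses the cache, which costs a regeneration but avoids reusing a summary built from context that user may not be allowed to see.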
2. Tool Trace Leakage (Logs, Spans, Reasoning Fields)
Agentic systems leak through logs that capture tool outputs, traces shipped to vendors, debug views that show raw context, and tool call arguments such as SQL queries with restricted fields.
We can fix this by separating the secure trace store from general app logs, applying the same RBAC to observability tools, implementing structured redaction at the boundary, and logging references to payloads instead of the raw payloads themselves.
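Structured redaction at the boundary can be as simple as replacing sensitive fields with a reference before the event reaches general logs. The field names in `SENSITIVE_KEYS` and the reference format are assumptions for this sketch.

```python
import hashlib
import json

# Assumed names of event fields that may carry restricted content.
SENSITIVE_KEYS = {"tool_output", "sql", "context", "snippet"}


def redact_for_logging(event: dict) -> dict:
    """Replace sensitive payloads with a short reference before logging."""
    safe = {}
    for key, value in event.items():
        if key in SENSITIVE_KEYS:
            raw = json.dumps(value, sort_keys=True, default=str).encode()
            # The reference lets an authorized user find the payload in the
            # secure trace store without copying it into general app logs.
            safe[key + "_ref"] = "sha256:" + hashlib.sha256(raw).hexdigest()[:16]
        else:
            safe[key] = value
    return safe


event = {"request_id": "req-1", "tool": "run_sql",
         "sql": "SELECT email FROM customers"}
log_line = redact_for_logging(event)
```

The general log keeps the request ID and tool name for debugging, while the query text lives only where the same access rules apply.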
3. Cross-Tenant Embeddings (A Silent Killer)
If embeddings are shared across teams or tenants, the nearest neighbor search can surface semantically similar chunks from restricted domains, especially if filters are missing or buggy.
We can fix this by storing embeddings with a security_domain_id, refusing unfiltered queries at the API level, enforcing mandatory filters server-side, and testing those filters like a security control. Prefer per-tenant indexes for strict isolation, and consider encrypting embeddings per tenant if your threat model requires it.
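The toy store below illustrates the mandatory-filter idea with a brute-force cosine search. It is a sketch under stated assumptions (in-memory rows, hypothetical class and method names), not a substitute for a real vector database's server-side filtering.

```python
import math


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class DomainScopedVectorStore:
    def __init__(self):
        self._rows = []  # (security_domain_id, chunk_id, vector)

    def add(self, security_domain_id: str, chunk_id: str, vector: list) -> None:
        self._rows.append((security_domain_id, chunk_id, vector))

    def search(self, query: list, allowed_domains: set, top_k: int = 3) -> list:
        # Refuse unfiltered queries outright instead of defaulting to "all".
        if not allowed_domains:
            raise PermissionError("search requires at least one security domain")
        scored = [(_cosine(query, vec), chunk_id)
                  for domain, chunk_id, vec in self._rows
                  if domain in allowed_domains]
        return [chunk_id for _, chunk_id in sorted(scored, reverse=True)[:top_k]]


store = DomainScopedVectorStore()
store.add("team-a", "a:1", [1.0, 0.0])
store.add("team-b", "b:1", [0.9, 0.1])  # closest vector, but another domain
store.add("team-a", "a:2", [0.0, 1.0])

results = store.search([0.9, 0.1], allowed_domains={"team-a"})
```

The Team B chunk is the nearest neighbor of the query, yet it is never scored because the domain filter is applied before similarity is computed.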
Minimal Implementation
Start with a context graph data model where every object you index, such as documents, tickets, dashboards, or datasets, has a stable resource ID. Every node should carry core governance metadata: a security_domain_id (the tenant/team boundary), an owner_team, and sensitivity tags such as PII or confidential. Apply the same metadata to edges as well as nodes so that relationships can't leak. Record versions or snapshot timestamps so that answers can be traced to an exact state of the world.
Build an entitlements service that answers can_access(user,resource,action) with an explicit allow/deny and a reason code, and make it a batch API because each query will validate many candidates. Those reason codes become the backbone of auditing and debugging.
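A batch form of this check, with explicit reason codes, could look like the sketch below. The function name `can_access_batch`, the metadata shape, and the reason code strings are hypothetical; the source only specifies can_access(user,resource,action) with an allow/deny and a reason.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    resource_id: str
    allowed: bool
    reason: str


def can_access_batch(user_teams: set, resources: dict, action: str = "view") -> list:
    """Evaluate many candidates in one call; `resources` maps id -> metadata."""
    decisions = []
    for resource_id, meta in resources.items():
        if meta is None:
            # Unknown resources are denied by default.
            decisions.append(Decision(resource_id, False, "DENY_UNKNOWN_RESOURCE"))
        elif action not in meta["actions"]:
            decisions.append(Decision(resource_id, False, "DENY_ACTION_NOT_GRANTED"))
        elif meta["owner_team"] in user_teams:
            decisions.append(Decision(resource_id, True, "ALLOW_TEAM_MEMBER"))
        else:
            decisions.append(Decision(resource_id, False, "DENY_NOT_IN_TEAM"))
    return decisions


resources = {
    "doc:g1": {"owner_team": "growth", "actions": {"view"}},
    "doc:p1": {"owner_team": "payments", "actions": {"view"}},
    "doc:missing": None,
}
decisions = can_access_batch({"growth"}, resources)
allowed_ids = [d.resource_id for d in decisions if d.allowed]
reasons = {d.resource_id: d.reason for d in decisions}
```

Returning a decision for every candidate, including denials, is what makes the later audit trail useful: you can see not only what was used but what was filtered out and why.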
Your retrieval service should require a validated access scope derived from identity and entitlements, and enforce a hard permission gate before any LLM call. Behind that gate, it can support filtered vector search, keyword search constrained to allowed projects/spaces, and optional graph traversal where each hop is checked at both the edge and the node level.
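Hop-by-hop checking during graph traversal can be sketched as a breadth-first search that tests the edge and the target node before visiting. The adjacency format and the `node_ok` / `edge_ok` callbacks are assumptions; in practice they would call the entitlements service.

```python
from collections import deque


def traverse(start: str, edges: dict, node_ok, edge_ok, max_hops: int = 2) -> list:
    """Breadth-first traversal that checks every edge and node before visiting."""
    if not node_ok(start):
        return []
    visited, queue = [start], deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for relation, target in edges.get(node, []):
            # Both checks must pass: a visible edge to a hidden node is skipped,
            # and so is a hidden edge to a visible node.
            if target in visited or not edge_ok(node, relation, target):
                continue
            if not node_ok(target):
                continue
            visited.append(target)
            queue.append((target, hops + 1))
    return visited


edges = {
    "ticket:101": [("references", "runbook:restart"),
                   ("references", "postmortem:42")],
    "postmortem:42": [("references", "doc:customer-list")],
}
allowed_nodes = {"ticket:101", "runbook:restart"}

reachable = traverse("ticket:101", edges,
                     node_ok=lambda n: n in allowed_nodes,
                     edge_ok=lambda src, rel, dst: True)
```

The public ticket references a private postmortem, but the traversal stops at that hop, so nothing downstream of the postmortem is ever reached.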
Finally, produce responses with citations mapped to resource IDs, showing titles and links only when the user is allowed to view them. Store a compact audit bundle of retrieved items, policy decisions, and citation mapping, keyed by request_id, so every answer is explainable, reproducible, and forensically debuggable.
Treat permission filters like security controls: add regression tests, and plant canary secrets per team/tenant to prove cross-team retrieval never happens.
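A canary test can be sketched roughly as follows: plant a unique marker in each team's content, run retrieval as every team, and fail if any team sees another team's marker. The marker format and the stand-in `scoped_retrieve` function are assumptions; in a real suite the retriever would be your actual retrieval API.

```python
def plant_canaries(teams):
    # One unique marker string per team, indexed like any other document.
    return {team: f"CANARY-{team}-7f3a" for team in teams}


def scoped_retrieve(team, index):
    # Stand-in for the real retrieval call, scoped to the caller's team.
    return [text for owner, text in index if owner == team]


def find_cross_team_leaks(teams, index, canaries, retrieve):
    leaks = []
    for team in teams:
        retrieved = " ".join(retrieve(team, index))
        for other, marker in canaries.items():
            # Seeing another team's marker means the scope was not enforced.
            if other != team and marker in retrieved:
                leaks.append((team, other))
    return leaks


teams = ["a", "b"]
canaries = plant_canaries(teams)
index = [(team, f"internal notes {canaries[team]}") for team in teams]

leaks = find_cross_team_leaks(teams, index, canaries, scoped_retrieve)
```

Run as a regression test, an empty `leaks` list is the pass condition; swapping in an unscoped retriever should make the same check fail, which is a useful sanity test of the test itself.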
Conclusion
Permission-aware retrieval isn't a single guardrail; it is an end-to-end property of your system. If any stage ignores entitlements, the model will eventually leak.
A permission-aware retrieval graph gives you a clean separation of concerns: deterministic identity and entitlements, bounded retrieval, citations that make answers auditable, and failures that are debuggable. That's how you move from an assistant that seems safe in a demo to a multi-team assistant that stays safe under pressure.