AI Search Access Control: Why RAG Without Permissions Is Dangerous

A Company Brain must not retrieve everything just because the data is technically available. In RAG systems, permission checks must happen before or during retrieval, otherwise an AI assistant can expose confidential information from HR, leadership, pricing, or customer projects in the wrong context. AI search access control is therefore not an add-on. It is the foundation for trust, privacy, and secure adoption.

Why is RAG dangerous without an access control concept?

Many companies think about AI search mainly in terms of answer quality. Does the assistant find the right file? Does it understand the question? Does it summarize well? These questions matter, but they are not the first security question.

The first security question is: Is this user allowed to see this information?

A RAG system retrieves relevant content from documents, databases, tickets, emails, or knowledge sources before generating an answer. These retrieved items are passed to the language model as context. If the retrieval step finds content the user is not authorized to access, the security boundary has already been crossed. The model may include confidential information in a normal-looking answer.

The risk is higher than with traditional search because AI search does not only return a list of results. It creates a synthesized answer. Sources may come from several documents. Sensitive details may be combined with harmless information. That is why access control in RAG is not optional.

Oso describes the problem directly: RAG pipelines are excellent at finding relevant information, but poor at respecting permissions unless authorization is deliberately built into the architecture. Source: https://www.osohq.com/post/right-approach-to-authorization-in-rag

Which data is especially sensitive in a Company Brain?

A useful Company Brain rarely contains only harmless manuals. Once it becomes valuable, it contains business-critical information.

This may include HR data, salary information, hiring notes, internal evaluations, leadership documents, pricing logic, margins, customer-specific conditions, contract details, escalations, project issues, technical access information, security documents, internal strategies, and confidential customer communication.

Even seemingly harmless information can become sensitive when combined. A single project status may not be critical. Combined with pricing, customer name, internal assessment, and escalation notes, it can become confidential knowledge.

This is the special risk of AI systems. They do not only find. They combine.

Why are SharePoint or file system permissions not enough by themselves?

Existing permissions in SharePoint, Google Drive, Confluence, Jira, or file servers are important. But they do not automatically solve the problem once data is copied into a RAG pipeline.

During indexing, documents are split into chunks, converted into embeddings, and stored in a search or vector database. If the original permissions are not transferred correctly, a second knowledge layer appears without the same protection.

Common risks include:

A user had access in the past but no longer does.
A document was indexed before permissions changed.
A chunk contains sensitive content, but permissions were stored only at document level.
Metadata is missing.
Embeddings or indexes contain content that should no longer be visible.
An agent calls data through an API without carrying the user’s permissions.

A RAG system must therefore not only know permissions. It must enforce them.
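
The last risk in the list, an agent calling data through an API without the user's permissions, can be sketched as follows. This is a minimal illustration under assumptions, not a reference design: `UserContext`, `fetch_ticket`, `AccessDenied`, and the in-memory `TICKETS` store are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserContext:
    """Identity of the end user on whose behalf the agent acts (hypothetical type)."""
    user_id: str
    groups: frozenset


# Hypothetical in-memory source system: ticket id -> (allowed groups, text).
TICKETS = {
    "T-1": (frozenset({"support"}), "Printer driver fix for customer A"),
    "T-2": (frozenset({"leadership"}), "Escalation: contract penalty risk"),
}


class AccessDenied(Exception):
    """Raised when the calling user is not allowed to read a ticket."""


def fetch_ticket(ticket_id: str, user: UserContext) -> str:
    """Tool the agent may call.

    The user context is a required argument, so the lookup cannot
    silently run with a broad service identity.
    """
    allowed_groups, text = TICKETS[ticket_id]
    if not (user.groups & allowed_groups):
        raise AccessDenied(f"{user.user_id} may not read {ticket_id}")
    return text
```

The design choice is that the permission check lives inside the tool, next to the data, rather than in the prompt or in the agent's instructions.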

Where should access control happen in RAG?

Access control can apply at several points. The earlier it happens, the safer the system becomes.

Control point | What happens? | Risk if missing
Ingestion | Only allowed sources are indexed | Restricted content enters the index
Chunking | Permissions are attached to knowledge units | Text chunks lose protection context
Indexing | Roles, rights, and metadata are stored | Search finds content without access context
Retrieval | Results are filtered before the model sees them | The model receives confidential content
Generation | Sources and uncertainty are checked | Sensitive content appears in answers
Logging | Access and answers are traceable | Security incidents remain invisible
Updates | Permission changes are synchronized | Old access rules remain active in the index

The core rule is simple: the assistant must never receive context the user is not allowed to see. It is not enough for the final answer to be safe. Retrieval itself must be safe.
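
The core rule can be sketched in a few lines. The example below is a simplified illustration, assuming that each chunk carries a set of allowed groups copied from the source system and that similarity scores are already computed. `Chunk`, `retrieve_authorized`, and `build_context` are hypothetical names, not part of any specific library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    text: str
    allowed_groups: frozenset  # assumed to be copied from the source system at indexing time
    score: float               # similarity to the query, assumed precomputed


def retrieve_authorized(candidates, user_groups, k=3):
    """Filter first, rank second: unauthorized chunks never reach the prompt."""
    allowed = [c for c in candidates if c.allowed_groups & user_groups]
    return sorted(allowed, key=lambda c: c.score, reverse=True)[:k]


def build_context(chunks):
    """Only the already-filtered chunks are concatenated into model context."""
    return "\n".join(c.text for c in chunks)


candidates = [
    Chunk("c1", "Onboarding checklist for service staff", frozenset({"service"}), 0.71),
    Chunk("c2", "Margin analysis for customer A", frozenset({"leadership"}), 0.93),
    Chunk("c3", "Service escalation process", frozenset({"service"}), 0.80),
]

# A service employee asks a question: the highest-scoring chunk (c2) is
# dropped before ranking because the user is not in the leadership group.
context = build_context(retrieve_authorized(candidates, frozenset({"service"})))
```

In a real system the same filter would typically be pushed into the vector database query as a metadata condition, so that the top-k results are computed over allowed chunks only. The in-memory list here only illustrates the ordering of the two steps.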

Why is post-generation filtering too weak?

Some systems try to remove sensitive content only after an answer has been generated. That is risky. If the model has already received confidential content, the boundary has been crossed. Even if the final answer is redacted afterwards, the sensitive information may already have shaped its wording, emphasis, or conclusions.

Example: An employee asks, “Why was customer A deprioritized last quarter?” The system retrieves confidential leadership notes, internal margin analysis, and a critical customer assessment. If these sources are filtered only after generation, the answer may still contain indirect signals.

A safer approach prevents unauthorized content from entering the model context at all. That means permission checks before or during retrieval.

AWS describes how authorization controls can be applied to vector database results before that context is provided to a large language model. Source: https://aws.amazon.com/blogs/security/authorizing-access-to-data-with-rag-implementations/

Which numbers show why this matters?

IBM puts the 2025 global average cost of a data breach at 4.4 million US dollars. The report also points to a growing gap between AI adoption and AI governance. For Company Brain projects, this means security architecture is not a theoretical issue. It is a business risk. Source: https://www.ibm.com/reports/data-breach

IBM also reports that ungoverned AI is associated with higher security risk and can increase breach costs. An internal RAG system without proper access control fits this risk pattern when it indexes content broadly but does not authorize retrieval carefully. Source: https://www.ibm.com/think/x-force/2025-cost-of-a-data-breach-navigating-ai

The Verizon 2025 Data Breach Investigations Report analyzed 22,052 security incidents and 12,195 confirmed data breaches. These figures show the scale of real incidents in which access rights, misdelivery, misuse of legitimate privileges, and internal mistakes can play a role. Source: https://www.verizon.com/business/resources/reports/2025-dbir-executive-summary.pdf

OWASP lists sensitive information disclosure as a major LLM application risk and recommends restricting data sources and securing runtime orchestration to prevent unintended data leakage. Source: https://genai.owasp.org/llmrisk/llm02-insecure-output-handling/

Why is this especially relevant for privacy regulation?

Privacy regulation is not only about storing data securely. It is also about who may see or process personal data, for which purpose, and under which controls. A RAG system that exposes personal data from HR, customer projects, or internal evaluations in the wrong context can quickly become problematic.

This becomes especially sensitive when AI combines information from different sources. A single document may be permissioned correctly. The synthesized answer may reveal more than the user should know.

For a Company Brain, this means roles, purposes, data categories, permissions, and logging must be considered from the start.

“We have the data internally” is not a sufficient argument. Internal does not mean authorized.

Which permission models are relevant?

There are several common models.

Role-Based Access Control, or RBAC, assigns permissions through roles such as leadership, sales, service, HR, or project management. It is easy to understand and useful for a start, but it can be too broad for exceptions.

Attribute-Based Access Control, or ABAC, uses attributes such as department, location, customer, project, confidentiality level, or employment status. It is more flexible, but it requires clean metadata.

Relationship-Based Access Control, or ReBAC, checks relationships. For example: the user is a member of this project team, manages this customer, or owns this case. For modern Company Brain systems, this is often very relevant because company knowledge is tied to customers, projects, and responsibilities.

In practice, companies often need a combination. A service employee may see general process information but not pricing calculations. A project manager may see project knowledge but not HR notes. A sales employee may see customer information but not internal margin assessments.
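
How the three models can combine is sketched below. This is one possible policy, not a standard: it assumes that a role check (RBAC), a confidentiality-level check (ABAC), and a project-membership check (ReBAC) must all pass. `User`, `Resource`, `LEVELS`, and `is_allowed` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed ordering of confidentiality levels (higher number = more sensitive).
LEVELS = {"internal": 1, "confidential": 2, "strictly_confidential": 3}


@dataclass(frozen=True)
class User:
    user_id: str
    roles: frozenset      # RBAC input
    clearance: str        # ABAC input: highest level the user may read
    projects: frozenset   # ReBAC input: projects or customers the user is assigned to


@dataclass(frozen=True)
class Resource:
    required_role: str
    confidentiality: str
    project: Optional[str] = None  # None means the content is not tied to a project


def is_allowed(user: User, res: Resource) -> bool:
    has_role = res.required_role in user.roles                          # RBAC
    cleared = LEVELS[user.clearance] >= LEVELS[res.confidentiality]     # ABAC
    related = res.project is None or res.project in user.projects       # ReBAC
    return has_role and cleared and related


sales = User("sam", frozenset({"sales"}), "internal", frozenset({"customer-a"}))
customer_info = Resource("sales", "internal", "customer-a")
margin_assessment = Resource("sales", "confidential", "customer-a")
other_customer = Resource("sales", "internal", "customer-b")
```

With these sample values, the sales employee may read `customer_info`, but not `margin_assessment` (clearance too low) and not `other_customer` (no relationship to that customer).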

What does access control at chunk level mean?

RAG systems rarely work with whole documents. They split documents into smaller chunks. That is why document-level permissions are not always enough.

A document can contain both general and confidential sections. When the system creates chunks, permissions must travel with them. Otherwise, a harmless document title may lead to the retrieval of a sensitive section.

Example: A project closeout report contains a general summary, technical lessons learned, internal error analysis, customer assessment, and margin calculation. Not every user may see everything. If all chunks inherit the same permission, the system becomes either too open or too restrictive.

A good Company Brain needs metadata and permissions at the knowledge-unit level, not only at the file-storage level.
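
A minimal sketch of chunk-level permissions, assuming one chunk per document section and a rule that a section may narrow the document's access list but never widen it. `Section`, `chunk_document`, and the group names are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Section:
    title: str
    text: str
    restrict_to: Optional[frozenset] = None  # None means: inherit the document ACL


def chunk_document(doc_groups: frozenset, sections):
    """Create one chunk per section and attach the effective permission to it.

    The effective groups are the intersection of the document ACL and the
    section restriction, so a section can never be more open than its file.
    """
    chunks = []
    for s in sections:
        groups = doc_groups if s.restrict_to is None else doc_groups & s.restrict_to
        chunks.append({"title": s.title, "text": s.text, "allowed_groups": groups})
    return chunks


# Sample project closeout report, following the example in the text.
report_groups = frozenset({"project-team", "leadership"})
chunks = chunk_document(report_groups, [
    Section("Summary", "Project delivered in Q3."),
    Section("Margin calculation", "Margin details.", frozenset({"leadership"})),
    Section("Payroll note", "Misfiled content.", frozenset({"hr"})),
])
```

The third section ends up with an empty group set because its restriction does not overlap with the document ACL. Under this assumed rule, such a chunk is retrievable by no one until the mismatch is reviewed.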

Why are embeddings and vector databases security-relevant?

Embeddings are mathematical representations of text. They are not readable as plain text, but they are derived from the content and are what makes it retrievable. Embedding sensitive content therefore does not protect it: if it is stored in a vector database without a permission model, semantic search can surface it like any other chunk.

The vector database must either understand permissions or retrieval results must be authorized before they are passed to the model. Otherwise, semantic search may find relevant but unauthorized content.

Deletion and permission changes also matter. If a document is deleted or access is revoked, indexes, embeddings, and caches must be updated accordingly. Otherwise, knowledge may survive in AI search even after it is no longer accessible in the source system.
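
The synchronization described above can be sketched with a toy index. This is an illustration under assumptions: `PermissionAwareIndex` is a hypothetical in-memory stand-in for a vector index plus an answer cache, and it applies one access list per document for simplicity.

```python
class PermissionAwareIndex:
    """Toy stand-in for a vector index plus an answer cache (both hypothetical)."""

    def __init__(self):
        self.chunks = {}        # chunk_id -> {"doc_id", "text", "allowed_groups"}
        self.answer_cache = {}  # (user_id, question) -> {"answer", "doc_ids"}

    def add_chunk(self, chunk_id, doc_id, text, allowed_groups):
        self.chunks[chunk_id] = {
            "doc_id": doc_id,
            "text": text,
            "allowed_groups": frozenset(allowed_groups),
        }

    def cache_answer(self, user_id, question, answer, doc_ids):
        self.answer_cache[(user_id, question)] = {"answer": answer, "doc_ids": set(doc_ids)}

    def _drop_cached_answers(self, doc_id):
        # Any cached answer built from this document may now be stale.
        stale = [key for key, entry in self.answer_cache.items() if doc_id in entry["doc_ids"]]
        for key in stale:
            del self.answer_cache[key]

    def on_permissions_changed(self, doc_id, allowed_groups):
        """Source system changed the ACL: update every chunk and purge caches."""
        for chunk in self.chunks.values():
            if chunk["doc_id"] == doc_id:
                chunk["allowed_groups"] = frozenset(allowed_groups)
        self._drop_cached_answers(doc_id)

    def on_document_deleted(self, doc_id):
        """Source document deleted: remove its chunks and purge caches."""
        self.chunks = {cid: c for cid, c in self.chunks.items() if c["doc_id"] != doc_id}
        self._drop_cached_answers(doc_id)
```

How these handlers are triggered (webhooks, change feeds, or scheduled re-crawls) depends on the source system and is outside this sketch.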

Why do logs and traceability matter?

A Company Brain must not only answer correctly. It must also make answers traceable.

Who asked the question? Which sources were retrieved? Which results were sent to the model? Which answer was generated? Was a source excluded because of missing permission? Was there an escalation? Was a sensitive category touched?

Without logs, errors cannot be investigated. Without logs, it is also difficult to prove that permissions were enforced. For trust in mid-sized companies, this matters. Leaders want to know that an AI assistant is not silently combining data from areas that should remain separate.

Logging must also be designed carefully. Logs themselves can become sensitive. They need purpose limitation, access controls, retention rules, and technical protection.
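
A structured audit entry covering the questions above might look like the following sketch. The field names and the `audit_record` function are assumptions for illustration. The record stores chunk IDs rather than chunk text, which is one way to keep the log itself less sensitive.

```python
import json
from datetime import datetime, timezone


def audit_record(user_id, question, sent_to_model, excluded, answer_id):
    """Build one audit entry for a single question.

    Only chunk IDs are logged, not chunk text, because the log itself
    can become sensitive. The question text may also need protection.
    """
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "question": question,
        "sent_to_model": sorted(sent_to_model),
        "excluded_for_permission": sorted(excluded),
        "answer_id": answer_id,
    }


record = audit_record(
    user_id="alice",
    question="Why was customer A deprioritized?",
    sent_to_model={"c3", "c1"},
    excluded={"c2"},
    answer_id="a-001",
)
log_line = json.dumps(record)  # one JSON object per line, ready for an append-only log
```

Retention periods and access rules for the log store are organizational decisions and are not modeled here.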

What mistakes do companies make with permissions?

The most common mistake is overtrusting existing folder permissions. Companies assume access is already solved because SharePoint or a file server has permissions. After indexing, that is only true if those permissions are transferred into the RAG pipeline and continuously updated.

The second mistake is an overly broad role model. “All employees” is rarely a good security category in a Company Brain.

The third mistake is missing updates. Employees change roles, leave projects, receive temporary access, or lose responsibilities. If AI search does not reflect this quickly, shadow access appears.

The fourth mistake is separating relevance from permission. A retrieved result must not be used simply because it is relevant. It must be relevant and allowed.

How should a secure Company Brain be designed?

A secure Company Brain starts with data classification. Which content is public, internal, confidential, strictly confidential, or personal? Then the company needs roles, attributes, and relationships. Who may see what, for what reason, and in which context?

Then comes the technical implementation. Permissions must be transferred during indexing, attached to chunks, checked during retrieval, and reflected in answers. Sources should be shown where the user is authorized to see them. Uncertainty should trigger clarification or escalation, not a guessed answer.
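
The last point, escalation instead of a guessed answer, can be sketched as a small gate in front of the model call. This is a simplified illustration: `answer_or_escalate` is a hypothetical helper, and the `min_score` threshold is an assumed value that would need tuning for a real system.

```python
def answer_or_escalate(authorized_chunks, min_score=0.5):
    """Decide whether there is enough authorized evidence to answer.

    authorized_chunks: dicts with "source" and "score", already permission-filtered.
    min_score: assumed relevance threshold, to be tuned per system.
    """
    usable = [c for c in authorized_chunks if c["score"] >= min_score]
    if not usable:
        # Nothing the user may see is relevant enough: do not call the model.
        return {"action": "escalate", "sources": []}
    return {"action": "answer", "sources": [c["source"] for c in usable]}


decision = answer_or_escalate([
    {"source": "service-handbook.pdf", "score": 0.82},
    {"source": "old-note.txt", "score": 0.20},
])
```

Because the input is already permission-filtered, every source listed in the result is one the user is authorized to see.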

Regular audits are also necessary. Permissions are not a one-time project. They change with people, customers, projects, and organizational structure.

Why is AI search without access control dangerous?

AI search without access control is dangerous because it does not only find information. It combines and explains information. This can expose confidential content even when the user never directly asked for a secret document.

A secure Company Brain must therefore follow the same basic rule as any professional IT system: access only when permission, purpose, and context fit.

Only then does RAG become a trustworthy enterprise tool.

Further reading

OWASP – Top 10 for Large Language Model Applications
https://owasp.org/www-project-top-10-for-large-language-model-applications/

AWS Security Blog – Authorizing access to data with RAG implementations
https://aws.amazon.com/blogs/security/authorizing-access-to-data-with-rag-implementations/

Auth0 – Build Trustworthy AI: Implementing Access Control for RAG Systems Using FGA
https://auth0.com/blog/rag-and-access-control-where-do-you-start/

Sources for the statistics used

IBM – Cost of a Data Breach Report 2025
https://www.ibm.com/reports/data-breach

IBM X-Force – 2025 Cost of a Data Breach Report: Navigating the AI rush without taking on security debt
https://www.ibm.com/think/x-force/2025-cost-of-a-data-breach-navigating-ai

Verizon – 2025 Data Breach Investigations Report Executive Summary
https://www.verizon.com/business/resources/reports/2025-dbir-executive-summary.pdf

OWASP – LLM02:2025 Sensitive Information Disclosure
https://genai.owasp.org/llmrisk/llm02-insecure-output-handling/

FAQ

What does access control mean in RAG?

Access control in RAG means that an AI system may only retrieve and use content the current user is authorized to access. The check must happen before or during retrieval. It is not enough to remove sensitive content after the answer is generated, because it has already entered the model context.

Why is AI search risky without permissions?

AI search can combine content from many sources and present it as one clear answer. If HR data, pricing, leadership documents, or confidential customer information are included, a data leak can occur. The risk is higher than in traditional search because sensitive details may appear indirectly through synthesized answers.

Are SharePoint permissions enough for a Company Brain?

SharePoint permissions are important, but not automatically enough. When documents are indexed, chunked, and stored in a vector database, the original permissions must be transferred and kept up to date. Otherwise, a second knowledge layer appears with incomplete or different access controls.

What is the difference between RBAC, ABAC, and ReBAC?

RBAC assigns permissions through roles such as sales, service, or HR. ABAC uses attributes such as location, customer, project, or confidentiality level. ReBAC checks relationships, such as whether a user belongs to a project team. A Company Brain often needs a combination because knowledge is tied to roles, customers, and projects.

Why do permissions matter at chunk level?

RAG systems usually work with text chunks, not whole documents. A single document may contain both general and confidential sections. If all chunks receive the same permission, the system becomes either too open or too restrictive. Permissions and metadata should therefore travel with the knowledge unit itself.

What happens when permissions change after indexing?

When permissions change, AI search must reflect that change. Otherwise, a user may still retrieve content from the index even though they lost access in the source system. A Company Brain needs synchronization, re-indexing, permission checks during retrieval, and clear rules for caches and stored context.

How does privacy regulation affect RAG systems?

Privacy regulation matters whenever personal data is processed. A RAG system must not freely retrieve, combine, or display personal data. It needs purpose limitation, access controls, data minimization, logging, and deletion concepts. Data stored internally is not automatically authorized for every internal user.

How should a secure permission concept start?

A secure permission concept should begin with data classification. Then roles, attributes, relationships, and sensitive data categories are defined. The company decides which content may be indexed, which permissions apply to chunks, how retrieval is filtered, and how access is logged and audited.