The Silent Data Leak in Multi-Tenant AI
There is a class of bug that scares me more than any SQL injection or RCE. It is the kind that passes every health check, every smoke test, every monitoring alert — and quietly cross-pollinates private data between tenants through the LLM context window.
In a traditional web app, a missing tenant filter on a database query returns extra rows. You might notice it in the UI. In an AI system, a missing tenant filter on a vector search silently injects someone else's proprietary documents into the retrieval-augmented generation pipeline. The LLM synthesizes it, attributes nothing, and the user reads a response that contains fragments of another organization's private knowledge — with full confidence and zero indication that anything went wrong.
No error. No 500. No audit trail. Just silent information leakage through the most trusted component in the system.
We spent the last week auditing every data-bearing subsystem in AitherOS for exactly this class of vulnerability. Here is what we found and how we fixed it.
The Architecture We Started With
AitherOS already had strong multi-tenant infrastructure. This was not starting from zero:
- PostgreSQL: Database-per-tenant (aither_{slug})
- Conversations: Filesystem-partitioned (data/conversations/{tenant_slug}/)
- Spirit (personality memory): Per-tenant SpiritEngine instances
- Lockbox: Fernet-encrypted vault with hardware-ID-derived keys
- Secrets: Directory-scoped (secrets_db/tenants/{tenant_id}/)
- Flux (event bus): Channel prefixing with mailbox scope enforcement
- TenantContext: ContextVar propagation, cleared in finally blocks, fail-closed
Every request carries a TenantContext through the async call chain via Python's contextvars. The TenantMiddleware extracts it from JWT claims or headers, and every downstream service can call get_current_tenant() to get the active tenant. The system is zero-dependency at the core — no network calls needed to check tenant identity.
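A minimal sketch of that propagation pattern, using only the standard library. The TenantContext fields, the handle_request stand-in for TenantMiddleware, and the variable names are assumptions for illustration, not the AitherOS implementation. This version also raises when no tenant is set, which is stricter than the PLATFORM_CONTEXT default described under GAP-4.

```python
import asyncio
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TenantContext:
    tenant_id: str
    is_platform: bool = False


# Hypothetical module-level state; the real AitherOS names may differ.
_current_tenant: ContextVar[Optional[TenantContext]] = ContextVar(
    "current_tenant", default=None
)


def get_current_tenant() -> TenantContext:
    """Fail closed: raise instead of guessing when no tenant is set."""
    ctx = _current_tenant.get()
    if ctx is None:
        raise PermissionError("no tenant context set for this request")
    return ctx


async def handle_request(tenant_id: str) -> str:
    """Stand-in for the middleware: set the context, always reset it."""
    token = _current_tenant.set(TenantContext(tenant_id))
    try:
        await asyncio.sleep(0)  # yield so concurrent requests interleave
        return get_current_tenant().tenant_id
    finally:
        _current_tenant.reset(token)


async def main() -> list:
    # gather() wraps each coroutine in a task with its own context copy,
    # so one request's tenant never bleeds into another's.
    return await asyncio.gather(handle_request("acme"), handle_request("globex"))


results = asyncio.run(main())
```

Each request sees only the tenant it set, and nothing remains set once the requests finish. No lookup leaves the process, which is the property the zero-dependency claim relies on.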
Sounds solid, right? It is, at the physical storage layer. The gaps were in the logical layer — the places where tenant scoping exists as a "metadata hint" rather than an enforced filter.
The 11 Gaps We Found
We systematically audited every subsystem and classified gaps by blast radius.
P0: Silent Cross-Tenant Data Leaks
GAP-1: CognitiveTracer had NO tenant scoping. Sessions were stored as _sessions[session_id]. If two tenants generated the same session ID, their reasoning traces — which contain tool call results, intermediate LLM outputs, and chain-of-thought data — would collide silently. One tenant could retrieve another tenant's cognitive traces.
GAP-2: ScopedMemory relied on metadata hints, not enforced filtering. The MemoryHub received scope_key as a metadata "hint" in the query payload. If MemoryHub ignored or dropped the hint (which any backend change could cause), vector search would return nearest neighbors from ALL tenants. This is the exact scenario people worry about: your proprietary documents showing up in another tenant's RAG context.
GAP-3: OpenTelemetry spans lacked tenant_id attributes. Every trace exported to Jaeger or Grafana Tempo had no tenant attribution. Cross-tenant trace queries were possible from shared observability infrastructure.
P1: Logical Isolation Gaps
GAP-4: ContextVar defaulted to PLATFORM_CONTEXT. When no tenant was set, get_current_tenant() returned PLATFORM_CONTEXT — full operator-level access. Correct for backwards compatibility with internal services, but any new codepath that forgot to set tenant context silently got god-mode.
GAP-5: Redis used key-prefixing, not physical isolation. Single Redis instance with ctx:{tenant}:{workspace}:{hash} namespacing. A KEYS * or SCAN from any connection could enumerate all tenants' cached context.
GAP-6: Chronicle log tailing had no tenant enforcement. The /tail/{service} and group log endpoints returned raw logs without tenant filtering. A non-platform caller could read operational details from other tenants.
GAP-7: KnowledgeGraph tenant_id was Optional. list_nodes() and search() accepted tenant_id: Optional[str]. When None was passed from a tenant-scoped request, the caller got platform-level visibility over all knowledge graph nodes.
P2: Physical Isolation Gaps
GAP-8 through GAP-11 covered per-tenant encryption keys (shared master key), dedicated infrastructure provisioning, network isolation, and per-tenant backups. Strategic improvements rather than urgent fixes.
The Fixes
Tenant-Scoped Trace Sessions (GAP-1)
We changed the session storage key from session_id to {tenant_id}:{session_id}:
    def _tenant_session_key(self, session_id, tenant_id=None):
        # Fall back to the tenant bound to the current request context.
        if tenant_id is None:
            from lib.core.AitherTenant import get_current_tenant
            tenant_id = get_current_tenant().tenant_id
        return f"{tenant_id}:{session_id}"
Every session method — start_session(), get_session(), end_session(), get_recent_sessions() — now routes through this composite key. get_recent_sessions() filters by tenant prefix; platform sees all, non-platform sees only their own.
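To make the composite-key behavior concrete, here is a small in-memory sketch. The TraceSessionStore class, the "platform" identifier, and the rule that tenant IDs may not contain a colon are assumptions made for this example. The colon rule matters for prefix filtering: without it, tenant "a" would match keys belonging to a tenant named "a:b".

```python
from typing import Dict, List, Optional

PLATFORM_TENANT = "platform"  # assumed identifier for operator-level access


class TraceSessionStore:
    """Hypothetical in-memory store keyed by '{tenant_id}:{session_id}'."""

    def __init__(self) -> None:
        self._sessions: Dict[str, dict] = {}

    @staticmethod
    def _key(tenant_id: str, session_id: str) -> str:
        # Reject ':' so one tenant's prefix can never match another's keys.
        if ":" in tenant_id:
            raise ValueError("tenant_id must not contain ':'")
        return f"{tenant_id}:{session_id}"

    def start_session(self, tenant_id: str, session_id: str, trace: dict) -> None:
        self._sessions[self._key(tenant_id, session_id)] = trace

    def get_session(self, tenant_id: str, session_id: str) -> Optional[dict]:
        return self._sessions.get(self._key(tenant_id, session_id))

    def get_recent_sessions(self, tenant_id: str) -> List[dict]:
        if tenant_id == PLATFORM_TENANT:
            return list(self._sessions.values())  # operators see everything
        prefix = f"{tenant_id}:"
        return [t for k, t in self._sessions.items() if k.startswith(prefix)]


store = TraceSessionStore()
store.start_session("acme", "s1", {"steps": ["acme tool output"]})
store.start_session("globex", "s1", {"steps": ["globex tool output"]})
```

Both tenants use session ID "s1", yet each lookup returns only the caller's trace.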
Defense-in-Depth Memory Filtering (GAP-2)
Two layers of protection:
Layer 1 — Mandatory tenant_id filter in the MemoryHub query. Not a metadata hint, a required parameter:
    resp = await client.post(f"{self._hub_url}/query", json={
        "query": query,
        "tenant_id": tenant_id,  # Mandatory, not optional
        "metadata": {"memory_scope": scope_key, "scope_tenant": tenant_id},
    })
Layer 2 — Post-retrieval assertion. Even if the upstream backend returns cross-tenant results, we filter them out and log a WARNING:
    pre_count = len(hub_results)
    hub_results = [
        r for r in hub_results
        # Fail closed: a result with no scope_tenant tag is dropped, not trusted.
        if r.get("metadata", {}).get("scope_tenant") == tenant_id
    ]
    if len(hub_results) < pre_count:
        log.warning(
            "[TENANT-ISOLATION] Filtered %d cross-tenant results from MemoryHub",
            pre_count - len(hub_results),
        )
This means even if a MemoryHub backend bug drops the tenant filter, cross-tenant data never reaches the LLM context. And the WARNING creates an audit trail so we catch the upstream bug.
Redis DB-Index Isolation (GAP-5)
Instead of just key-prefixing on a shared Redis DB, each tenant maps to a dedicated Redis DB index:
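    import hashlib

    def _tenant_db_index(tenant_id: str) -> int:
        if not tenant_id or tenant_id == "platform":
            return REDIS_CACHE_DB
        # Built-in hash() is salted per process for str, so use a stable digest.
        digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
        return (int.from_bytes(digest[:8], "big") % 14) + 2  # DB 2-15, reserving 0-1 for system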
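Every store/get method now routes through _get_tenant_redis(tenant_id), which maintains per-tenant connections to separate DB indices. A SCAN on one DB index cannot see keys stored in another index. With only 14 tenant indices, two tenants that map to the same index still share a keyspace, so key prefixes still matter there.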
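Two properties of an index mapping like this are worth checking: stability across processes and collisions. Python's built-in hash() is salted per process for strings unless PYTHONHASHSEED is fixed, so the sketch below uses a SHA-256 digest instead. The function names and the choice of SHA-256 are assumptions for illustration.

```python
import hashlib
from collections import defaultdict

SYSTEM_DBS = 2   # DB 0-1 reserved for system use, as in the post
TENANT_DBS = 14  # DB 2-15 on a default 16-database Redis


def stable_db_index(tenant_id: str) -> int:
    """Map a tenant to DB 2-15 using a digest that is stable across processes."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % TENANT_DBS + SYSTEM_DBS


def find_shared_indices(tenant_ids):
    """Return {db_index: [tenants]} for every index used by more than one tenant."""
    by_index = defaultdict(list)
    for tenant_id in tenant_ids:
        by_index[stable_db_index(tenant_id)].append(tenant_id)
    return {idx: ids for idx, ids in by_index.items() if len(ids) > 1}


# 15 tenants cannot fit into 14 indices without at least one collision.
tenants = [f"tenant-{n}" for n in range(15)]
shared = find_shared_indices(tenants)
```

A check like find_shared_indices could run at provisioning time to decide which tenants need a dedicated instance instead of a shared index.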
For PARANOID-level tenants (healthcare, finance, legal), TenantInfraProvisioner can spin up entirely dedicated Redis containers.
Per-Tenant Encryption Keys (GAP-8)
Previously, a single master key (derived from AITHER_MASTER_KEY + hardware ID) encrypted all tenants' Lockbox and Secrets data. We added HKDF-based key derivation:
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF
    from cryptography.hazmat.primitives import hashes

    def derive_key(master_key: bytes, tenant_id: str) -> bytes:
        # The tenant ID goes into HKDF's info field, so each tenant gets a distinct key.
        hkdf = HKDF(
            algorithm=hashes.SHA256(),
            length=32,
            salt=_get_salt(),
            info=f"aitheros-tenant-{tenant_id}".encode("utf-8"),
        )
        return hkdf.derive(master_key)
Each tenant gets a cryptographically independent key. Compromise of one tenant's derived key does not expose others. The Lockbox's derive_scope_fernet() now routes through TenantKeyDerivation for tenant-scoped operations, and AitherSecrets' envelope encryption inherits this automatically.
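For readers who want to see what the derivation does, here is a standard-library sketch of HKDF with SHA-256 as specified in RFC 5869, followed by the urlsafe base64 encoding that Fernet expects for its 32-byte keys. It is for illustration only: the helper names, the placeholder master key, and the placeholder salt are assumptions, and production code should keep using a vetted library implementation.

```python
import base64
import hashlib
import hmac


def hkdf_sha256(master_key: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it with `info`."""
    if length > 255 * 32:
        raise ValueError("requested length too large for HKDF-SHA256")
    # Extract: an empty salt defaults to a block of zero bytes per the RFC.
    prk = hmac.new(salt or b"\x00" * 32, master_key, hashlib.sha256).digest()
    # Expand: chain HMAC blocks, each bound to `info` and a counter byte.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def tenant_fernet_key(master_key: bytes, salt: bytes, tenant_id: str) -> bytes:
    """Derive a 32-byte key and encode it as urlsafe base64, as Fernet expects."""
    info = f"aitheros-tenant-{tenant_id}".encode("utf-8")
    return base64.urlsafe_b64encode(hkdf_sha256(master_key, salt, info))


master = b"\x01" * 32   # placeholder; never hard-code a real master key
salt = b"example-salt"  # placeholder for whatever _get_salt() returns
key_a = tenant_fernet_key(master, salt, "acme")
key_b = tenant_fernet_key(master, salt, "globex")
```

The same master key and salt yield unrelated keys for "acme" and "globex" because only the info field differs. That is why a token encrypted under one tenant's key fails to decrypt under another's.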
Fail-Closed Defaults (GAP-4, GAP-7)
For the PLATFORM_CONTEXT fallback, we added AITHER_REQUIRE_EXPLICIT_TENANT mode. When enabled in SaaS deployments, any codepath that falls back to platform access logs a WARNING with full stack trace — making it trivial to find and fix missing tenant propagation.
For KnowledgeGraph, both /nodes and /search now resolve tenant from the ContextVar when tenant_id is None. Non-platform callers are automatically restricted to their own nodes — no more "optional" parameter leading to accidental platform-level queries.
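The resolution rule can be sketched as a small pure function. The function name, the "platform" identifier, and the choice to reject a mismatched explicit tenant_id with an error (rather than silently overriding it) are assumptions for this example.

```python
from typing import Optional

PLATFORM_TENANT = "platform"  # assumed identifier for operator-level callers


def resolve_tenant_filter(requested: Optional[str], caller_tenant: str) -> Optional[str]:
    """Return the tenant_id to filter on; None means all tenants (platform only)."""
    if caller_tenant == PLATFORM_TENANT:
        return requested  # operators may scope to one tenant or see everything
    if requested is not None and requested != caller_tenant:
        raise PermissionError("cross-tenant query rejected")
    return caller_tenant  # None from a tenant request resolves to the caller
```

For a non-platform caller, passing None and passing their own tenant ID produce the same filter, so forgetting the parameter no longer widens the query.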
Chronicle Enforcement (GAP-6)
Raw log tailing (/tail/{service}, /tail) is now restricted to platform operators only — non-platform callers get a 403. Group log endpoints (/logs/group/{group}, /search/group/{group}) now apply the same auto-scoping pattern used by the primary /logs and /search endpoints: resolve tenant from ContextVar, non-platform callers can only see their own logs.
Infrastructure Provisioning (GAP-9, GAP-10, GAP-11)
Three new modules:
TenantInfraProvisioner: Manages dedicated infrastructure per tenant with three isolation levels (STANDARD, STRICT, PARANOID). Can provision dedicated Redis containers, set custom Chronicle/Strata endpoints, and persist connection routing in tenant config.
Kubernetes NetworkPolicy templates: Default-deny ingress per tenant namespace, allow from AitherOS system namespace, allow intra-tenant, block all cross-tenant traffic.
TenantBackup: Per-tenant encrypted backup/restore. Each backup collects conversations, Spirit memory, KnowledgeGraph nodes, and config into a tarball encrypted with the tenant's HKDF-derived key. A backup cannot be decrypted with any other tenant's key.
The Testing Strategy
25 unit tests covering every gap:
- Cross-tenant trace collision: Two tenants with the same session_id. Verify sessions are isolated.
- Cross-tenant memory leak: Mock MemoryHub returning mixed-tenant results. Verify post-filter catches them.
- Platform fallback audit: Verify WARNING fires when PLATFORM_CONTEXT is used as fallback.
- Redis DB isolation: Verify deterministic mapping, range bounds, stability.
- Cross-tenant Fernet decryption: Verify tenant A's Fernet cannot decrypt tenant B's data. InvalidToken expected.
- Backup encryption isolation: Verify backup from tenant A cannot be restored by tenant B.
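As an illustration of the cross-tenant memory leak case, here is a self-contained unittest sketch. The filter_cross_tenant helper, the logger name, and the mocked result shape are assumptions. The helper also treats results with no scope_tenant tag as foreign, which is a deliberately strict choice for this example.

```python
import logging
import unittest

log = logging.getLogger("scoped_memory")


def filter_cross_tenant(results, tenant_id):
    """Keep only results explicitly tagged with tenant_id; warn about the rest."""
    kept = [
        r for r in results
        if r.get("metadata", {}).get("scope_tenant") == tenant_id
    ]
    dropped = len(results) - len(kept)
    if dropped:
        log.warning("[TENANT-ISOLATION] Filtered %d cross-tenant results", dropped)
    return kept


class CrossTenantMemoryLeakTest(unittest.TestCase):
    def test_mixed_results_are_filtered_and_logged(self):
        # Simulates a backend that ignored the tenant filter.
        mocked_hub_results = [
            {"text": "acme roadmap", "metadata": {"scope_tenant": "acme"}},
            {"text": "globex pricing", "metadata": {"scope_tenant": "globex"}},
            {"text": "untagged", "metadata": {}},
        ]
        with self.assertLogs("scoped_memory", level="WARNING") as captured:
            kept = filter_cross_tenant(mocked_hub_results, "acme")
        self.assertEqual([r["text"] for r in kept], ["acme roadmap"])
        self.assertIn("Filtered 2", captured.output[0])
```

The test checks both halves of the defense: foreign results never reach the caller, and the drop leaves a log line that points at the upstream bug.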
Plus a static audit script (dev/checks/check_tenant_isolation.py) that scans the codebase for tenant_id: Optional without ContextVar fallback in public endpoints, unscoped Redis keys, and data access without tenant context reference.
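One of those checks can be approximated with the standard ast module. The sketch below is not the contents of check_tenant_isolation.py. It is an assumed, simplified version that flags functions with an Optional tenant_id parameter and no reference to get_current_tenant in the body. It needs Python 3.9 or later for ast.unparse.

```python
import ast

# Sample input: only the first function lacks a context fallback.
SOURCE = '''
def list_nodes(tenant_id: Optional[str] = None):
    return db.query(tenant_id)

def search(query, tenant_id: Optional[str] = None):
    if tenant_id is None:
        tenant_id = get_current_tenant().tenant_id
    return db.search(query, tenant_id)
'''


def find_unguarded_optional_tenant(source: str) -> list:
    """Return names of functions with an Optional tenant_id and no context fallback."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        params = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
        has_optional = any(
            p.arg == "tenant_id"
            and p.annotation is not None
            and "Optional" in ast.unparse(p.annotation)
            for p in params
        )
        # Any reference to get_current_tenant in the body counts as a fallback.
        calls_fallback = any(
            isinstance(n, ast.Name) and n.id == "get_current_tenant"
            for n in ast.walk(node)
        )
        if has_optional and not calls_fallback:
            flagged.append(node.name)
    return flagged


flagged = find_unguarded_optional_tenant(SOURCE)
```

A name-based check like this produces false negatives (a fallback call that is never reached) and false positives (resolution done in a decorator), so it works best as a prompt for review, not as a gate.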
The Scorecard
| Subsystem | Before | After |
|---|---|---|
| CognitiveTracer | None | Tenant-keyed sessions |
| ScopedMemory | Metadata hint | Mandatory filter + post-assertion |
| OTel Traces | None | Span attributes |
| ContextVar fallback | Silent PLATFORM | Audited + strict mode |
| Redis | Key prefix (shared DB) | DB-index + optional dedicated |
| Chronicle | Partial (3/7 endpoints) | All endpoints enforced |
| KnowledgeGraph | Optional filter | Mandatory + ContextVar resolution |
| Encryption | Shared master key | Per-tenant HKDF derivation |
| Infrastructure | Shared everything | Provisioner for dedicated instances |
| Network | Shared Docker bridge | K8s NetworkPolicy templates |
| Backups | Bulk (all tenants) | Per-tenant encrypted |
149 existing tests still passing. Zero regressions. 25 new isolation tests. 14 files changed, 1,855 lines added.
What We Learned
The storage layer was fine. The query layer was the problem. Physical isolation (separate databases, filesystem directories) held up under audit. Logical isolation (metadata hints, optional parameters) fails silently when any component in the chain drops the filter.
Defense-in-depth is the only sane approach for RAG pipelines. You cannot trust a single layer. Send the tenant filter AND verify the results. Log when verification catches something. The upstream system might be correct today and broken after the next refactor.
"Optional" parameters at API boundaries are a smell for multi-tenant systems. If a parameter determines who can see the data, it should not be optional. Resolve it from context, make it required, or fail-closed.
Observability needs tenant scoping too. It is not just about data isolation — operational telemetry (traces, logs, metrics) can reveal business intelligence about other tenants. Latency patterns, error rates, feature usage. All of it needs tenant attribution and filtering.
The gap between "works for single-tenant" and "safe for multi-tenant" is exactly the gap where AI systems are most dangerous. A traditional app leaking data shows wrong rows in a table. An AI system leaking data answers confidently with someone else's proprietary knowledge woven in, and nobody can tell the difference.
If you are building multi-tenant AI systems, audit the query layer, not just the storage layer. And assume every "hint" will be dropped.