Multi-Tenant Graph Scoping: Zero-Bleed Isolation Across 23 Faculty Graphs
The Problem: 23 Graphs, Zero Isolation
<!-- transition: slide-left --> <!-- animation: fade-in --> <!-- narration: Let's start with the problem. AitherOS has over twenty-three in-process knowledge graphs. Memory graphs, code graphs, event graphs, document graphs, security graphs — all running as faculty graphs inside the same process. They're fast, they're powerful, and they were completely unscoped. Every tenant's data sat in the same flat namespace. -->AitherOS runs 23+ in-process faculty graphs. MemoryGraph stores episodic and semantic memories. CodeGraph indexes every function and class via AST. EventGraph tracks causal event chains. DocGraph, RAGGraph, WikipediaGraph, SecurityGraph — each one a specialized knowledge store powering the cognitive pipeline.
They all share one problem: no tenant isolation.
When AitherOS runs as a single-user platform, this is fine. But the moment you add tenants — SaaS customers, workspace users, multi-agent deployments — every graph becomes a data leak vector.
The durable store (AitherKnowledgeGraph service) already enforced tenant isolation with a proper _tenant_index and intersection-filtered queries. But the hot in-process caches? Completely flat. A tenant's memories were indistinguishable from any other tenant's memories.
What We Needed
<!-- layout: bullets --> <!-- transition: slide-right --> <!-- animation: stagger --> <!-- narration: We needed four things. A scope hierarchy that goes from platform down to individual user. Automatic enforcement that doesn't require every caller to pass tenant I.D. explicitly. Zero breakage of existing code. And one implementation that covers all twenty-three graphs. -->- Scope hierarchy: Platform → Tenant → Workspace → User
- Automatic enforcement: Reads scope from async context — callers don't need to change
- Zero breakage: Every existing query path must keep working unchanged
- One fix, 23 graphs: Inheritance-based — change the base class, all children are secured
Architecture
<!-- layout: diagram --> <!-- diagram: mermaid --> <!-- transition: zoom --> <!-- animation: spring --> <!-- narration: Here's the high-level architecture. At the top, every incoming request gets a Tenant Context extracted from its JWT or headers. This flows through a context variable into a Graph Scope, which carries tenant I.D., workspace I.D., and user I.D. Every faculty graph inherits scope-aware filtering from the base class. The scope level — platform, tenant, workspace, or user — determines how deep the isolation goes. -->

```mermaid
graph TD
    REQ[Incoming Request] --> JWT[JWT / Headers]
    JWT --> TC[TenantContext]
    TC --> GS[GraphScope]
    GS --> CV[ContextVar Propagation]
    CV --> BFG[BaseFacultyGraph._scope_filter]
    BFG --> MG[MemoryGraph<br/>scope: tenant]
    BFG --> EG[EventGraph<br/>scope: workspace]
    BFG --> CG[CodeGraph<br/>scope: platform]
    BFG --> DG[DocGraph<br/>scope: tenant]
    BFG --> SG[SecurityGraph<br/>scope: tenant]
    BFG --> MORE[... 18 more graphs]
```
<!-- pause: 1.0 -->
The Scope Hierarchy
<!-- transition: wipe --> <!-- animation: cascade --> <!-- narration: The scope hierarchy has four levels. Platform means no filtering at all — the operator sees everything. This is what you get when you run AitherOS on your own hardware. Tenant means data is isolated by tenant I.D. — one SaaS customer can never see another's data. Workspace adds a second dimension — within a tenant, different projects are isolated. And User goes all the way down to individual identity. -->The ScopeLevel enum defines four isolation depths:
```python
from enum import Enum

class ScopeLevel(str, Enum):
    PLATFORM = "platform"    # No isolation — operator sees all
    TENANT = "tenant"        # Isolated by tenant_id
    WORKSPACE = "workspace"  # Isolated by tenant_id + workspace_id
    USER = "user"            # Isolated by tenant_id + workspace_id + user_id
```
The GraphScope dataclass carries the full hierarchy through async chains:
```python
from dataclasses import dataclass

@dataclass
class GraphScope:
    tenant_id: str = "platform"
    workspace_id: str = ""
    user_id: str = ""
    agent_id: str = ""

    @property
    def is_platform(self) -> bool:
        return self.tenant_id == "platform"

    def matches(self, node_meta: dict, level: ScopeLevel) -> bool:
        if self.is_platform:
            return True
        # Platform-owned nodes are visible to everyone (shared infra)
        if node_meta.get("tenant_id", "platform") == "platform":
            return True
        # Tenant check
        if node_meta["tenant_id"] != self.tenant_id:
            return False
        # Workspace check (if scope level requires it)
        if level in (ScopeLevel.WORKSPACE, ScopeLevel.USER):
            if self.workspace_id and node_meta.get("workspace_id"):
                if node_meta["workspace_id"] != self.workspace_id:
                    return False
        # User check (USER level only)
        if level is ScopeLevel.USER:
            if self.user_id and node_meta.get("user_id"):
                if node_meta["user_id"] != self.user_id:
                    return False
        return True
```
The key insight: platform-owned nodes (shared infrastructure) are always visible to every tenant. A tenant can see CodeGraph results for AitherOS internals, but never another tenant's private memories or documents.
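These visibility rules can be exercised in isolation. The sketch below is a simplified, standalone `Scope` (tenant level only, illustrative rather than the production `GraphScope` class) that demonstrates the three checks: platform operators see everything, platform-owned nodes are visible to all, and everything else is tenant-matched.

```python
from dataclasses import dataclass

# Simplified standalone sketch of the visibility rules — "Scope" here is
# illustrative, not the actual GraphScope class.
@dataclass
class Scope:
    tenant_id: str = "platform"

    @property
    def is_platform(self) -> bool:
        return self.tenant_id == "platform"

    def matches(self, node_meta: dict) -> bool:
        if self.is_platform:
            return True  # Operator sees everything
        if node_meta.get("tenant_id", "platform") == "platform":
            return True  # Shared infrastructure is visible to all tenants
        return node_meta["tenant_id"] == self.tenant_id

acme = Scope(tenant_id="acme")
assert acme.matches({"tenant_id": "platform"})    # shared infra: visible
assert acme.matches({"tenant_id": "acme"})        # own data: visible
assert not acme.matches({"tenant_id": "globex"})  # other tenant: invisible
```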
The Base Class Fix
<!-- narration: This is the single most important change. Every faculty graph in AitherOS inherits from BaseFacultyGraph. We added three things to it. First, a scope level class attribute that each subclass overrides. Second, a scope filter node method that checks visibility for a single node. Third, a scope filter results method that batch-filters query results. One change here means all twenty-three graphs inherit the filtering machinery. --> <!-- layout: code --> <!-- transition: wipe --> <!-- animation: typewriter -->Every faculty graph inherits from BaseFacultyGraph. We added scope-aware filtering at this level:
```python
class BaseFacultyGraph:
    # Subclasses override this to declare isolation level
    _scope_level: str = "platform"  # Default: no filtering

    def _scope_filter_node(self, node_meta: dict) -> bool:
        """Check if a node passes scope filtering."""
        if self._scope_level == "platform":
            return True
        scope, level = self._get_scope()  # From ContextVar
        if scope is None or scope.is_platform:
            return True
        return scope.matches(node_meta, level)

    def _scope_filter_results(self, results: list) -> list:
        """Batch-filter query results by scope."""
        if self._scope_level == "platform":
            return results  # Fast path: no filtering
        scope, level = self._get_scope()
        if scope is None or scope.is_platform:
            return results
        return [r for r in results if scope.matches(
            self._extract_meta(r), level
        )]
```
One change. 23 graphs secured.
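The machinery that makes this implicit is Python's `contextvars`. The sketch below is a hypothetical reduction of the plumbing (the names mirror the article's API but the bodies are illustrative): the scope is set once at the request boundary, read anywhere down the async call chain, and each asyncio task gets its own copy of the context, so concurrent tenants never observe each other's value.

```python
import asyncio
import contextvars

# Hypothetical sketch of the ContextVar plumbing; the real AitherTenant
# service carries a full GraphScope, not a plain dict.
_current_scope: contextvars.ContextVar[dict] = contextvars.ContextVar(
    "graph_scope", default={"tenant_id": "platform"}
)

def get_current_graph_scope() -> dict:
    return _current_scope.get()

async def query_graph() -> str:
    # Deep in the call chain: no tenant_id parameter needed.
    return get_current_graph_scope()["tenant_id"]

async def handle_request(tenant_id: str) -> str:
    # Set once at the request boundary (e.g. after JWT extraction).
    _current_scope.set({"tenant_id": tenant_id})
    return await query_graph()

async def main() -> None:
    # asyncio copies the context per task, so concurrent requests keep
    # independent scopes.
    results = await asyncio.gather(
        asyncio.create_task(handle_request("acme")),
        asyncio.create_task(handle_request("globex")),
    )
    assert results == ["acme", "globex"]

asyncio.run(main())
```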
MemoryGraph: The Critical Case
<!-- transition: slide-left --> <!-- animation: fade-in --> <!-- narration: MemoryGraph is the most sensitive graph in the system. It stores episodic memories, semantic knowledge, procedural skills — everything the AI has learned. Before this change, hybrid query had no tenant parameter at all. It used agent I.D. and scope — which is a different concept entirely, referring to whether a memory is shared or private within a single tenant. We added tenant I.D. to the query pipeline and wired it into the eligible nodes filter. -->MemoryGraph is where tenant isolation matters most. Before:
```python
# BEFORE: No tenant awareness
def hybrid_query(self, query, agent_id=None, scope="shared"):
    eligible = self._get_eligible_nodes(...)  # All tenants mixed
```
After:
```python
# AFTER: Tenant-isolated queries
def hybrid_query(self, query, agent_id=None, scope="shared",
                 tenant_id=None):
    # Auto-resolve from ContextVar when not passed
    if tenant_id is None:
        tenant_id = get_current_graph_scope().tenant_id
    eligible = self._get_eligible_nodes(..., tenant_id=tenant_id)
```
The filter in _get_eligible_nodes is surgical:
```python
# Non-platform tenants only see their own + platform-shared memories
if tenant_id and tenant_id != "platform":
    node_tenant = getattr(mem, "tenant_id", "platform")
    if node_tenant != "platform" and node_tenant != tenant_id:
        continue  # Invisible — different tenant
```
Platform-owned memories (system knowledge, shared procedures) remain visible to all tenants. Tenant-specific memories are walled off.
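To see the filter's effect concretely, here is an illustrative recreation of it applied to a small in-memory list (the `SimpleNamespace` objects stand in for real MemoryGraph nodes, and `eligible` is a stand-in for the loop inside `_get_eligible_nodes`):

```python
from types import SimpleNamespace

# Stand-ins for MemoryGraph nodes; real nodes carry many more fields.
memories = [
    SimpleNamespace(text="shared procedure", tenant_id="platform"),
    SimpleNamespace(text="acme secret", tenant_id="acme"),
    SimpleNamespace(text="globex secret", tenant_id="globex"),
]

def eligible(mems, tenant_id):
    out = []
    for mem in mems:
        # Same rule as the snippet above: non-platform tenants only see
        # their own memories plus platform-shared ones.
        if tenant_id and tenant_id != "platform":
            node_tenant = getattr(mem, "tenant_id", "platform")
            if node_tenant != "platform" and node_tenant != tenant_id:
                continue  # Invisible — different tenant
        out.append(mem)
    return out

assert [m.text for m in eligible(memories, "acme")] == [
    "shared procedure", "acme secret"
]
assert len(eligible(memories, "platform")) == 3  # operator sees all
```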
Graph Classification
<!-- layout: stats --> <!-- transition: zoom --> <!-- animation: spring --> <!-- narration: We classified all twenty-three graphs into scope levels. Eight are platform-only — they index AitherOS infrastructure like code, services, configs, and tests. These are shared knowledge that every tenant can see. Ten are tenant-scoped — memories, documents, RAG data, media, logs. Each tenant only sees their own. Two are workspace-scoped — event graphs and K.V. cache graphs are isolated even further, down to the workspace level within a tenant. -->Platform (8 graphs): CodeGraph, ServiceGraph, InfraGraph, ConfigGraph, ScriptGraph, TestGraph, TypeGraph, APIGraph
Tenant-scoped (10 graphs): MemoryGraph, DocGraph, RAGGraph, StrataGraph, WikipediaGraph, MediaGraph, DirectoryGraph, FluxGraph, LogGraph, SecurityGraph
Workspace-scoped (2 graphs): EventGraph, KVCacheGraph
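In code, the classification reduces to each subclass overriding the base class attribute. The sketch below is a hypothetical skeleton (the real classes carry full query machinery) showing how one declaration per graph encodes the table above:

```python
# Hypothetical skeleton: each graph declares its isolation depth via the
# class attribute described earlier. The real classes do far more.
class BaseFacultyGraph:
    _scope_level: str = "platform"  # Default: shared infrastructure

class CodeGraph(BaseFacultyGraph):
    pass  # Inherits "platform": visible to every tenant

class MemoryGraph(BaseFacultyGraph):
    _scope_level = "tenant"  # Every instance is tenant-scoped

class EventGraph(BaseFacultyGraph):
    _scope_level = "workspace"  # Isolated per workspace within a tenant

# The level is a class attribute, so it cannot vary per instance:
assert CodeGraph()._scope_level == "platform"
assert MemoryGraph()._scope_level == "tenant"
```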
Pipeline Wiring: Automatic Propagation
<!-- transition: slide-right --> <!-- animation: cascade --> <!-- narration: The beautiful part is how little the pipeline callers had to change. The Agent Runtime reads the graph scope once at the start of a faceted run and passes it to the Tiered Context Assembler. The assembler passes it to every hybrid query call. But even callers that don't pass it explicitly still get scoped — because the ContextVar auto-resolves from the tenant context. Existing code works unchanged. -->The pipeline propagation required minimal changes:
```python
# AgentRuntime._faceted_run() — reads scope once
from lib.core.AitherTenant import get_current_graph_scope

tenant_id = get_current_graph_scope().tenant_id

# Passes to TieredContextAssembler
assembler = TieredContextAssembler(
    session_id=session_id,
    effort_level=effort,
    tenant_id=tenant_id,  # NEW — propagated to all graph queries
)
```
Callers that don't pass tenant_id explicitly? They still get scoped. hybrid_query() auto-resolves from the ContextVar:
```python
if tenant_id is None:
    tenant_id = get_current_graph_scope().tenant_id
```
Zero existing callers needed changes. The ContextVar propagation does the work.
The Enforcement Stack
<!-- transition: flip --> <!-- animation: stagger --> <!-- narration: Isolation isn't just one layer. It's five services working together. AitherIdentity extracts the tenant from the JWT. AitherTenant propagates it through context variables. The Graph Scope filters every query. AitherStrata scopes storage paths. And AitherFlux scopes event channels with tenant-prefixed names. If you're a tenant, you physically cannot see another tenant's data — the filters are applied at every layer. -->Isolation is enforced by five services working in concert:
| Service | Role |
|---|---|
| AitherIdentity | Extracts tenant from JWT → TenantContext |
| AitherTenant | Propagates TenantContext + GraphScope via ContextVars |
| BaseFacultyGraph | Filters every graph query by scope level |
| AitherStrata | Scopes storage paths: tenants/{slug}/ |
| AitherFlux | Scopes event channels: {slug}:event_name |
Non-local requests without a valid tenant are fail-closed to PUBLIC — never PLATFORM. This is enforced in resolve_full_caller_context().
Design Decisions & Trade-offs
<!-- transition: slide-left --> <!-- animation: cascade --> <!-- narration: A few key design decisions. First, platform-owned nodes are visible to all tenants. This means shared infrastructure knowledge — like how AitherOS services work — is available to everyone. A tenant can ask about CodeGraph, but never see another tenant's memories. Second, we chose ContextVar auto-resolution over mandatory parameters. This means existing code works unchanged, but it also means scope is implicit. We accept this trade-off because the alternative — refactoring every graph caller — would have been a months-long project with high regression risk. Third, the scope level is a class attribute, not an instance attribute. This is deliberate: a MemoryGraph is always tenant-scoped, everywhere, always. You can't accidentally create an unscoped instance. -->- Platform nodes visible to all: Shared infrastructure knowledge (CodeGraph, ServiceGraph) is accessible to every tenant. A tenant can query how AitherOS services work, but never see another tenant's memories.
- ContextVar over mandatory params: Scope is resolved implicitly from the async context. Existing callers work unchanged. The trade-off is implicit behavior — but the alternative (refactoring every caller) would have taken months.
- Class-level scope declaration: _scope_level is a class attribute, not an instance attribute. A MemoryGraph is always tenant-scoped. You can't create an unscoped instance by accident.
- Fail-closed defaults: No scope context → treated as platform (backwards compatible for self-hosted). Non-local requests without tenant → forced to PUBLIC (never PLATFORM).
What's Next
<!-- transition: slide-up --> <!-- animation: stagger --> <!-- narration: This is the foundation. Next, we're building graph-level capability gates in capabilities dot yaml — so RBAC can control which tenants can even access which graph types. We're adding workspace-level isolation to more graphs as workspace semantics mature. And we're building a scope audit tool that can verify, at any point, that no cross-tenant data is leaking through any graph query path. -->- Capability gates: Add graph-level RBAC in
capabilities.yaml— control which plan tiers can access which graphs - Workspace isolation expansion: More graphs moving from TENANT to WORKSPACE as workspace semantics mature
- Scope audit tooling: Automated verification that no cross-tenant data leaks through any query path
- Neuron scoping: Wire GraphScope into NeuronFire so speculative prefetch respects tenant boundaries
<!-- layout: closing --> <!-- transition: zoom --> <!-- animation: spring --> <!-- narration: That's the full story. One base class change. Twenty-three graphs secured. Zero callers broken. Multi-tenant isolation that goes from platform to tenant to workspace to user — enforced automatically through async context propagation. The key insight? Scope filtering belongs in the base class, not in every caller. Build the fence once, every graph inherits it. Thanks for watching. --> <!-- pause: 2.0 -->
One base class. 23 graphs. Zero bleed. That's how you build multi-tenant isolation without breaking everything.