# Closing the Loop: How AitherDirectory Became the Single Source of Truth for Everything
Six weeks ago we shipped AitherDirectory — a pure-Python LDAP-compatible directory backed by SQLite WAL. It gave us a single tree for users, groups, roles, tenants, agents, services, certificates, calendars, contacts, emails, tasks, and reminders. Every microservice could query dc=aither,dc=os instead of hunting through seven data stores.
That was the foundation. This post is about finishing the house.
## The audit that started it all
We ran a full integration wiring audit across every service that touches identity, configuration, or state. The results were sobering:
| Gap | Where | Impact |
|---|---|---|
| Auth sessions | AuthSessionManager — purely in-memory Dict | Sessions lost on every restart |
| Entity registrations | AitherIdentityGate — JSON files on disk | Mesh nodes, agents, MCP clients invisible to Directory queries |
| Runtime configs | 120 YAML files in config/ | No way for agents to query configs at runtime without filesystem access |
| Relay workspace roles | AitherRelay — local role check only | No cross-reference with Directory tenant roles |
| Schema completeness | ObjectClass enum missing 3 types; 6 core schemas unregistered | Directory couldn't store configs, entities, or widgets |
| DIT skeleton | 11 OUs missing from startup provisioning | Tenant trees were incomplete |
Every one of these gaps meant the same thing: data lived outside the directory, so you couldn't answer questions about it with a single LDAP query.
We closed all of them.
## Gap 1: Schema and DIT completeness
The ObjectClass enum in DirectoryStore.py had 23 members. We added three more: aitherWidget, aitherConfig, and aitherEntity. The DIT skeleton — the tree of organisational units provisioned at startup — went from 10 entries to 21, covering ou=agents, ou=configs, ou=entities, ou=mailboxes, ou=lockboxes, ou=certificates, ou=secrets, ou=policies, ou=distribution-lists, and ou=routepolicies.
More importantly, six core object classes that had attributes defined but were never actually registered in the schema registry got their registrations added: aitherUser, aitherGroup, aitherRole, aitherTenant, aitherService, and aitherDevice. Plus two new ones for the gaps we were closing: aitherConfig and aitherEntity.
Tenant sub-trees expanded too. Every tenant provisioning path — TenantSync, DirectoryClient.provision_tenant_tree, and the main AitherDirectory startup — now creates ou=configs, ou=devices, and ou=entities alongside the existing eight sub-OUs.
**Result:** The tree is complete. Any object type in the system has a home in the DIT.
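The tenant sub-tree layout described above is easy to picture as a small helper. This is a minimal sketch under stated assumptions: `TENANT_SUB_OUS`, `BASE_DN`, and `tenant_sub_ou_dns` are hypothetical names for illustration, not the actual provisioning code.

```python
# Hypothetical sketch of tenant sub-tree provisioning. The OU names match
# the post; the function name and DN layout are assumptions.

TENANT_SUB_OUS = [
    "users", "groups", "roles", "agents", "services",
    "calendars", "contacts", "emails",     # the existing eight
    "configs", "devices", "entities",      # the three added in this round
]

BASE_DN = "dc=aither,dc=os"

def tenant_sub_ou_dns(tenant_id: str) -> list[str]:
    """Return the DN of every sub-OU a tenant tree should contain."""
    tenant_dn = f"ou={tenant_id},ou=tenants,{BASE_DN}"
    return [f"ou={name},{tenant_dn}" for name in TENANT_SUB_OUS]

dns = tenant_sub_ou_dns("tnt_abc12345")
print(len(dns))   # 11 sub-OUs per tenant
print(dns[0])     # ou=users,ou=tnt_abc12345,ou=tenants,dc=aither,dc=os
```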
## Gap 2: ConfigDirectoryBridge — configs become queryable
We had 120 YAML files sitting in config/. AitherConfigLoader reads them on demand with in-memory caching. That works for services that import the loader — but agents running in sandboxed contexts, or external tools querying the platform, had no way to read configs.
ConfigDirectoryBridge changes that:
```python
from lib.directory.ConfigDirectoryBridge import config_bridge

# Push a config into the directory
await config_bridge.store_config("services", {"AitherRelay": {"port": 8300}})

# Read it back
data = await config_bridge.get_config("services")

# Tenant-scoped config
await config_bridge.store_config(
    "workspace_defaults",
    {"max_channels": 50},
    tenant_id="tnt_abc12345",
)
```
Each config becomes an aitherConfig entry under ou=configs,dc=aither,dc=os (or ou=configs,ou=<tenant>,ou=tenants,... for tenant-scoped configs). The JSON payload is stored in the aitherConfigData attribute. A sync_from_disk() method walks config/*.yaml and bulk-upserts everything into the tree.
**Result:** An agent can now run config_bridge.get_config("security.permissions") without touching the filesystem. Config changes are timestamped. Tenant-scoped overrides live right next to the tenant's other data.
## Gap 3: Sessions survive restarts — SessionDirectorySync
AuthSessionManager was 174 lines of clean code with one fatal flaw: every session lived in a Python Dict. Restart the service — or worse, restart the container — and every authenticated user gets kicked out.
We didn't want to rewrite the manager. Its API is solid: create_session, touch, is_valid, revoke_session, gc_expired. So we wrapped it.
SessionDirectorySync sits on top of AuthSessionManager and mirrors every operation to Redis:
- `create_session` → writes a Redis hash `session:{identity_id}:{session_id}` with a TTL equal to the idle timeout
- `touch` → updates `last_activity` in Redis, refreshes the TTL
- `revoke_session` → deletes the Redis key
- Startup → `rehydrate_from_redis()` scans all `session:*` keys, re-creates in-memory `AuthSession` objects with the correct `last_activity` timestamps
The hook into AitherIdentity is a single @app.on_event("startup") function:
```python
@app.on_event("startup")
async def _rehydrate_sessions():
    from lib.directory.SessionDirectorySync import start_session_sync
    await start_session_sync()
```
If Redis is down, the sync degrades silently to in-memory-only — the original behaviour. No crash, no blocked startup.
**Result:** Sessions now survive service restarts. The Redis TTL acts as a second-layer idle timeout. And if we ever need to answer "how many active sessions does user X have across the cluster?" — it's a Redis SCAN.
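The mirror operations above can be sketched end to end. This is a minimal illustration with a plain dict standing in for Redis; the `session:{identity_id}:{session_id}` key schema matches the post, while `FakeRedis`, `session_key`, and `IDLE_TIMEOUT` are assumptions for the sketch.

```python
import time

# Sketch of the Redis mirror, with a dict standing in for the real client.
IDLE_TIMEOUT = 1800  # seconds; assumed idle-timeout value

class FakeRedis:
    def __init__(self):
        self.store = {}  # key -> (mapping, expires_at)

    def hset_with_ttl(self, key, mapping, ttl):
        # create_session / touch both write the hash and refresh the TTL
        self.store[key] = (mapping, time.time() + ttl)

    def delete(self, key):
        # revoke_session deletes the key
        self.store.pop(key, None)

    def scan(self, prefix):
        # startup rehydration scans all session:* keys
        return [k for k in self.store if k.startswith(prefix)]

def session_key(identity_id: str, session_id: str) -> str:
    return f"session:{identity_id}:{session_id}"

r = FakeRedis()
r.hset_with_ttl(session_key("usr_1", "sess_a"),
                {"last_activity": time.time()}, IDLE_TIMEOUT)  # create_session
r.hset_with_ttl(session_key("usr_1", "sess_a"),
                {"last_activity": time.time()}, IDLE_TIMEOUT)  # touch
print(r.scan("session:"))  # keys found at startup rehydration
```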
## Gap 4: Entity registrations in the directory — EntityDirectoryBridge
AitherIdentityGate is the zero-trust entry point for AitherNet. Every mesh node, external agent, MCP client, and local AI agent must register, get approved, and receive a certificate before they can access anything. The registration records — including hardware fingerprints, code hashes, trust levels, and approval chains — are critical security data.
They were living in two JSON files: pending_registrations.json and approved_nodes.json.
EntityDirectoryBridge writes every EntityRegistration as an aitherEntity entry in ou=entities,dc=aither,dc=os:
```ldif
dn: cn=local_saga_a1b2c3d4,ou=entities,dc=aither,dc=os
objectClass: aitherEntity
aitherEntityType: local_agent
aitherEntityName: Saga
aitherEntityStatus: approved
aitherTrustLevel: 3
aitherCodeHash: a1b2c3d4e5f6...
aitherAgentPort: 8770
aitherAllowedServices: ["*"]
aitherRbacRoles: ["local_agent"]
```
The hook is in _save_state() — every time the gate persists to JSON, it also fires off async tasks to sync each registration to the Directory:
```python
if _entity_bridge is not None:
    try:
        _loop = _aio.get_running_loop()
        for _reg in self._registrations.values():
            _loop.create_task(_entity_bridge.sync_registration(_reg))
    except RuntimeError:
        pass  # no event loop — skip Directory sync
```
The JSON files remain the primary store (IdentityGate loads from them on startup). The Directory is the queryable mirror. If you want to ask "show me all approved mesh nodes with ADMIN trust level" — that's an LDAP filter: (&(objectClass=aitherEntity)(aitherEntityType=mesh_node)(aitherEntityStatus=approved)(aitherTrustLevel=4)).
**Result:** Security-critical entity data is now visible in the directory tree alongside everything else. Audit queries that used to require parsing JSON files now work with standard Directory search.
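Filters like the one above compose mechanically from attribute/value pairs. As a sketch, here is a hypothetical convenience for building an LDAP AND-filter; `and_filter` is not part of AitherDirectory's API, just an illustration of the filter grammar.

```python
# Hypothetical helper for composing an LDAP AND filter string.
def and_filter(**attrs: str) -> str:
    """Join (attr=value) clauses under an LDAP '&' operator."""
    clauses = "".join(f"({k}={v})" for k, v in attrs.items())
    return f"(&{clauses})"

# The "approved mesh nodes with ADMIN trust level" query from the post:
flt = and_filter(
    objectClass="aitherEntity",
    aitherEntityType="mesh_node",
    aitherEntityStatus="approved",
    aitherTrustLevel="4",
)
print(flt)
```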
## Gap 5: Relay learns to ask the Directory about roles
AitherRelay has its own role system — owner, admin, member — for workspace membership. It also checks identity tokens for elevated roles like admin, super_admin, operator. But it never asked the Directory what roles a user might have in their tenant context.
We added _enrich_roles_from_directory() — a helper that queries ou=users,ou=<tenant>,ou=tenants,dc=aither,dc=os for the user's aitherRoles attribute and merges any Directory-assigned roles into the local set.
This is intentionally non-blocking and best-effort. If the Directory is down, Relay falls back to whatever roles are in the identity token — exactly the same behaviour as before. But when it is up, a user who's been granted developer in the Directory gets that role recognised in Relay without anyone having to update Relay's local state.
**Result:** Role assignments in the Directory propagate to workspace authorization. One place to manage roles, everywhere to enforce them.
## The full picture
Here's what the AitherDirectory centralisation looks like after closing every gap:
```
dc=aither,dc=os
├── ou=users                 ← RBAC users, synced from AitherIdentity
├── ou=groups                ← RBAC groups
├── ou=roles                 ← RBAC roles with permission sets
├── ou=agents                ← AI agent personas (Saga, Athena, etc.)
├── ou=services              ← Service registry entries
├── ou=tenants
│   └── ou=tnt_abc123
│       ├── ou=users
│       ├── ou=groups
│       ├── ou=roles
│       ├── ou=agents
│       ├── ou=services
│       ├── ou=calendars     ← Calendar events (from Chronos)
│       ├── ou=contacts      ← Contacts (auto-extracted from emails)
│       ├── ou=emails        ← Email messages (sent + received)
│       ├── ou=configs       ← Runtime configs (from YAML + overrides)
│       ├── ou=devices       ← Mesh nodes and device registrations
│       └── ou=entities      ← IdentityGate registrations
├── ou=calendars             ← Global calendar entries
├── ou=contacts              ← Global contacts
├── ou=emails                ← Global email archive
├── ou=configs               ← Global runtime configs
├── ou=entities              ← Global entity registrations
├── ou=certificates          ← Cert bindings
├── ou=secrets               ← Secret references
├── ou=policies              ← Access policies
├── ou=mailboxes             ← Mailbox definitions
├── ou=lockboxes             ← Encrypted storage references
├── ou=distribution-lists
├── ou=routepolicies         ← Network route policies
└── ou=sessions              ← Session audit entries
```
Every object has an aitherOwnerUid attribute. Every tenant-scoped object lives under ou=<tenant_id>,ou=tenants. Every bridge writes modifyTimestamp. The schema registry enforces required and optional attributes per object class.
## What we didn't change
The source of truth for each subsystem is still the subsystem. AuthSessionManager still owns session logic. AitherIdentityGate still loads from JSON on startup. AitherConfigLoader still reads YAML from disk.
The Directory is the queryable mirror. Every bridge is fire-and-forget with graceful degradation. If the Directory goes down, nothing breaks — services keep running with their local stores. When it comes back up, the next write operation re-syncs.
This is a deliberate architectural choice. We don't want a single point of failure. We want a single point of query.
## The numbers
| Metric | Before | After |
|---|---|---|
| Object classes in schema registry | 17 | 25 |
| DIT skeleton OUs | 10 | 21 |
| Tenant sub-OUs per tenant | 8 | 11 |
| Data stores with no Directory mirror | 4 | 0 |
| Files modified | — | 8 |
| New bridge modules | — | 3 |
| Lines of new code | — | ~650 |
| Compile errors introduced | — | 0 |
## What's next
Two things are on the short list:
1. **Directory-backed config overrides** — `ConfigDirectoryBridge` can already store tenant-scoped configs. The next step is teaching `AitherConfigLoader` to check the Directory first and fall back to disk. That gives tenants the ability to override platform defaults without touching YAML files.
2. **Session clustering** — `SessionDirectorySync` uses Redis, which is already shared across containers. That means session rehydration works per-service. The next step is using Redis pub/sub to propagate revocations across all instances of `AitherIdentity` — instant logout everywhere.
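The planned directory-first config resolution is simple to sketch. This is a hypothetical illustration of the lookup order only: `resolve_config` and both lookup callables are stand-ins, not AitherConfigLoader's real interface.

```python
# Sketch of directory-first config resolution with disk fallback.
# All names here are assumptions for illustration.
def resolve_config(name: str, directory_get, disk_get):
    """Prefer the Directory's value; degrade gracefully to disk."""
    try:
        value = directory_get(name)
        if value is not None:
            return value
    except Exception:
        pass  # Directory unavailable: fall through to disk
    return disk_get(name)

disk = {
    "services": {"AitherRelay": {"port": 8300}},
    "security.permissions": {"default": "deny"},
}
overrides = {"services": {"AitherRelay": {"port": 9000}}}  # tenant override

print(resolve_config("services", overrides.get, disk.get))             # override wins
print(resolve_config("security.permissions", overrides.get, disk.get)) # disk fallback
```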
The directory tree is complete. Every identity object, every config, every session, every entity registration, every email, every calendar event — they're all in one tree, queryable with one protocol, filterable by owner and tenant.
One tree to rule them all. For real this time.