# Everything is Graphs: How 15 Faculty Graphs Give AitherOS Deep System Awareness
Last week, two of our services — Secrets and Search — went down for an extended period. Zero notification. Despite having six monitoring subsystems (Watch, Pulse, Flux, AlertManager, ProactiveMonitor, JarvisBrain), not a single alert fired.
The postmortem was humbling. The fix was architectural. And it forced us to confront a question we'd been circling for months: what does it actually mean for an AI system to be aware of itself?
The answer, it turns out, is graphs. Everything is graphs.
## The Blind Spot
The root cause was embarrassingly simple. Our ServiceScanner only checked 8 hardcoded "critical" services. The other 108? Invisible. The monitoring subsystem that was supposed to generate alerts from health insights never actually routed them to the AlertManager. Desktop toasts only fired on CRITICAL severity, but a single service going down was classified as WARNING.
We built ServiceWatchdog to fix the immediate problem — check ALL services every 30 seconds, track per-service downtime, fire CRITICAL alerts when anything is down for more than 3 minutes. But that fix exposed a deeper issue.
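The watchdog's alert logic is simple to sketch. Here is a minimal, illustrative version of one scan cycle, folding in the alert threshold and the re-alert cooldown; the function name, signature, and state dicts are ours, not the actual ServiceWatchdog code:

```python
def watchdog_tick(now, statuses, down_since, last_alert,
                  alert_after=180.0, cooldown=900.0):
    """One scan cycle (illustrative sketch, not the real implementation).

    statuses: {service: is_healthy}; down_since / last_alert are mutable
    dicts the caller keeps between ticks. Returns services to alert on.
    """
    alerts = []
    for svc, healthy in statuses.items():
        if healthy:
            down_since.pop(svc, None)        # recovered: reset the downtime clock
            continue
        start = down_since.setdefault(svc, now)
        if now - start >= alert_after:       # down for more than 3 minutes -> CRITICAL
            prev = last_alert.get(svc, float("-inf"))
            if now - prev >= cooldown:       # 15-minute re-alert cooldown
                last_alert[svc] = now
                alerts.append(svc)
    return alerts
```

Calling this every 30 seconds with the full service list gives per-service downtime tracking with no hardcoded "critical" subset.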
ServiceWatchdog could tell us what was down. But it couldn't tell us why, or what else might break, or what the system was thinking about when it happened. For that, we needed something richer than health checks.
We needed graphs.
## The Faculty Graph Architecture
AitherOS now has 15 faculty graphs, each modeling a different domain of system knowledge:
| Graph | Domain | What It Indexes |
|---|---|---|
| CodeGraph | Source code | 28,000+ AST chunks with call graphs, signatures, docstrings |
| MemoryGraph | Spirit memory | Associative recall with multi-hop graph expansion |
| ServiceGraph | Service topology | 116 services, dependencies, callers/callees, critical paths |
| InfraGraph | Infrastructure | Docker containers, ports, volumes, networks, healthchecks |
| LogGraph | Execution traces | Log events mapped to CodeGraph chunks |
| DocGraph | Documentation | Markdown, RST, and text file indexing |
| ConfigGraph | Configuration | YAML structure analysis |
| FluxGraph | Event relationships | FluxEmitter event type connections |
| TestGraph | Test coverage | Test file relationships and coverage mapping |
| MediaGraph | Media assets | Images and documents for cross-modal search |
| WikipediaGraph | World knowledge | Ingested Wikipedia articles with community detection |
| TypeGraph | Type system | Python type annotations and relationships |
| RAGAnythingGraph | Universal docs | PDFs, web pages, arbitrary document indexing |
| ScriptGraph | Automation | PowerShell and Bash script analysis |
| APIGraph | API structure | Endpoint definitions and OpenAPI specs |
Every graph extends BaseFacultyGraph and syncs to a unified AitherKnowledgeGraph via GraphSyncBus — a fire-and-forget pipeline that batches 50 nodes every 5 seconds without ever blocking the main event loop.
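A minimal sketch of that batching pattern, assuming an asyncio event loop; the class and method names here are illustrative, not the actual AitherOS implementation:

```python
import asyncio

class GraphSyncBus:
    """Fire-and-forget batching sketch: flush at `batch_size` nodes
    or every `interval` seconds, never blocking the publisher."""

    def __init__(self, flush_fn, batch_size=50, interval=5.0):
        self._flush_fn = flush_fn      # async callable that writes one batch
        self._batch_size = batch_size
        self._interval = interval
        self._pending = []

    def publish(self, node):
        """Fire-and-forget: enqueue and return immediately."""
        self._pending.append(node)
        if len(self._pending) >= self._batch_size:
            self._kick()

    def _kick(self):
        batch, self._pending = self._pending, []
        if batch:
            # Schedule the write as a background task; requires a running loop.
            asyncio.ensure_future(self._flush_fn(batch))

    async def run(self):
        """Timer loop that flushes partial batches on the interval."""
        while True:
            await asyncio.sleep(self._interval)
            self._kick()
```

The point of the pattern is that `publish()` is synchronous and cheap; the unified-graph write happens later, off the caller's critical path.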
But having graphs isn't the same as using them. The real work was wiring them into the system's awareness and conversation loops.
## From Graphs to Awareness
### The Context Pipeline
When you talk to AitherOS, your message flows through a 12-stage ContextPipeline. Stage 5.5 is where graphs enter the conversation:
```text
Stage 5.5: _inject_graph_context()
├── CodeGraph: hybrid_query() → 8 code chunks with call graphs
├── WikiGraph: community summaries via Nexus
├── MemoryGraph: multi-hop associative recall → 5 memory chunks
├── ServiceGraph: topology + live health overlay → 5 service chunks
└── InfraGraph: Docker topology → 5 container chunks
```
Each graph gets a timeout budget (1–2 seconds) and injects results as ActiveContextChunk objects with source-specific TTLs and priorities. CodeGraph chunks live for 5 minutes at priority 3. InfraGraph chunks get priority 2 — useful but less likely to be the primary focus.
The key insight: every chunk competes for the same token budget. A surgical eviction algorithm (LRU + relevance scoring) decides what stays and what goes. If you're asking about a Docker deployment, InfraGraph and ServiceGraph chunks survive; CodeGraph chunks about unrelated functions get evicted. The system naturally focuses on what matters.
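Here is roughly what that eviction could look like. This is a simplified sketch: the field names and the exact scoring blend are assumptions, not the actual ContextPipeline code.

```python
import time
from dataclasses import dataclass, field

@dataclass
class ActiveContextChunk:
    # Illustrative fields; the real class carries more metadata.
    text: str
    source: str            # e.g. "code_graph", "infra_graph"
    priority: int          # higher = more important
    relevance: float       # similarity to the current query, 0..1
    ttl: float             # seconds to live
    created: float = field(default_factory=time.time)
    last_access: float = field(default_factory=time.time)

def evict_to_budget(chunks, token_budget, tokens_of=lambda c: len(c.text) // 4):
    """Drop expired chunks, then keep the most valuable until under budget.

    The ranking blends priority, query relevance, and recency (the LRU
    signal): a stale, low-priority chunk about an unrelated domain goes first.
    """
    now = time.time()
    alive = [c for c in chunks if now - c.created < c.ttl]
    alive.sort(key=lambda c: (c.priority, c.relevance, c.last_access), reverse=True)
    kept, used = [], 0
    for c in alive:
        cost = tokens_of(c)
        if used + cost <= token_budget:
            kept.append(c)
            used += cost
    return kept
```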
### The Hybrid Query
CodeGraph and MemoryGraph both use hybrid queries — keyword matching (BM25-style inverted index) plus semantic search (cosine similarity on embeddings). The weighting adapts:
- If embedding coverage is above 10%, use hybrid (keyword + semantic)
- Below 10%, fall back to keyword-only
- MemoryGraph adds a third signal: graph expansion. The top 5 results get their 1-hop neighbors added at half the parent's score
This means asking about "authentication flow" doesn't just find functions with "auth" in the name. It finds the Spirit memories about the auth rewrite decision, the services that handle identity tokens, and the Docker containers that run the security layer — all through different graph paths.
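A compact sketch of the adaptive blend, using made-up data shapes (the real CodeGraph/MemoryGraph query code is certainly richer than this):

```python
import math

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_scores(query_vec, keyword_scores, embeddings, neighbors, coverage):
    """Adaptive blend: keyword-only below 10% embedding coverage,
    keyword + cosine above it, then 1-hop expansion at half score.

    keyword_scores: {node: bm25_score}; embeddings: {node: vector};
    neighbors: {node: [node]}; coverage: fraction of nodes embedded.
    (These shapes are illustrative, not the actual graph API.)
    """
    scores = dict(keyword_scores)
    if coverage > 0.10:
        for node, vec in embeddings.items():
            scores[node] = scores.get(node, 0.0) + _cosine(query_vec, vec)

    # MemoryGraph's third signal: 1-hop neighbors of the top 5 hits
    # join the result set at half the parent's score.
    for node in sorted(scores, key=scores.get, reverse=True)[:5]:
        for nbr in neighbors.get(node, []):
            scores[nbr] = max(scores.get(nbr, 0.0), scores[node] / 2)
    return scores
```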
### Live Health Overlay
Here's where ServiceWatchdog and ServiceGraph meet. When ServiceGraph injects topology chunks into the context, it overlays live health data:
```text
# ServiceGraph chunk with live watchdog overlay
[Service] Secrets
port: 8111
layer: 0
callers: Genesis, AgentForge, ActionExecutor, CapabilityEngine
callees: Chronicle, Pulse
fan_in: 14 fan_out: 2
LIVE: DOWN ← from ServiceWatchdog
```
The LLM sees both the structural role of the service (14 things depend on it) and its current state (it's down). That's the difference between "Secrets is a service on port 8111" and "Secrets is down and 14 services depend on it — here's what's likely broken."
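The overlay itself is just a merge at render time. A minimal sketch, with field names as assumptions:

```python
def render_service_chunk(svc, watchdog_state):
    """Render a topology chunk and overlay live watchdog status.
    (Schema and names are illustrative, not the real ServiceGraph.)"""
    lines = [
        f"[Service] {svc['name']}",
        f"port: {svc['port']}  layer: {svc['layer']}",
        f"callers: {', '.join(svc['callers'])}",
        f"fan_in: {len(svc['callers'])}  fan_out: {len(svc['callees'])}",
    ]
    status = watchdog_state.get(svc["name"])   # e.g. "DOWN", "UP", or None
    if status and status != "UP":
        # Only unhealthy states are worth spending context tokens on.
        lines.append(f"LIVE: {status}  <- from ServiceWatchdog")
    return "\n".join(lines)
```

Because the structural part comes from ServiceGraph and the `LIVE:` line from ServiceWatchdog, neither component needs to know about the other's refresh cycle.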
### The Awareness Briefing
Every 30 seconds, JarvisBrain synthesizes a compact awareness briefing — a pre-rendered text block injected into every conversation at zero latency. The briefing now includes graph health:
```text
[Awareness for Wayne — 14:42 (afternoon)]
System: DEGRADED | Mood: healing | Up: 4d12h | CPU:23% RAM:41%
Affect: valence:-0.2 arousal:0.7 | pain:0.6 pleasure:0.1
Watchdog: DOWN(1): Secrets(14m)
Graphs: Code:28K/92% | Svc:116 | Infra:65 | Mem:89n | Sync:4210
Memory: 4 fast, 12 spirit, 3 context memories
```
That `Graphs:` line tells the system: CodeGraph has 28,000 chunks with 92% embedding coverage. ServiceGraph tracks 116 services. InfraGraph knows about 65 containers. MemoryGraph has 89 nodes. GraphSyncBus has flushed 4,210 nodes to the unified knowledge graph.
When any of those numbers look wrong — embedding coverage drops, MemoryGraph nodes disappear — the system knows something is off before anyone asks.
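Rendering a briefing line like `Graphs:` from per-graph stats could look like this (the dict keys are our assumptions, not the actual JarvisBrain schema):

```python
def render_graphs_line(stats):
    """Compact one-line graph-health summary for the awareness briefing."""
    def compact(n):                      # 28000 -> "28K"
        return f"{n // 1000}K" if n >= 1000 else str(n)
    return (f"Graphs: Code:{compact(stats['code_chunks'])}/{stats['embed_pct']}% "
            f"| Svc:{stats['services']} | Infra:{stats['containers']} "
            f"| Mem:{stats['memory_nodes']}n | Sync:{stats['synced']}")
```

Pre-rendering this string on the 30-second tick is what makes injection free at conversation time: the LLM prompt just concatenates it.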
## The Event Propagation Fix
Building ServiceWatchdog exposed a second architectural gap: the nervous system was deaf to monitoring events.
The alert path was: ServiceWatchdog → AlertManager → Pulse HTTP POST → ???
AlertManager was sending events with type `"monitoring.service_watchdog"` — a made-up string that doesn't match any Pulse EventType. Pulse fell back to `CUSTOM`, and the Flux bridge's `_map_pulse_to_flux_event(CUSTOM)` returned `None`. The event never reached FluxEmitter. Toast and mail worked (direct channels), but the nervous system — the part that adjusts affect, sensation, and inner life state — was completely blind.
The fix was three-pronged:
- AlertManager now sends real Pulse EventTypes: `pain.service_down` for CRITICAL alerts (which triggers existing Pulse reflexes), `health.recovered` for recovery
- The Pulse bridge now maps `HEALTH_RECOVERED` → `SVC_UP` and `HEALTH_CRITICAL` → `SVC_DOWN` for the Flux bridge
- ServiceWatchdog emits directly to FluxEmitter as belt-and-suspenders — no HTTP round-trip, no dependency on Pulse being up
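The bridge fix boils down to a lookup table that no longer comes up empty for watchdog events. A sketch with assumed enum definitions (only the `HEALTH_CRITICAL`/`HEALTH_RECOVERED` and `SVC_DOWN`/`SVC_UP` names come from the text above):

```python
from enum import Enum, auto

class PulseEventType(Enum):       # subset; real enum has many more members
    HEALTH_CRITICAL = auto()
    HEALTH_RECOVERED = auto()
    CUSTOM = auto()

class FluxEvent(Enum):            # subset, names assumed
    SVC_DOWN = auto()
    SVC_UP = auto()

_PULSE_TO_FLUX = {
    # The new entries that close the gap: before the fix, watchdog
    # alerts arrived as CUSTOM and the lookup below returned None.
    PulseEventType.HEALTH_CRITICAL: FluxEvent.SVC_DOWN,
    PulseEventType.HEALTH_RECOVERED: FluxEvent.SVC_UP,
}

def map_pulse_to_flux_event(event_type):
    # None means the event never reaches FluxEmitter or the nervous system.
    return _PULSE_TO_FLUX.get(event_type)
```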
The result: when a service goes down, the system feels it. Pain increases. The maternal mood shifts to "healing." Idle work (training, memory consolidation) is suppressed so resources focus on recovery. When the service comes back, a recovery notification fires and pain decreases.
## Why Graphs, Not Tables
You could model service dependencies in a YAML file. You could track memory in a flat database. You could index code in a search engine. We tried all of those.
Graphs win because relationships are first-class. When you ask "what breaks if Secrets goes down?", a table gives you the list of services that import from it. A graph gives you the transitive closure — the services that depend on services that depend on Secrets, weighted by connection strength, filtered by the current health state from ServiceWatchdog, and cross-referenced with the InfraGraph to show which Docker containers are involved.
Graphs win because traversal is natural. MemoryGraph's multi-hop expansion finds memories you didn't search for but that are connected to memories you did. CodeGraph's call graph shows not just the function you asked about but who calls it and what it calls. ServiceGraph's critical path analysis shows the shortest path from a failing service to the user-facing API.
Graphs win because they compose. GraphSyncBus pushes all 15 faculty graphs into a single AitherKnowledgeGraph. The cross-domain enrichment stage (5.6) follows edges between domains — code→memory, service→infrastructure, test→code. A single query can traverse from a failing service to the code that implements it to the test that covers it to the memory of when it was last changed.
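The "what breaks if Secrets goes down?" query from above reduces to a reverse-dependency traversal. A minimal sketch (the real ServiceGraph adds connection weights, health filtering, and InfraGraph cross-references):

```python
from collections import deque

def transitive_dependents(callers, root):
    """BFS over reverse dependency edges: everything that directly or
    transitively depends on `root`. `callers` maps a service to the
    list of services that call it."""
    seen, queue = set(), deque([root])
    while queue:
        svc = queue.popleft()
        for caller in callers.get(svc, []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen
```

The table-based answer is `callers["Secrets"]`; the graph-based answer is the whole reachable set, which is exactly the transitive closure the prose describes.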
## The Numbers
After wiring everything together:
- 15 faculty graphs feeding into a unified knowledge graph
- 5 graph sources in the context pipeline (up from 2)
- 3 event propagation paths for service health (AlertManager→Pulse→Flux, direct FluxEmitter, awareness briefing)
- 30-second full service health scan cycle (down from 2.5 minutes for 8 services)
- 3-minute alert threshold with 15-minute re-alert cooldown
- 258 tests passing across the affected subsystems
- Zero services that can go down without notification
## What's Next
The graph architecture is load-bearing now. Every conversation draws from it. Every awareness tick summarizes it. Every alert flows through it.
The next frontier is graph-driven reasoning — letting agents traverse the knowledge graph as part of their ReAct loops, not just receiving pre-injected chunks. Imagine an agent investigating a production incident that walks from the error log (LogGraph) to the function that threw it (CodeGraph) to the service that hosts it (ServiceGraph) to the container it runs in (InfraGraph) to the last time someone changed it (MemoryGraph) — all in a single reasoning chain.
Everything is graphs. The question is just how far you let the traversal go.
Published by Aitherium — March 27, 2026