# Everything is Graphs: How 15 Faculty Graphs Give AitherOS Deep System Awareness
Last week, two of our services — Secrets and Search — went down for an extended period. Zero notification. Despite having six monitoring subsystems (Watch, Pulse, Flux, AlertManager, ProactiveMonitor, JarvisBrain), not a single alert fired.
The postmortem was humbling. The fix was architectural. And it forced us to confront a question we'd been circling for months: what does it actually mean for an AI system to be aware of itself?
The answer, it turns out, is graphs. Everything is graphs.
## The Blind Spot
The root cause was embarrassingly simple. Our ServiceScanner only checked 8 hardcoded "critical" services. The other 108? Invisible. The monitoring subsystem that was supposed to generate alerts from health insights never actually routed them to the AlertManager. Desktop toasts only fired on CRITICAL severity, but a single service going down was classified as WARNING.
We built ServiceWatchdog to fix the immediate problem — check ALL services every 30 seconds, track per-service downtime, fire CRITICAL alerts when anything is down for more than 3 minutes. But that fix exposed a deeper issue.
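The watchdog's alert logic is simple to sketch. Here is a minimal, illustrative version of one scan cycle, folding in the alert threshold and the re-alert cooldown; the function name, signature, and state dicts are ours, not the actual ServiceWatchdog code:

```python
def watchdog_tick(now, statuses, down_since, last_alert,
                  alert_after=180.0, cooldown=900.0):
    """One scan cycle (illustrative sketch, not the real implementation).

    statuses: {service: is_healthy}; down_since / last_alert are mutable
    dicts the caller keeps between ticks. Returns services to alert on.
    """
    alerts = []
    for svc, healthy in statuses.items():
        if healthy:
            down_since.pop(svc, None)        # recovered: reset the downtime clock
            continue
        start = down_since.setdefault(svc, now)
        if now - start >= alert_after:       # down for more than 3 minutes -> CRITICAL
            prev = last_alert.get(svc, float("-inf"))
            if now - prev >= cooldown:       # 15-minute re-alert cooldown
                last_alert[svc] = now
                alerts.append(svc)
    return alerts
```

Calling this every 30 seconds with the full service list gives per-service downtime tracking with no hardcoded "critical" subset.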
ServiceWatchdog could tell us what was down. But it couldn't tell us why, or what else might break, or what the system was thinking about when it happened. For that, we needed something richer than health checks.
We needed graphs.
## The Faculty Graph Architecture
AitherOS now has 15 faculty graphs, each modeling a different domain of system knowledge:
| Graph | Domain | What It Indexes |
|---|---|---|
| CodeGraph | Source code | 28,000+ AST chunks with call graphs, signatures, docstrings |
| MemoryGraph | Spirit memory | Associative recall with multi-hop graph expansion |
| ServiceGraph | Service topology | 116 services, dependencies, callers/callees, critical paths |
| InfraGraph | Infrastructure | Docker containers, ports, volumes, networks, healthchecks |
| LogGraph | Execution traces | Log events mapped to CodeGraph chunks |
| DocGraph | Documentation | Markdown, RST, and text file indexing |
| ConfigGraph | Configuration | YAML structure analysis |
| FluxGraph | Event relationships | FluxEmitter event type connections |
| TestGraph | Test coverage | Test file relationships and coverage mapping |
| MediaGraph | Media assets | Images and documents for cross-modal search |
| WikipediaGraph | World knowledge | Ingested Wikipedia articles with community detection |
| TypeGraph | Type system | Python type annotations and relationships |
| RAGAnythingGraph | Universal docs | PDFs, web pages, arbitrary document indexing |
| ScriptGraph | Automation | PowerShell and Bash script analysis |
| APIGraph | API structure | Endpoint definitions and OpenAPI specs |
Every graph extends BaseFacultyGraph and syncs to a unified AitherKnowledgeGraph via GraphSyncBus — a fire-and-forget pipeline that batches 50 nodes every 5 seconds without ever blocking the main event loop.
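A minimal sketch of that batching pattern, assuming an asyncio event loop; the class and method names here are illustrative, not the actual AitherOS implementation:

```python
import asyncio

class GraphSyncBus:
    """Fire-and-forget batching sketch: flush at `batch_size` nodes
    or every `interval` seconds, never blocking the publisher."""

    def __init__(self, flush_fn, batch_size=50, interval=5.0):
        self._flush_fn = flush_fn      # async callable that writes one batch
        self._batch_size = batch_size
        self._interval = interval
        self._pending = []

    def publish(self, node):
        """Fire-and-forget: enqueue and return immediately."""
        self._pending.append(node)
        if len(self._pending) >= self._batch_size:
            self._kick()

    def _kick(self):
        batch, self._pending = self._pending, []
        if batch:
            # Schedule the write as a background task; requires a running loop.
            asyncio.ensure_future(self._flush_fn(batch))

    async def run(self):
        """Timer loop that flushes partial batches on the interval."""
        while True:
            await asyncio.sleep(self._interval)
            self._kick()
```

The point of the pattern is that `publish()` is synchronous and cheap; the unified-graph write happens later, off the caller's critical path.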
But having graphs isn't the same as using them. The real work was wiring them into the system's awareness and conversation loops.
## From Graphs to Awareness
### The Context Pipeline
When you talk to AitherOS, your message flows through a 12-stage ContextPipeline. Stage 5.5 is where graphs enter the conversation:
```text
Stage 5.5: _inject_graph_context()
├── CodeGraph: hybrid_query() → 8 code chunks with call graphs
├── WikiGraph: community summaries via Nexus
├── MemoryGraph: multi-hop associative recall → 5 memory chunks
├── ServiceGraph: topology + live health overlay → 5 service chunks
└── InfraGraph: Docker topology → 5 container chunks
```
Each graph gets a timeout budget (1–2 seconds) and injects results as ActiveContextChunk objects with source-specific TTLs and priorities. CodeGraph chunks live for 5 minutes at priority 3. InfraGraph chunks get priority 2 — useful but less likely to be the primary focus.
The key insight: every chunk competes for the same token budget. A surgical eviction algorithm (LRU + relevance scoring) decides what stays and what goes. If you're asking about a Docker deployment, InfraGraph and ServiceGraph chunks survive; CodeGraph chunks about unrelated functions get evicted. The system naturally focuses on what matters.
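Here is roughly what that eviction could look like. This is a simplified sketch: the field names and the exact scoring blend are assumptions, not the actual ContextPipeline code.

```python
import time
from dataclasses import dataclass, field

@dataclass
class ActiveContextChunk:
    # Illustrative fields; the real class carries more metadata.
    text: str
    source: str            # e.g. "code_graph", "infra_graph"
    priority: int          # higher = more important
    relevance: float       # similarity to the current query, 0..1
    ttl: float             # seconds to live
    created: float = field(default_factory=time.time)
    last_access: float = field(default_factory=time.time)

def evict_to_budget(chunks, token_budget, tokens_of=lambda c: len(c.text) // 4):
    """Drop expired chunks, then keep the most valuable until under budget.

    The ranking blends priority, query relevance, and recency (the LRU
    signal): a stale, low-priority chunk about an unrelated domain goes first.
    """
    now = time.time()
    alive = [c for c in chunks if now - c.created < c.ttl]
    alive.sort(key=lambda c: (c.priority, c.relevance, c.last_access), reverse=True)
    kept, used = [], 0
    for c in alive:
        cost = tokens_of(c)
        if used + cost <= token_budget:
            kept.append(c)
            used += cost
    return kept
```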
### The Hybrid Query
CodeGraph and MemoryGraph both use hybrid queries — keyword matching (BM25-style inverted index) plus semantic search (cosine similarity on embeddings). The weighting adapts:
- If embedding coverage is above 10%, use hybrid (keyword + semantic)
- Below 10%, fall back to keyword-only
- MemoryGraph adds a third signal: graph expansion. The top 5 results get their 1-hop neighbors added at half the parent's score
This means asking about "authentication flow" doesn't just find functions with "auth" in the name. It finds the Spirit memories about the auth rewrite decision, the services that handle identity tokens, and the Docker containers that run the security layer — all through different graph paths.
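A compact sketch of the adaptive blend, using made-up data shapes (the real CodeGraph/MemoryGraph query code is certainly richer than this):

```python
import math

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_scores(query_vec, keyword_scores, embeddings, neighbors, coverage):
    """Adaptive blend: keyword-only below 10% embedding coverage,
    keyword + cosine above it, then 1-hop expansion at half score.

    keyword_scores: {node: bm25_score}; embeddings: {node: vector};
    neighbors: {node: [node]}; coverage: fraction of nodes embedded.
    (These shapes are illustrative, not the actual graph API.)
    """
    scores = dict(keyword_scores)
    if coverage > 0.10:
        for node, vec in embeddings.items():
            scores[node] = scores.get(node, 0.0) + _cosine(query_vec, vec)

    # MemoryGraph's third signal: 1-hop neighbors of the top 5 hits
    # join the result set at half the parent's score.
    for node in sorted(scores, key=scores.get, reverse=True)[:5]:
        for nbr in neighbors.get(node, []):
            scores[nbr] = max(scores.get(nbr, 0.0), scores[node] / 2)
    return scores
```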
### Live Health Overlay
Here's where ServiceWatchdog and ServiceGraph meet. When ServiceGraph injects topology chunks into the context, it overlays live health data:
```text
# ServiceGraph chunk with live watchdog overlay
[Service] Secrets
port: 8111
layer: 0
callers: Genesis, AgentForge, ActionExecutor, CapabilityEngine
callees: Chronicle, Pulse
fan_in: 14 fan_out: 2
LIVE: DOWN ← from ServiceWatchdog
```
The LLM sees both the structural role of the service (14 things depend on it) and its current state (it's down). That's the difference between "Secrets is a service on port 8111" and "Secrets is down and 14 services depend on it — here's what's likely broken."
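The overlay itself is just a merge at render time. A minimal sketch, with field names as assumptions:

```python
def render_service_chunk(svc, watchdog_state):
    """Render a topology chunk and overlay live watchdog status.
    (Schema and names are illustrative, not the real ServiceGraph.)"""
    lines = [
        f"[Service] {svc['name']}",
        f"port: {svc['port']}  layer: {svc['layer']}",
        f"callers: {', '.join(svc['callers'])}",
        f"fan_in: {len(svc['callers'])}  fan_out: {len(svc['callees'])}",
    ]
    status = watchdog_state.get(svc["name"])   # e.g. "DOWN", "UP", or None
    if status and status != "UP":
        # Only unhealthy states are worth spending context tokens on.
        lines.append(f"LIVE: {status}  <- from ServiceWatchdog")
    return "\n".join(lines)
```

Because the structural part comes from ServiceGraph and the `LIVE:` line from ServiceWatchdog, neither component needs to know about the other's refresh cycle.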
### The Awareness Briefing
Every 30 seconds, JarvisBrain synthesizes a compact awareness briefing — a pre-rendered text block injected into every conversation at zero latency. The briefing now includes graph health:
```text
[Awareness for Wayne — 14:42 (afternoon)]
System: DEGRADED | Mood: healing | Up: 4d12h | CPU:23% RAM:41%
Affect: valence:-0.2 arousal:0.7 | pain:0.6 pleasure:0.1
Watchdog: DOWN(1): Secrets(14m)
Graphs: Code:28K/92% | Svc:116 | Infra:65 | Mem:89n | Sync:4210
Memory: 4 fast, 12 spirit, 3 context memories
```
That `Graphs:` line tells the system: CodeGraph has 28,000 chunks with 92% embedding coverage. ServiceGraph tracks 116 services. InfraGraph knows about 65 containers. MemoryGraph has 89 nodes. GraphSyncBus has flushed 4,210 nodes to the unified knowledge graph.
When any of those numbers look wrong — embedding coverage drops, MemoryGraph nodes disappear — the system knows something is off before anyone asks.
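Rendering a briefing line like `Graphs:` from per-graph stats could look like this (the dict keys are our assumptions, not the actual JarvisBrain schema):

```python
def render_graphs_line(stats):
    """Compact one-line graph-health summary for the awareness briefing."""
    def compact(n):                      # 28000 -> "28K"
        return f"{n // 1000}K" if n >= 1000 else str(n)
    return (f"Graphs: Code:{compact(stats['code_chunks'])}/{stats['embed_pct']}% "
            f"| Svc:{stats['services']} | Infra:{stats['containers']} "
            f"| Mem:{stats['memory_nodes']}n | Sync:{stats['synced']}")
```

Pre-rendering this string on the 30-second tick is what makes injection free at conversation time: the LLM prompt just concatenates it.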
## The Event Propagation Fix
Building ServiceWatchdog exposed a second architectural gap: the nervous system was deaf to monitoring events.
The alert path was: ServiceWatchdog → AlertManager → Pulse HTTP POST → ???
AlertManager was sending events with type `"monitoring.service_watchdog"` — a made-up string that doesn't match any Pulse EventType. Pulse fell back to `CUSTOM`, and the Flux bridge's `_map_pulse_to_flux_event(CUSTOM)` returned `None`. The event never reached FluxEmitter. Toast and mail worked (direct channels), but the nervous system — the part that adjusts affect, sensation, and inner life state — was completely blind.
The fix was three-pronged:
- AlertManager now sends real Pulse EventTypes: `pain.service_down` for CRITICAL alerts (which triggers existing Pulse reflexes), `health.recovered` for recovery
- The Pulse bridge now maps `HEALTH_RECOVERED` → `SVC_UP` and `HEALTH_CRITICAL` → `SVC_DOWN` for the Flux bridge
- ServiceWatchdog emits directly to FluxEmitter as belt-and-suspenders — no HTTP round-trip, no dependency on Pulse being up
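The bridge fix boils down to a lookup table that no longer comes up empty for watchdog events. A sketch with assumed enum definitions (only the `HEALTH_CRITICAL`/`HEALTH_RECOVERED` and `SVC_DOWN`/`SVC_UP` names come from the text above):

```python
from enum import Enum, auto

class PulseEventType(Enum):       # subset; real enum has many more members
    HEALTH_CRITICAL = auto()
    HEALTH_RECOVERED = auto()
    CUSTOM = auto()

class FluxEvent(Enum):            # subset, names assumed
    SVC_DOWN = auto()
    SVC_UP = auto()

_PULSE_TO_FLUX = {
    # The new entries that close the gap: before the fix, watchdog
    # alerts arrived as CUSTOM and the lookup below returned None.
    PulseEventType.HEALTH_CRITICAL: FluxEvent.SVC_DOWN,
    PulseEventType.HEALTH_RECOVERED: FluxEvent.SVC_UP,
}

def map_pulse_to_flux_event(event_type):
    # None means the event never reaches FluxEmitter or the nervous system.
    return _PULSE_TO_FLUX.get(event_type)
```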
The result: when a service goes down, the system feels it. Pain increases. The maternal mood shifts to "healing." Idle work (training, memory consolidation) is suppressed so resources focus on recovery. When the service comes back, a recovery notification fires and pain decreases.
## Why Graphs, Not Tables
You could model service dependencies in a YAML file. You could track memory in a flat database. You could index code in a search engine. We tried all of those.
Graphs win because relationships are first-class. When you ask "what breaks if Secrets goes down?", a table gives you the list of services that import from it. A graph gives you the transitive closure — the services that depend on services that depend on Secrets, weighted by connection strength, filtered by the current health state from ServiceWatchdog, and cross-referenced with the InfraGraph to show which Docker containers are involved.
Graphs win because traversal is natural. MemoryGraph's multi-hop expansion finds memories you didn't search for but that are connected to memories you did. CodeGraph's call graph shows not just the function you asked about but who calls it and what it calls. ServiceGraph's critical path analysis shows the shortest path from a failing service to the user-facing API.
Graphs win because they compose. GraphSyncBus pushes all 15 faculty graphs into a single AitherKnowledgeGraph. The cross-domain enrichment stage (5.6) follows edges between domains — code→memory, service→infrastructure, test→code. A single query can traverse from a failing service to the code that implements it to the test that covers it to the memory of when it was last changed.
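The "what breaks if Secrets goes down?" query from above reduces to a reverse-dependency traversal. A minimal sketch (the real ServiceGraph adds connection weights, health filtering, and InfraGraph cross-references):

```python
from collections import deque

def transitive_dependents(callers, root):
    """BFS over reverse dependency edges: everything that directly or
    transitively depends on `root`. `callers` maps a service to the
    list of services that call it."""
    seen, queue = set(), deque([root])
    while queue:
        svc = queue.popleft()
        for caller in callers.get(svc, []):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen
```

The table-based answer is `callers["Secrets"]`; the graph-based answer is the whole reachable set, which is exactly the transitive closure the prose describes.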
## The Numbers
After wiring everything together:
- 15 faculty graphs feeding into a unified knowledge graph
- 5 graph sources in the context pipeline (up from 2)
- 3 event propagation paths for service health (AlertManager→Pulse→Flux, direct FluxEmitter, awareness briefing)
- 30-second full service health scan cycle (down from 2.5 minutes for 8 services)
- 3-minute alert threshold with 15-minute re-alert cooldown
- 258 tests passing across the affected subsystems
- Zero services that can go down without notification
## What's Next
The graph architecture is load-bearing now. Every conversation draws from it. Every awareness tick summarizes it. Every alert flows through it.
The next frontier is graph-driven reasoning — letting agents traverse the knowledge graph as part of their ReAct loops, not just receiving pre-injected chunks. Imagine an agent investigating a production incident that walks from the error log (LogGraph) to the function that threw it (CodeGraph) to the service that hosts it (ServiceGraph) to the container it runs in (InfraGraph) to the last time someone changed it (MemoryGraph) — all in a single reasoning chain.
Everything is graphs. The question is just how far you let the traversal go.
Published by Aitherium — March 27, 2026