engineering · architecture · training

AitherGraph + ToolGraph: Where Knowledge Meets Intuition

March 3, 2026 · 10 min read · David Parkhurst

AI agents are only as smart as what they can remember and what they can reach. Most agent frameworks solve neither problem well — they stuff the entire tool catalog into every prompt and rely on RAG for memory. The result: bloated context windows, slow inference, and agents that forget what happened five minutes ago.

AitherOS takes a different approach. AitherGraph gives agents a unified knowledge graph spanning code, social relationships, concepts, and long-term memory. ToolGraph teaches agents which tools to reach for before the LLM even sees the prompt. Together, they turn the OS from a system that reacts into one that anticipates.

AitherGraph: Five Domains, One Service

AitherGraph (port 8196, Layer 4 Memory) is an in-memory graph database with disk persistence that unifies five knowledge domains under a single API. Every node and edge lives in RAM for microsecond lookups, with periodic JSON serialization for crash recovery.

Domain 1: Code

Files, classes, functions, methods, and modules — connected by calls, imports, inherits, and contains edges. When Demiurge refactors a module, it doesn't grep. It queries the graph for all callers, all dependents, and the exact blast radius of the change. Code queries delegate to CodeGraph (port 8153) for deep AST analysis, but the unified API means consumers never need to know.
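As a sketch of what a blast-radius query might look like — the class, method names, and edge tuples here are illustrative, not the real CodeGraph API:

```python
from collections import defaultdict, deque

class CodeGraph:
    """Toy code-domain graph with a reverse edge index (illustrative only)."""
    def __init__(self):
        self.edges_by_target = defaultdict(list)  # who points at me?

    def add_edge(self, source, edge_type, target):
        self.edges_by_target[target].append((source, edge_type))

    def blast_radius(self, node):
        """All transitive callers/importers of `node`, via BFS over reverse edges."""
        seen, queue = set(), deque([node])
        while queue:
            current = queue.popleft()
            for source, edge_type in self.edges_by_target[current]:
                if edge_type in ("calls", "imports") and source not in seen:
                    seen.add(source)
                    queue.append(source)
        return seen

g = CodeGraph()
g.add_edge("app.py", "imports", "utils.py")
g.add_edge("utils.py", "calls", "parse_config")
g.add_edge("cli.py", "calls", "parse_config")
print(sorted(g.blast_radius("parse_config")))  # every file touched by a change
```

The reverse index is the point: "who depends on this?" is a dictionary lookup plus a walk, not a repo-wide grep.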

Domain 2: Social

Users, posts, comments, personas, and submolts form the social layer. Every interaction on MoltBook or Bluesky is recorded as a typed edge: helped, agreed_with, attacked, befriended. When Aeon is about to reply to someone, it asks the graph: “What do I know about this person? Have we interacted before? Are they an ally or hostile?” The answer arrives as a pre-formatted context string, ready for prompt injection.
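A minimal sketch of what "pre-formatted context string" might mean in practice — the edge tuples and wording are assumptions, not the real API:

```python
def social_context(edges, user):
    """Render prior typed-edge interactions with `user` as a prompt-ready string.
    `edges` is a list of (edge_type, other_user, date) tuples (illustrative)."""
    history = [f"- you {etype.replace('_', ' ')} @{user} ({when})"
               for etype, other, when in edges if other == user]
    if not history:
        return f"No prior interactions with @{user}."
    return f"What you know about @{user}:\n" + "\n".join(history)

edges = [("helped", "dave", "2026-02-10"), ("befriended", "sam", "2026-01-01")]
print(social_context(edges, "dave"))
```

The agent never parses the graph itself; it just prepends this string to its prompt.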

Domain 3: Concepts

Topics, entities, and skills form a learned knowledge web. As agents converse and research, the KnowledgeIngester extracts entities via FluxEmitter events and writes them as graph nodes with related_to, part_of, and expert_in edges. Over time, the concept graph becomes a compressed map of everything the system has ever learned about.

Domain 4: Memory

This is where things get biological. Four memory types live in the graph, all backed by Nexus (port 8122) vector embeddings for hybrid graph+semantic retrieval:

  • Episodic — Personal experiences. “Had a great conversation about astronomy with @dave.”
  • Semantic — Facts and knowledge. “The user prefers dark mode.”
  • Procedural — How-to knowledge. “To deploy: git push origin main.”
  • Identity — Core self-knowledge. Hardware specs, OS version, agent names — ingested from the system fingerprint.

Memories link to each other via temporal chains (preceded_by), associative recall (reminded_of), contradiction detection (contradicts), and knowledge extraction chains (derived_from). This isn't a flat vector store. It's a graph where memories reinforce, contradict, and synthesize each other — the way biological memory actually works.
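A sketch of how memory nodes and a temporal chain might be wired — the dataclass fields and edge names mirror the description above but are not the real schema:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Memory:
    text: str
    kind: str                                 # episodic | semantic | procedural | identity
    created: float = field(default_factory=time.time)
    edges: list = field(default_factory=list)  # (edge_type, other Memory)

def store(memories, new):
    """Append a memory and wire the preceded_by temporal chain (illustrative)."""
    if memories:
        new.edges.append(("preceded_by", memories[-1]))
    memories.append(new)
    return new

log = []
store(log, Memory("Had a great conversation about astronomy with @dave", "episodic"))
m = store(log, Memory("The user prefers dark mode", "semantic"))
```

Associative edges like reminded_of and contradicts would be added the same way, just pointing at non-adjacent memories.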

Domain 5: Continual Learning

Compressed insights from conversations. When a session ends, the KnowledgeIngester distills it into a learning node. Multiple learnings can be synthesized into higher-order synthesis nodes via synthesized_into edges. This is the TTT-inspired (Test-Time Training) pathway — the graph literally gets smarter over time without retraining any model weights.

The Architecture Under the Hood

The core KnowledgeGraph class maintains four in-memory indexes for O(1) lookups: edges-by-source, edges-by-target, edges-by-type, and nodes-by-type. On top of that sits an inverted keyword index with stop-word filtering and a 128-entry LRU search cache. Semantic search is powered by the Mind service's 768-dimensional embeddings.
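A minimal sketch of those four indexes — field names are assumptions based on the description, not the real class:

```python
from collections import defaultdict

class KnowledgeGraph:
    """Four in-memory indexes for O(1) lookup by source, target, edge type, node type."""
    def __init__(self):
        self.nodes = {}
        self.edges_by_source = defaultdict(list)
        self.edges_by_target = defaultdict(list)
        self.edges_by_type = defaultdict(list)
        self.nodes_by_type = defaultdict(set)

    def add_node(self, node_id, node_type, **attrs):
        self.nodes[node_id] = {"type": node_type, **attrs}
        self.nodes_by_type[node_type].add(node_id)

    def add_edge(self, source, edge_type, target):
        # one edge, three index entries — writes cost more so reads cost nothing
        edge = (source, edge_type, target)
        self.edges_by_source[source].append(edge)
        self.edges_by_target[target].append(edge)
        self.edges_by_type[edge_type].append(edge)
```

Every edge is written three times so that every common query shape is a single dictionary hit.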

For crash recovery, AitherGraph implements a log-first architecture via EventLog. Every mutation (node created, edge added, node removed) is recorded as a sequenced event. On restart, the graph replays from its last checkpoint, then subscribes for live updates. This is the same state-machine replication pattern used by databases like CockroachDB — applied to a knowledge graph.
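The replay idea can be sketched in a few lines — event names and the apply callback are illustrative, not the real EventLog interface:

```python
class EventLog:
    """Log-first mutations: every change is a sequenced event; restart replays
    everything after the last checkpoint."""
    def __init__(self):
        self.events = []

    def record(self, op, payload):
        seq = len(self.events) + 1
        self.events.append((seq, op, payload))
        return seq

    def replay_from(self, checkpoint_seq, apply):
        for seq, op, payload in self.events:
            if seq > checkpoint_seq:
                apply(op, payload)

log = EventLog()
log.record("node_created", {"id": "a"})
log.record("node_created", {"id": "b"})
log.record("node_removed", {"id": "a"})

nodes = {"a": True}                 # state as of checkpoint seq=1
def apply(op, payload):
    if op == "node_created":
        nodes[payload["id"]] = True
    elif op == "node_removed":
        nodes.pop(payload["id"], None)

log.replay_from(1, apply)           # deterministic replay of events 2 and 3
```

Because apply is deterministic, replaying the same event sequence always reconstructs the same graph state.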

The API surface spans 38 endpoints across six groups: core CRUD, social graph, code context, log intelligence, memory operations, and system fingerprinting. Every other service in AitherOS — from SixPillars to ChatEngine to the MCP tools — reads from or writes to this single service.

ToolGraph: Teaching Agents What to Reach For

AitherGraph solves the knowledge problem. ToolGraph solves the action problem.

As our agent toolkit grew to 24+ tools (file operations, graph queries, code search, image generation, voice synthesis, swarm coding, mesh communication), we hit a wall: passing all tool schemas to the LLM on every ReAct turn costs ~12,000 tokens. Over a 20-turn session, that's 240K tokens burned just describing tools the agent will never use.

ToolGraph (lib/core/ToolGraph.py) replaces this with a three-tier selection system that pre-selects only the relevant tools for each task:

Tier 1: NanoGPT Predictor (<1ms)

A tiny NanoGPT (1 layer, 32 dimensions, 4 attention heads) trained on real tool usage data. For every task, it evaluates “task\ntool_name” pairs and ranks tools by loss — low loss means the model has seen that task-tool combination before and considers it natural. This is the same loss-as-anomaly-score trick we use for OS proprioception, applied to tool selection.
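To make the loss-as-ranking idea concrete, here is a stand-in scorer. It replaces the NanoGPT with a simple word-tool co-occurrence count (so "lower loss" just means "seen together more often"); the class and its methods are illustrative, not the real predictor:

```python
from collections import Counter

class PairScorer:
    """Stand-in for the NanoGPT ranker: lower 'loss' = more familiar pair."""
    def __init__(self):
        self.cooc = Counter()

    def train(self, task, tool):
        for word in task.lower().split():
            self.cooc[(word, tool)] += 1

    def loss(self, task, tool):
        # more co-occurrence -> lower score, mirroring low-loss-is-familiar
        hits = sum(self.cooc[(w, tool)] for w in task.lower().split())
        return 1.0 / (1.0 + hits)

    def rank(self, task, tools):
        return sorted(tools, key=lambda t: self.loss(task, t))
```

The real predictor learns soft statistical patterns rather than exact counts, but the interface — score every task-tool pair, sort by loss — is the same.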

Tier 2: Hybrid Search (~10–150ms)

TF-IDF keyword matching over tool names, descriptions, and parameter names, combined with 768-dimensional cosine similarity via the EmbeddingEngine. Keyword scores get 40% weight, semantic scores get 60%. This handles cases the NanoGPT hasn't seen yet — novel queries that don't match any training data but have clear semantic overlap with tool descriptions.
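The 40/60 blend is a one-liner; a sketch with a plain cosine similarity (the function names are illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_score(keyword_score, query_vec, tool_vec, kw_weight=0.4, sem_weight=0.6):
    """40% keyword, 60% embedding similarity, per the tier description above."""
    return kw_weight * keyword_score + sem_weight * cosine(query_vec, tool_vec)
```

In the real service the vectors would be the EmbeddingEngine's 768-dimensional embeddings and the keyword score a TF-IDF match, but the blend itself is exactly this weighted sum.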

Tier 3: Full Schema (fallback)

If neither tier produces confident results, fall back to the current behavior — send all tools. This ensures ToolGraph is purely additive. Worst case, you get exactly what you had before.
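The three tiers above can be sketched as a simple cascade — the function names and top-k value are illustrative:

```python
def select_tools(task, nano_rank, hybrid_rank, all_tools, top_k=8):
    """Tier cascade: fast predictor first, hybrid search second, full catalog last.
    Purely additive: the worst case is identical to the old send-everything path."""
    picks = nano_rank(task)        # Tier 1: <1ms NanoGPT ranking
    if picks:
        return picks[:top_k]
    picks = hybrid_rank(task)      # Tier 2: keyword + embedding search
    if picks:
        return picks[:top_k]
    return list(all_tools)         # Tier 3: fall back to all tools
```

Each tier only runs when the one above it returns nothing confident, so latency stays dominated by the cheapest tier that succeeds.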

Training NanoGPT on Tool Calls

The training data pipeline is straightforward. Every time an agent completes a task, record_usage(task, tools_used) appends a JSONL record to Library/Training/tool_usage/sessions.jsonl:

{"task": "Search memory for past decisions about auth", "tools_used": ["graph_query", "code_search"], "timestamp": 1709472000}

Once 50+ sessions accumulate, train_predictor() expands each record into task-tool pairs, trains the NanoGPT for 300 steps, and activates Tier 1 selection. Training emits a TOOL_GRAPH_TRAIN FluxEmitter event so the rest of the system knows the predictor has been updated.
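The record-and-expand step can be sketched as follows — the function bodies are assumptions shaped by the description above, not the real implementation:

```python
import json
import os
import time

SESSIONS_PATH = "Library/Training/tool_usage/sessions.jsonl"
MIN_SESSIONS = 50  # train_predictor() activates once this many records exist

def record_usage(task, tools_used, path=SESSIONS_PATH):
    """Append one JSONL training record for a completed task."""
    if os.path.dirname(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "a") as f:
        f.write(json.dumps({"task": task, "tools_used": tools_used,
                            "timestamp": int(time.time())}) + "\n")

def expand_pairs(path=SESSIONS_PATH):
    """Expand each record into the task/tool pair strings the NanoGPT trains on."""
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            for tool in rec["tools_used"]:
                yield f"{rec['task']}\n{tool}"
```

A record with two tools becomes two training pairs, which is why 50 sessions is enough to start: the pair count grows faster than the session count.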

The key design choice: the NanoGPT is deliberately tiny. It runs on CPU in under 1ms. It doesn't need to understand the task deeply — it just needs to learn statistical co-occurrence patterns between task descriptions and tool names. “When the user mentions ‘image’ or ‘generate’, reach for generate_image_comfy.” That's pattern matching, not reasoning, and a 32-dimensional transformer is more than enough.

Runtime Integration: The AgentRuntime Wiring

ToolGraph isn't a standalone service — it's wired directly into the AgentRuntime's ReAct loop. Before the first LLM call, _preselect_tools() queries ToolGraph and swaps the active tool registry to a subset. The LLM only sees 8 tools instead of 24+.

But agents are unpredictable. What if the LLM requests a tool that wasn't in the pre-selected set? ToolGraph handles this with auto-promotion: if the requested tool exists in the full registry (_all_tools), it's silently promoted into the active set and executed. No error, no retry. The agent never knows its toolbox was filtered.
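Auto-promotion is a small guard in the registry lookup; a sketch, with the class shape assumed from the `_all_tools` name mentioned above:

```python
class ToolRegistry:
    """Filtered tool registry with silent auto-promotion (illustrative shape)."""
    def __init__(self, all_tools):
        self._all_tools = dict(all_tools)  # name -> tool
        self.active = {}

    def preselect(self, names):
        self.active = {n: self._all_tools[n] for n in names if n in self._all_tools}

    def resolve(self, name):
        """If the LLM asks for a real tool outside the pre-selected set,
        promote it into the active set and hand it back — no error, no retry."""
        if name not in self.active and name in self._all_tools:
            self.active[name] = self._all_tools[name]
        return self.active.get(name)
```

Only genuinely unknown names fall through to None; anything in the full catalog just works.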

There's also a tool_search meta-tool — the agent can explicitly ask “what tools are available for image generation?” and ToolGraph will return relevant matches with descriptions. This gives the LLM a way to discover tools it doesn't currently have access to, without stuffing the entire catalog into every prompt.

The Numbers

  • Tool tokens per LLM call: ~12,000 → ~3,200
  • 20-turn session total: ~240,000 → ~64,000
  • Token savings: 73%
  • Selection latency (NanoGPT): N/A → <1ms
  • Selection latency (hybrid): N/A → ~10–150ms
  • Tests: 80/80 passing

What's Next: The Self-Improving Loop

ToolGraph is designed as a flywheel. Every agent session generates training data. More training data improves the NanoGPT predictor. Better predictions mean fewer wasted tokens, faster inference, and more accurate tool selection — which generates cleaner training data. The system gets better at knowing what it needs just by being used.

The next step is connecting ToolGraph to AitherGraph's continual learning pipeline. When an agent discovers a novel task-tool combination that works well, that knowledge should flow into the concept graph as a learning node — so other agents can benefit from the discovery without waiting for the NanoGPT to retrain. Tool intuition, shared across the collective.

AitherGraph gives agents knowledge. ToolGraph gives them intuition. Together, they make the difference between an agent that has to think about everything and one that just knows.