Neurons: How AitherOS Thinks Before You Ask
Every AI agent you've ever used has the same problem: you ask a question, it searches for the answer, you wait, it responds. The latency isn't in the model — it's in the context gathering. The agent needs to search your codebase, check documentation, recall memories, query system status, and maybe hit the web. All of that happens after you press enter.
What if it happened before?
AitherOS has a neuron system. Not a metaphor — actual autonomous background workers that fire in response to stimuli, proactively gathering information and caching it so that when the agent synthesizes a response, the data is already there. No tool calls. No "searching..." spinners. The agent just knows.
Here's how it works.
What Is a Neuron?
A neuron is a small, focused worker that does exactly one thing: take a query, search a specific data source, and return a result. That's it. No LLM calls, no complex chains, no orchestration. Fire and return.
AitherOS has 42 neuron types, organized into three categories:
Passive neurons (read-only, gather information):
- GREP -- regex search across the codebase
- CODE/CODEGRAPH -- symbol lookup with AST-based call graphs
- WEB -- DuckDuckGo search for documentation and news
- MEMORY -- Spirit memory recall (episodic + semantic)
- DOC/DOCGRAPH -- documentation structure with cross-references
- HEALTH -- service health checks across all running services
- CONFIGGRAPH -- YAML/JSON/TOML configuration topology
- SERVICEGRAPH -- service dependency mapping
- ARCHITECTURE -- system architecture knowledge with decay
- Plus 10 more specialized graph neurons (API, test coverage, TypeScript, infrastructure, Flux events, PowerShell scripts)
Effector neurons (take actions):
- CANVAS -- generate images via ComfyUI
- CODEGEN -- generate code via Demiurge
- FILEWRITE/FILEEDIT -- write or surgically edit files
- EXECUTE -- run shell commands safely
- DEPLOY -- deploy artifacts to services
Meta-cognitive neurons:
- RLM -- Recursive Language Model deep context analysis
- SUMMARY -- content summarization
- JUDGE -- content evaluation and scoring
- AUDIT -- validate claims against reality
Each neuron has a cache, a TTL, a confidence score, and a cost budget. The system knows how expensive each neuron is and can throttle accordingly.
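That per-neuron bookkeeping can be sketched in a few lines. This is an illustrative model, not the actual AitherOS API -- the field names (`ttl_seconds`, `cost_budget`) and the cache shape are assumptions based on the description above:

```python
from dataclasses import dataclass
import time

@dataclass
class NeuronResult:
    content: str
    sources: list
    confidence: float          # 0.0-1.0, how sure the neuron is

@dataclass
class Neuron:
    kind: str                  # e.g. "grep", "memory"
    ttl_seconds: float         # how long a cached result stays fresh
    cost_budget: float         # relative cost, so the scheduler can throttle expensive neurons

    def __post_init__(self):
        self._cache = {}       # query -> (timestamp, NeuronResult)

    def fire(self, query, search_fn):
        """Return a fresh cached result if available, otherwise search and cache."""
        hit = self._cache.get(query)
        if hit and time.monotonic() - hit[0] < self.ttl_seconds:
            return hit[1]      # cache HIT: no search runs
        result = search_fn(query)
        self._cache[query] = (time.monotonic(), result)
        return result
```

The point of the TTL is that a second fire within the window returns instantly without touching the underlying data source.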
The Three Triggers
Neurons fire from three independent stimulus paths. This is the key design insight — they don't wait for a single trigger. They react to everything.
1. Typing Detection (Speculative Prefetch)
The frontend sends typing hints every few keystrokes as a fire-and-forget request. An intent predictor classifies the partial input -- is this a code question? A status check? A research query? -- and pre-fires the most relevant neurons.
User starts typing: "how do I fix the auth..."
-> Frontend sends partial: "how do I fix the auth"
-> Intent predictor classifies: CODE intent
-> Pre-fires: code, grep, codegraph, memory neurons
-> Results cached with 120s TTL
-> User submits full prompt
-> Context assembly: cache HIT
-> Agent responds instantly with pre-gathered context
The speculation runs at most once every 3 seconds and costs nothing if the cache expires unused. But when it hits — which is often, because most prompts are continuations of what you were typing — the response feels instant.
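The typing-hint path can be sketched as a small prefetcher. The class and method names are hypothetical; only the 3-second throttle and the 120-second cache TTL come from the text, and the intent classifier and neuron-firing callback are stubbed as injected functions:

```python
import time

class SpeculativePrefetcher:
    THROTTLE_S = 3.0           # at most one speculative fire per 3 seconds
    CACHE_TTL_S = 120.0        # prefetched results live for 120 s

    def __init__(self, classify_intent, fire_neurons):
        self._classify = classify_intent   # partial prompt -> intent label
        self._fire = fire_neurons          # (intent, partial) -> results
        self._last_fire = float("-inf")
        self.cache = {}                    # partial prompt -> (timestamp, results)

    def on_typing_hint(self, partial, now=None):
        """Handle a fire-and-forget typing hint; returns True if neurons fired."""
        now = time.monotonic() if now is None else now
        if now - self._last_fire < self.THROTTLE_S:
            return False                   # throttled: ignore this hint
        self._last_fire = now
        intent = self._classify(partial)
        self.cache[partial] = (now, self._fire(intent, partial))
        return True

    def lookup(self, prompt, now=None):
        """Cache check at submission time; None on miss or expiry."""
        now = time.monotonic() if now is None else now
        hit = self.cache.get(prompt)
        if hit and now - hit[0] < self.CACHE_TTL_S:
            return hit[1]
        return None
```

An unused cache entry simply expires; a hit at submission time is what makes the response feel instant.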
2. Prompt Submission (Priority-Tiered Parallel Execution)
When the user actually submits a prompt, the context assembly pipeline runs a multi-stage process. One of those stages is ENRICH -- where the parallel context orchestrator fires neurons in priority tiers.
A scaling system decides how many neurons to fire based on query complexity:
| Query Type | Neuron Count |
|---|---|
| Greeting ("hey") | 0 |
| Simple fact | 2 |
| Code question | 8 |
| Multi-domain | 16 |
| Research | 24 |
| Complex research | 32 |
If the typing-detection cache already has >60% of needed sources, the count is halved. No wasted work.
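The scaling table plus the halving rule reduce to a tiny function. The query-type labels here are paraphrases of the table rows, and the default for an unknown type is an assumption:

```python
# Neuron counts from the scaling table above.
NEURON_BUDGET = {
    "greeting": 0,
    "simple_fact": 2,
    "code": 8,
    "multi_domain": 16,
    "research": 24,
    "complex_research": 32,
}

def neuron_count(query_type, cache_coverage):
    """Neurons to fire for a query, halved when the typing cache covers >60% of sources."""
    count = NEURON_BUDGET.get(query_type, 2)   # assumed fallback for unknown types
    if cache_coverage > 0.60:
        count //= 2                            # speculation already did most of the work
    return count
```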
The parallel context orchestrator then executes neurons in tiered priority groups. For a CODE intent:
- Tier 1 (fire immediately): code, grep, file, index, codegraph, apigraph
- Tier 2 (fire next): memory, doc, docgraph, architecture, gpu, typegraph
- Tier 3 (fire if needed): web, search, system, testgraph
Early return kicks in when confidence hits 0.75 -- if Tier 1 already found everything, Tier 2 never fires. Progressive aggregation streams results as they arrive and deduplicates on the fly.
The really clever bit: the orchestrator splits sources into fast and slow paths. Fast sources (memory, persona, code graphs, local state) block for up to 5 seconds total -- the LLM needs them. Slow sources (web search, architecture analysis, multi-step reasoning) fire as background tasks and inject results via callbacks into the active context cache as they complete. The LLM starts generating immediately with fast context, and slow context arrives mid-generation if it's fast enough.
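Tiered firing with early return can be sketched like this. The aggregation rule -- treating each neuron's confidence as independent evidence so the remaining uncertainty multiplies down -- is an assumption on my part; the text only specifies the 0.75 threshold:

```python
CONFIDENCE_TARGET = 0.75   # early-return threshold

def fire_tiers(tiers, fire_one):
    """tiers: list of tier lists of neuron names.
    fire_one(name) -> (result, confidence in 0..1)."""
    results, confidence = [], 0.0
    for tier in tiers:
        for name in tier:
            result, conf = fire_one(name)
            results.append(result)
            # assumed aggregation: independent evidence shrinks remaining uncertainty
            confidence = 1.0 - (1.0 - confidence) * (1.0 - conf)
        if confidence >= CONFIDENCE_TARGET:
            break              # early return: later tiers never fire
    return results, confidence
```

A real implementation would fire each tier's neurons in parallel; the sequential loop here just makes the early-return behavior easy to see.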
3. System Events (The New Bridge)
This is what we just built. A bridge subscribes to the AitherOS event bus -- the nervous system of the OS -- and forwards relevant events to the neuron daemon.
When a service goes down? Health and cluster neurons fire automatically. When code is committed? Code, index, and codegraph neurons re-index. When a conversation happens? Memory and semantic neurons refresh. When an agent executes a tool? Related neurons fire to keep context fresh.
Here's the routing table:
| Flux Event | Neurons Fired |
|---|---|
| Service down (svc.d) | health, cluster |
| Code change | code, index, codegraph |
| Git commit | code, index, codegraph |
| Conversation exchange | conversation, memory, semantic |
| GPU model load | gpu |
| Mesh node join | mesh, cluster |
| Agent writes a file | code, index |
| Agent searches web | web |
| Agent stores memory | memory, semantic |
Every fire is throttled (5s minimum per neuron type) and telemetry is pushed to Strata every 60 seconds for analysis.
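The bridge's core -- routing table plus per-type throttle -- fits in one small class. The event-type strings below are paraphrases of the table rows, not actual Flux topic names, and the class shape is an illustrative sketch:

```python
import time

# Routing table from the section above; event names are illustrative.
EVENT_ROUTES = {
    "service.down":      ["health", "cluster"],
    "code.change":       ["code", "index", "codegraph"],
    "git.commit":        ["code", "index", "codegraph"],
    "conversation.turn": ["conversation", "memory", "semantic"],
    "gpu.model_load":    ["gpu"],
    "mesh.node_join":    ["mesh", "cluster"],
    "file.write":        ["code", "index"],
    "web.search":        ["web"],
    "memory.store":      ["memory", "semantic"],
}

class EventBridge:
    MIN_INTERVAL_S = 5.0       # 5 s minimum between fires per neuron type

    def __init__(self, fire_neuron):
        self._fire = fire_neuron   # callback: neuron type -> None
        self._last = {}            # neuron type -> last fire time

    def on_event(self, event_type, now=None):
        """Fire the routed neurons, skipping any fired within the last 5 s."""
        now = time.monotonic() if now is None else now
        fired = []
        for neuron in EVENT_ROUTES.get(event_type, []):
            if now - self._last.get(neuron, float("-inf")) < self.MIN_INTERVAL_S:
                continue           # throttled
            self._last[neuron] = now
            self._fire(neuron)
            fired.append(neuron)
        return fired
```

The throttle is keyed per neuron type, not per event, so a burst of code-change events refreshes the code neuron once, not once per event.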
The Neuron Daemon
Behind all three trigger paths sits the neuron daemon -- a background process that runs scheduled and event-driven neuron fires. It has four firing modes:
- TIMER -- Health every 30s, cluster every 60s, index every 5m, architecture every 10m
- EVENT -- Conversation messages, memory stores, service state changes
- PREEMPTIVE -- Based on intent prediction analysis of conversation trajectory
- ON-DEMAND -- Explicit fire via MCP tools
The daemon maintains a cache hierarchy with per-source TTLs:
| Source | TTL |
|---|---|
| Axioms, will, persona | 1 hour |
| Spirit, memory, code, codegraph | 5 minutes |
| Web, conversation | 2 minutes |
| Affect (emotional state) | 30 seconds |
| Health, flux | 15 seconds |
"Hot topics" — things the user keeps asking about — get faster refresh intervals. If you've mentioned "authentication" three times in the last five minutes, the daemon will refresh code and memory neurons for that topic every 2 minutes instead of every 5.
How Results Flow Back
Neuron results don't just float in a void. They follow a structured path into the agent's context window:
- Neuron fires and returns content, sources, and a confidence score
- Context orchestrator wraps the result with relevance, priority, and dedup metadata
- Pipeline bridges add TTL, grouping, and a composite score
- Active context cache stores the result -- scored, evictable, surgically managed
- Rendered to system prompt -- priority-ordered, token-budgeted
The active context cache uses surgical eviction, not truncation. Each chunk has a score:
score = relevance * priority_weight * freshness_decay + access_boost
When the cache exceeds its token budget, it evicts the lowest-scored chunks first. Axioms (priority 5) are never evicted. If one chunk in a group goes, all chunks in that group go — because partial results are worse than no results.
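Surgical group eviction under that scoring formula can be sketched as follows. The 5-minute freshness decay constant and the 0.1 access-boost weight are assumptions -- the text gives the formula's shape but not its constants:

```python
import math
from dataclasses import dataclass

AXIOM_PRIORITY = 5   # axiom chunks are never evicted

@dataclass
class Chunk:
    group: str
    tokens: int
    relevance: float       # 0..1
    priority: int          # 1..5
    age_s: float           # seconds since the chunk was cached
    access_count: int = 0

    def score(self):
        # score = relevance * priority_weight * freshness_decay + access_boost
        freshness = math.exp(-self.age_s / 300.0)   # assumed decay constant
        return self.relevance * self.priority * freshness + 0.1 * self.access_count

def evict_to_budget(chunks, token_budget):
    """Evict whole lowest-scored groups until the cache fits the token budget."""
    while sum(c.tokens for c in chunks) > token_budget:
        evictable = [c for c in chunks if c.priority < AXIOM_PRIORITY]
        if not evictable:
            break                   # only axioms left; nothing may be evicted
        victim = min(evictable, key=Chunk.score).group
        chunks = [c for c in chunks if c.group != victim]   # the whole group goes
    return chunks
```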
Auto-Fire: The Subconscious
The auto-fire system is the simplest and most magical piece. It's 43 regex patterns that match against the user's query and inject real-time data before the LLM even sees the prompt.
User: "What time is it?"
Without auto-fire:
AI thinks: "I need to call get_current_time tool..."
AI outputs: "[TOOL_CALL: get_current_time]"
System executes tool, returns time
AI outputs: "The current time is 2:30 PM"
With auto-fire:
System detects time query, injects time BEFORE LLM
AI immediately knows: "The current time is 2:30 PM"
No thinking needed — instant response.
This works for system status, cluster topology, git state, service health, code context, documentation structure, configuration topology, and a dozen other data types. The AI doesn't decide to look things up — it already knows because the data was injected into its context before it started generating.
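The mechanism reduces to pattern-match-then-inject. The patterns and injectors below are toy stand-ins, not the actual 43 AitherOS patterns -- real injectors would query live system state rather than return stubs:

```python
import re
import datetime

# Illustrative (pattern, injector) pairs; injectors run BEFORE the LLM sees the prompt.
AUTO_FIRE = [
    (re.compile(r"\bwhat time\b", re.IGNORECASE),
     lambda: f"[CURRENT_TIME] {datetime.datetime.now():%H:%M}"),
    (re.compile(r"\bgit (status|state)\b", re.IGNORECASE),
     lambda: "[GIT] branch: main, clean"),          # stub; a real injector shells out
    (re.compile(r"\b(service|system) (status|health)\b", re.IGNORECASE),
     lambda: "[HEALTH] all services nominal"),      # stub; a real injector queries health
]

def inject_context(query):
    """Return pre-fired context lines for every pattern the query matches."""
    return [injector() for pattern, injector in AUTO_FIRE if pattern.search(query)]
```

Everything returned here lands in the system prompt before generation starts, so the model answers from context instead of emitting a tool call.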
What Changed Today
Before today, the neuron system was architecturally 90% complete but only 60% wired. The neuron daemon existed but was only started by a separate service, never by the system brain. The event bus wasn't connected to neuron firing. Tool execution didn't trigger context refreshes. And no telemetry measured any of it.
We fixed all four gaps:
- Boot integration -- The neuron daemon now starts during system initialization, right alongside the chat engine, scheduler, and awareness loop. When the system is up, neurons are firing.
- Event bus bridge -- A new bridge polls recent system events and routes them to the daemon. Service goes down? Neurons know. Code changes? Neurons re-index. All with per-type throttling to prevent spam.
- Post-tool refresh -- After every successful tool execution, the bridge fires relevant neurons as fire-and-forget tasks. Write a file? Code and index neurons refresh. Search the web? Web neuron caches results. Store a memory? Semantic neurons update.
- Telemetry -- Every 60 seconds, the bridge pushes hit/miss rates, per-neuron-type breakdowns, and fire counts to the analytics system. This creates the feedback loop needed to optimize which neurons to pre-fire and when.
94 tests verify all of it.
The Philosophy
The neuron system embodies a specific philosophy: the best tool call is the one that never happens.
Every time an agent pauses to think "I should search for that," you feel it. The conversation stutters. The response takes longer. The agent seems less intelligent — not because the model is worse, but because the context gathering is visible.
Neurons hide the work. They fire autonomously, in the background, in response to stimuli that predict what the agent will need. The agent doesn't search — it already knows. The agent doesn't look up system status — it was injected 15 seconds ago. The agent doesn't check if the file exists — the codegraph was refreshed when the file was written.
This is what it means to have a subconscious. Not a faster conscious process — a parallel, autonomous, always-running system that makes the conscious process feel effortless.
42 neuron types. Three trigger paths. Background daemon. Surgical cache eviction. And the user just sees: fast, accurate, contextual responses.
That's neurons.