Neurons: How AitherOS Thinks Before You Ask
Every AI agent you've ever used has the same problem: you ask a question, it searches for the answer, you wait, it responds. The latency isn't in the model — it's in the context gathering. The agent needs to search your codebase, check documentation, recall memories, query system status, and maybe hit the web. All of that happens after you press enter.
What if it happened before?
AitherOS has a neuron system. Not a metaphor — actual autonomous background workers that fire in response to stimuli, proactively gathering information and caching it so that when the agent synthesizes a response, the data is already there. No tool calls. No "searching..." spinners. The agent just knows.
Here's how it works.
What Is a Neuron?
A neuron is a small, focused worker that does exactly one thing: take a query, search a specific data source, and return a result. That's it. No LLM calls, no complex chains, no orchestration. Fire and return.
AitherOS has 42 neuron types, organized into three categories:
Passive neurons (read-only, gather information):
- GREP -- regex search across the codebase
- CODE/CODEGRAPH -- symbol lookup with AST-based call graphs
- WEB -- DuckDuckGo search for documentation and news
- MEMORY -- Spirit memory recall (episodic + semantic)
- DOC/DOCGRAPH -- documentation structure with cross-references
- HEALTH -- service health checks across all running services
- CONFIGGRAPH -- YAML/JSON/TOML configuration topology
- SERVICEGRAPH -- service dependency mapping
- ARCHITECTURE -- system architecture knowledge with decay
- Plus 10 more specialized graph neurons (API, test coverage, TypeScript, infrastructure, Flux events, PowerShell scripts)
Effector neurons (take actions):
- CANVAS -- generate images via ComfyUI
- CODEGEN -- generate code via Demiurge
- FILEWRITE/FILEEDIT -- write or surgically edit files
- EXECUTE -- run shell commands safely
- DEPLOY -- deploy artifacts to services
Meta-cognitive neurons:
- RLM -- Recursive Language Model deep context analysis
- SUMMARY -- content summarization
- JUDGE -- content evaluation and scoring
- AUDIT -- validate claims against reality
Each neuron has a cache, a TTL, a confidence score, and a cost budget. The system knows how expensive each neuron is and can throttle accordingly.
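That per-neuron bookkeeping can be sketched in a few lines. This is an illustrative model, not the actual AitherOS API -- the field names (`ttl_seconds`, `cost_budget`) and the cache shape are assumptions based on the description above:

```python
from dataclasses import dataclass
import time

@dataclass
class NeuronResult:
    content: str
    sources: list
    confidence: float          # 0.0-1.0, how sure the neuron is

@dataclass
class Neuron:
    kind: str                  # e.g. "grep", "memory"
    ttl_seconds: float         # how long a cached result stays fresh
    cost_budget: float         # relative cost, so the scheduler can throttle expensive neurons

    def __post_init__(self):
        self._cache = {}       # query -> (timestamp, NeuronResult)

    def fire(self, query, search_fn):
        """Return a fresh cached result if available, otherwise search and cache."""
        hit = self._cache.get(query)
        if hit and time.monotonic() - hit[0] < self.ttl_seconds:
            return hit[1]      # cache HIT: no search runs
        result = search_fn(query)
        self._cache[query] = (time.monotonic(), result)
        return result
```

The point of the TTL is that a second fire within the window returns instantly without touching the underlying data source.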
The Three Triggers
Neurons fire from three independent stimulus paths. This is the key design insight — they don't wait for a single trigger. They react to everything.
1. Typing Detection (Speculative Prefetch)
The frontend sends typing hints every few keystrokes as a fire-and-forget request. An intent predictor classifies the partial input -- is this a code question? A status check? A research query? -- and pre-fires the most relevant neurons.
User starts typing: "how do I fix the auth..."
-> Frontend sends partial: "how do I fix the auth"
-> Intent predictor classifies: CODE intent
-> Pre-fires: code, grep, codegraph, memory neurons
-> Results cached with 120s TTL
-> User submits full prompt
-> Context assembly: cache HIT
-> Agent responds instantly with pre-gathered context
The speculation runs at most once every 3 seconds and costs nothing if the cache expires unused. But when it hits — which is often, because most prompts are continuations of what you were typing — the response feels instant.
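The typing-hint path can be sketched as a small prefetcher. The class and method names are hypothetical; only the 3-second throttle and the 120-second cache TTL come from the text, and the intent classifier and neuron-firing callback are stubbed as injected functions:

```python
import time

class SpeculativePrefetcher:
    THROTTLE_S = 3.0           # at most one speculative fire per 3 seconds
    CACHE_TTL_S = 120.0        # prefetched results live for 120 s

    def __init__(self, classify_intent, fire_neurons):
        self._classify = classify_intent   # partial prompt -> intent label
        self._fire = fire_neurons          # (intent, partial) -> results
        self._last_fire = float("-inf")
        self.cache = {}                    # partial prompt -> (timestamp, results)

    def on_typing_hint(self, partial, now=None):
        """Handle a fire-and-forget typing hint; returns True if neurons fired."""
        now = time.monotonic() if now is None else now
        if now - self._last_fire < self.THROTTLE_S:
            return False                   # throttled: ignore this hint
        self._last_fire = now
        intent = self._classify(partial)
        self.cache[partial] = (now, self._fire(intent, partial))
        return True

    def lookup(self, prompt, now=None):
        """Cache check at submission time; None on miss or expiry."""
        now = time.monotonic() if now is None else now
        hit = self.cache.get(prompt)
        if hit and now - hit[0] < self.CACHE_TTL_S:
            return hit[1]
        return None
```

An unused cache entry simply expires; a hit at submission time is what makes the response feel instant.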
2. Prompt Submission (Priority-Tiered Parallel Execution)
When the user actually submits a prompt, the context assembly pipeline runs a multi-stage process. One of those stages is ENRICH -- where the parallel context orchestrator fires neurons in priority tiers.
A scaling system decides how many neurons to fire based on query complexity:
| Query Type | Neuron Count |
|---|---|
| Greeting ("hey") | 0 |
| Simple fact | 2 |
| Code question | 8 |
| Multi-domain | 16 |
| Research | 24 |
| Complex research | 32 |
If the typing-detection cache already has >60% of needed sources, the count is halved. No wasted work.
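The scaling table plus the halving rule reduce to a tiny function. The query-type labels here are paraphrases of the table rows, and the default for an unknown type is an assumption:

```python
# Neuron counts from the scaling table above.
NEURON_BUDGET = {
    "greeting": 0,
    "simple_fact": 2,
    "code": 8,
    "multi_domain": 16,
    "research": 24,
    "complex_research": 32,
}

def neuron_count(query_type, cache_coverage):
    """Neurons to fire for a query, halved when the typing cache covers >60% of sources."""
    count = NEURON_BUDGET.get(query_type, 2)   # assumed fallback for unknown types
    if cache_coverage > 0.60:
        count //= 2                            # speculation already did most of the work
    return count
```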
The parallel context orchestrator then executes neurons in tiered priority groups. For a CODE intent:
- Tier 1 (fire immediately): code, grep, file, index, codegraph, apigraph
- Tier 2 (fire next): memory, doc, docgraph, architecture, gpu, typegraph
- Tier 3 (fire if needed): web, search, system, testgraph
Early return kicks in when confidence hits 0.75 -- if Tier 1 already found everything, Tier 2 never fires. Progressive aggregation streams results as they arrive and deduplicates on the fly.
The really clever bit: the orchestrator splits sources into fast and slow paths. Fast sources (memory, persona, code graphs, local state) block for up to 5 seconds total -- the LLM needs them. Slow sources (web search, architecture analysis, multi-step reasoning) fire as background tasks and inject results via callbacks into the active context cache as they complete. The LLM starts generating immediately with fast context, and slow context arrives mid-generation if it's fast enough.
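Tiered firing with early return can be sketched like this. The aggregation rule -- treating each neuron's confidence as independent evidence so the remaining uncertainty multiplies down -- is an assumption on my part; the text only specifies the 0.75 threshold:

```python
CONFIDENCE_TARGET = 0.75   # early-return threshold

def fire_tiers(tiers, fire_one):
    """tiers: list of tier lists of neuron names.
    fire_one(name) -> (result, confidence in 0..1)."""
    results, confidence = [], 0.0
    for tier in tiers:
        for name in tier:
            result, conf = fire_one(name)
            results.append(result)
            # assumed aggregation: independent evidence shrinks remaining uncertainty
            confidence = 1.0 - (1.0 - confidence) * (1.0 - conf)
        if confidence >= CONFIDENCE_TARGET:
            break              # early return: later tiers never fire
    return results, confidence
```

A real implementation would fire each tier's neurons in parallel; the sequential loop here just makes the early-return behavior easy to see.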
3. System Events (The New Bridge)
This is what we just built. A bridge subscribes to the AitherOS event bus -- the nervous system of the OS -- and forwards relevant events to the neuron daemon.
When a service goes down? Health and cluster neurons fire automatically. When code is committed? Code, index, and codegraph neurons re-index. When a conversation happens? Memory and semantic neurons refresh. When an agent executes a tool? Related neurons fire to keep context fresh.
Here's the routing table:
| Flux Event | Neurons Fired |
|---|---|
| Service down (svc.d) | health, cluster |
| Code change | code, index, codegraph |
| Git commit | code, index, codegraph |
| Conversation exchange | conversation, memory, semantic |
| GPU model load | gpu |
| Mesh node join | mesh, cluster |
| Agent writes a file | code, index |
| Agent searches web | web |
| Agent stores memory | memory, semantic |
Every fire is throttled (5s minimum per neuron type) and telemetry is pushed to Strata every 60 seconds for analysis.
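The bridge's core -- routing table plus per-type throttle -- fits in one small class. The event-type strings below are paraphrases of the table rows, not actual Flux topic names, and the class shape is an illustrative sketch:

```python
import time

# Routing table from the section above; event names are illustrative.
EVENT_ROUTES = {
    "service.down":      ["health", "cluster"],
    "code.change":       ["code", "index", "codegraph"],
    "git.commit":        ["code", "index", "codegraph"],
    "conversation.turn": ["conversation", "memory", "semantic"],
    "gpu.model_load":    ["gpu"],
    "mesh.node_join":    ["mesh", "cluster"],
    "file.write":        ["code", "index"],
    "web.search":        ["web"],
    "memory.store":      ["memory", "semantic"],
}

class EventBridge:
    MIN_INTERVAL_S = 5.0       # 5 s minimum between fires per neuron type

    def __init__(self, fire_neuron):
        self._fire = fire_neuron   # callback: neuron type -> None
        self._last = {}            # neuron type -> last fire time

    def on_event(self, event_type, now=None):
        """Fire the routed neurons, skipping any fired within the last 5 s."""
        now = time.monotonic() if now is None else now
        fired = []
        for neuron in EVENT_ROUTES.get(event_type, []):
            if now - self._last.get(neuron, float("-inf")) < self.MIN_INTERVAL_S:
                continue           # throttled
            self._last[neuron] = now
            self._fire(neuron)
            fired.append(neuron)
        return fired
```

The throttle is keyed per neuron type, not per event, so a burst of code-change events refreshes the code neuron once, not once per event.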
The Neuron Daemon
Behind all three trigger paths sits the neuron daemon -- a background process that runs scheduled and event-driven neuron fires. It has four firing modes:
- TIMER -- Health every 30s, cluster every 60s, index every 5m, architecture every 10m
- EVENT -- Conversation messages, memory stores, service state changes
- PREEMPTIVE -- Based on intent prediction analysis of conversation trajectory
- ON-DEMAND -- Explicit fire via MCP tools
The daemon maintains a cache hierarchy with per-source TTLs:
| Source | TTL |
|---|---|
| Axioms, will, persona | 1 hour |
| Spirit, memory, code, codegraph | 5 minutes |
| Web, conversation | 2 minutes |
| Affect (emotional state) | 30 seconds |
| Health, flux | 15 seconds |
"Hot topics" — things the user keeps asking about — get faster refresh intervals. If you've mentioned "authentication" three times in the last five minutes, the daemon will refresh code and memory neurons for that topic every 2 minutes instead of every 5.
How Results Flow Back
Neuron results don't just float in a void. They follow a structured path into the agent's context window:
- Neuron fires and returns content, sources, and a confidence score
- Context orchestrator wraps the result with relevance, priority, and dedup metadata
- Pipeline bridges add TTL, grouping, and a composite score
- Active context cache stores the result -- scored, evictable, surgically managed
- Rendered to system prompt -- priority-ordered, token-budgeted
The active context cache uses surgical eviction, not truncation. Each chunk has a score:
score = relevance * priority_weight * freshness_decay + access_boost
When the cache exceeds its token budget, it evicts the lowest-scored chunks first. Axioms (priority 5) are never evicted. If one chunk in a group goes, all chunks in that group go — because partial results are worse than no results.
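Surgical group eviction under that scoring formula can be sketched as follows. The 5-minute freshness decay constant and the 0.1 access-boost weight are assumptions -- the text gives the formula's shape but not its constants:

```python
import math
from dataclasses import dataclass

AXIOM_PRIORITY = 5   # axiom chunks are never evicted

@dataclass
class Chunk:
    group: str
    tokens: int
    relevance: float       # 0..1
    priority: int          # 1..5
    age_s: float           # seconds since the chunk was cached
    access_count: int = 0

    def score(self):
        # score = relevance * priority_weight * freshness_decay + access_boost
        freshness = math.exp(-self.age_s / 300.0)   # assumed decay constant
        return self.relevance * self.priority * freshness + 0.1 * self.access_count

def evict_to_budget(chunks, token_budget):
    """Evict whole lowest-scored groups until the cache fits the token budget."""
    while sum(c.tokens for c in chunks) > token_budget:
        evictable = [c for c in chunks if c.priority < AXIOM_PRIORITY]
        if not evictable:
            break                   # only axioms left; nothing may be evicted
        victim = min(evictable, key=Chunk.score).group
        chunks = [c for c in chunks if c.group != victim]   # the whole group goes
    return chunks
```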
Auto-Fire: The Subconscious
The auto-fire system is the simplest and most magical piece. It's 43 regex patterns that match against the user's query and inject real-time data before the LLM even sees the prompt.
User: "What time is it?"
Without auto-fire:
AI thinks: "I need to call get_current_time tool..."
AI outputs: "[TOOL_CALL: get_current_time]"
System executes tool, returns time
AI outputs: "The current time is 2:30 PM"
With auto-fire:
System detects time query, injects time BEFORE LLM
AI immediately knows: "The current time is 2:30 PM"
No thinking needed — instant response.
This works for system status, cluster topology, git state, service health, code context, documentation structure, configuration topology, and a dozen other data types. The AI doesn't decide to look things up — it already knows because the data was injected into its context before it started generating.
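The mechanism reduces to pattern-match-then-inject. The patterns and injectors below are toy stand-ins, not the actual 43 AitherOS patterns -- real injectors would query live system state rather than return stubs:

```python
import re
import datetime

# Illustrative (pattern, injector) pairs; injectors run BEFORE the LLM sees the prompt.
AUTO_FIRE = [
    (re.compile(r"\bwhat time\b", re.IGNORECASE),
     lambda: f"[CURRENT_TIME] {datetime.datetime.now():%H:%M}"),
    (re.compile(r"\bgit (status|state)\b", re.IGNORECASE),
     lambda: "[GIT] branch: main, clean"),          # stub; a real injector shells out
    (re.compile(r"\b(service|system) (status|health)\b", re.IGNORECASE),
     lambda: "[HEALTH] all services nominal"),      # stub; a real injector queries health
]

def inject_context(query):
    """Return pre-fired context lines for every pattern the query matches."""
    return [injector() for pattern, injector in AUTO_FIRE if pattern.search(query)]
```

Everything returned here lands in the system prompt before generation starts, so the model answers from context instead of emitting a tool call.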
What Changed Today
Before today, the neuron system was architecturally 90% complete but only 60% wired. The neuron daemon existed but was only started by a separate service, never by the system brain. The event bus wasn't connected to neuron firing. Tool execution didn't trigger context refreshes. And no telemetry measured any of it.
We fixed all four gaps:
- Boot integration -- The neuron daemon now starts during system initialization, right alongside the chat engine, scheduler, and awareness loop. When the system is up, neurons are firing.
- Event bus bridge -- A new bridge polls recent system events and routes them to the daemon. Service goes down? Neurons know. Code changes? Neurons re-index. All with per-type throttling to prevent spam.
- Post-tool refresh -- After every successful tool execution, the bridge fires relevant neurons as fire-and-forget tasks. Write a file? Code and index neurons refresh. Search the web? Web neuron caches results. Store a memory? Semantic neurons update.
- Telemetry -- Every 60 seconds, the bridge pushes hit/miss rates, per-neuron-type breakdowns, and fire counts to the analytics system. This creates the feedback loop needed to optimize which neurons to pre-fire and when.
94 tests verify all of it.
The Philosophy
The neuron system embodies a specific philosophy: the best tool call is the one that never happens.
Every time an agent pauses to think "I should search for that," you feel it. The conversation stutters. The response takes longer. The agent seems less intelligent — not because the model is worse, but because the context gathering is visible.
Neurons hide the work. They fire autonomously, in the background, in response to stimuli that predict what the agent will need. The agent doesn't search — it already knows. The agent doesn't look up system status — it was injected 15 seconds ago. The agent doesn't check if the file exists — the codegraph was refreshed when the file was written.
This is what it means to have a subconscious. Not a faster conscious process — a parallel, autonomous, always-running system that makes the conscious process feel effortless.
42 neuron types. Three trigger paths. Background daemon. Surgical cache eviction. And the user just sees: fast, accurate, contextual responses.
That's neurons.