Neurons That Learn: How AitherOS Evolves Pattern Detection From Every Conversation
In the previous post, we covered how AitherOS fires 33 specialized neurons to proactively gather context before the LLM generates a response. Time, system health, GPU status, active agents, weather, git state — the system detects what data you need using regex patterns and injects it into the prompt before you even finish asking.
But regex patterns have a fundamental limitation: they only catch what you anticipated. "How much RAM do I have?" matches the resource usage pattern. "Can I run a 14B model?" doesn't — even though it absolutely needs GPU and VRAM data to answer well.
Today, we're giving each neuron its own brain.
The Problem: Smart Detection Is Hard
Consider this exchange:
You: "Can I fit deepseek-r1:14b alongside the orchestrator?"
AitherOS (before): Checks model status pattern. Fires MODEL_STATUS neuron. Returns loaded models, but no VRAM data. The agent guesses.
AitherOS (after): Fires MODEL_STATUS and GPU_STATUS. Returns loaded models + current VRAM allocation. The agent can actually compute: "14B needs ~10GB, you have 16GB free — yes."
The second response is only possible because the system learned that model-fitting questions need GPU data. No one wrote a regex for that. The system observed it.
Architecture: One Base Model, 33 Adapters
We didn't train 33 separate models. That would be wasteful and slow. Instead, we use the same lightweight transformer architecture that powers our 3-tier tool selection system, extended with per-neuron LoRA adapters.
Shared Base Micro-Transformer
|
+-- LoRA adapter: time (~128 params)
+-- LoRA adapter: resource_usage (~128 params)
+-- LoRA adapter: gpu_status (~128 params)
+-- LoRA adapter: active_agents (~128 params)
+-- ... (33 total)
The base model learns the general mapping from "query shape" to "neuron relevance." Each LoRA adapter specializes for its specific neuron type -- learning the subtle signals that distinguish queries that need GPU data from queries that need CPU data.
Training cost: effectively zero. The micro-transformer is a tiny character-level model that runs on CPU in milliseconds, and a full 200-step training run finishes in under a second.
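As a rough sketch of how the adapter layout could be structured (the hidden size, rank, and initialization here are assumptions for illustration, not AitherOS internals), a rank-1 LoRA pair over a 64-wide layer comes out to exactly 128 parameters per adapter:

```python
import random

HIDDEN = 64  # assumed hidden size of the shared base micro-transformer
RANK = 1     # rank-1 factors: 64*1 + 1*64 = 128 params per adapter

class LoRAAdapter:
    """Low-rank weight delta layered on top of the frozen base model."""
    def __init__(self, seed: int):
        rnd = random.Random(seed)
        # down-projection gets a small random init; up-projection starts
        # at zero so the adapter is a no-op before training
        self.A = [[rnd.gauss(0, 0.01) for _ in range(RANK)] for _ in range(HIDDEN)]
        self.B = [[0.0] * HIDDEN for _ in range(RANK)]

    @property
    def n_params(self) -> int:
        return HIDDEN * RANK + RANK * HIDDEN

NEURONS = ["time", "resource_usage", "gpu_status", "active_agents"]
adapters = {name: LoRAAdapter(seed=i) for i, name in enumerate(NEURONS)}
print(adapters["gpu_status"].n_params)  # 128
```

Because only the two small factor matrices are trainable, each neuron's specialization stays cheap to store and fast to retrain while the base weights are shared across all 33 adapters.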
How It Works
The system has three phases: detection, consumption tracking, and training.
Phase 1: Detection (Every Query)
When a query arrives, the neuron detection system runs two detection paths in parallel:
- Pattern matching (existing regex rules, unchanged)
- Learned prediction (the trained micro-transformer, if enough data has accumulated)
The results are merged as a union, not an intersection. If pattern matching catches it, great. If the learned predictor also catches it, no harm -- it's already in the list. If the predictor catches something the patterns missed, that's the whole point.
On Day 0, the system behaves exactly as before -- regex only. No regression. The learned predictor kicks in after it accumulates enough training data.
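A minimal sketch of the union merge, with hypothetical regex rules standing in for the real pattern set (the function and rule names are illustrative, not AitherOS's actual API):

```python
import re

# Hypothetical regex rules standing in for the real pattern set
PATTERNS = {
    "resource_usage": re.compile(r"\b(ram|memory|cpu)\b", re.I),
    "gpu_status": re.compile(r"\b(gpu|vram)\b", re.I),
}

def detect_neurons(query: str, learned_predict=None) -> set[str]:
    """Union of regex hits and learned predictions; regex-only on Day 0."""
    fired = {name for name, pat in PATTERNS.items() if pat.search(query)}
    if learned_predict is not None:  # None until enough training data exists
        fired |= set(learned_predict(query))
    return fired

# Day 0: regex only
print(detect_neurons("How much RAM do I have?"))  # {'resource_usage'}

# After training: the predictor adds neurons the patterns missed
learned = lambda q: ["gpu_status"] if "model" in q.lower() else []
print(detect_neurons("Can I run a 14B model?", learned))  # {'gpu_status'}
```

The set union makes the merge naturally idempotent: a neuron caught by both paths appears once, and neither path can veto the other.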
Phase 2: Consumption Tracking (After Every Response)
This is the clever part. After the LLM generates a response, we analyze which injected data the model actually used.
The consumption analyzer iterates over every piece of neuron data that was injected and extracts key facts from it. Then it checks whether those facts appear in the response text. Fact extraction is type-aware. For GPU data like {"name": "RTX 4090", "vram_total_mb": 24576}, the facts are ["rtx 4090", "4090", "24576"]. If any of those strings appear in the response, the neuron data was consumed.
This runs as a fire-and-forget background task — zero latency impact on the response path.
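The fact-extraction and consumption check described above might look like this (the handler logic and function names are illustrative assumptions; only the GPU example values come from the post):

```python
def extract_facts(neuron: str, data: dict) -> list[str]:
    """Type-aware fact extraction (illustrative handler for GPU-style data)."""
    facts = []
    for key, value in data.items():
        if isinstance(value, str):
            facts.append(value.lower())              # e.g. "rtx 4090"
            facts.extend(value.lower().split()[1:])  # trailing token: "4090"
        elif isinstance(value, (int, float)):
            facts.append(str(value))                 # e.g. "24576"
    return facts

def was_consumed(neuron: str, data: dict, response: str) -> bool:
    """A neuron's data counts as consumed if any extracted fact appears."""
    text = response.lower()
    return any(fact in text for fact in extract_facts(neuron, data))

gpu = {"name": "RTX 4090", "vram_total_mb": 24576}
print(extract_facts("gpu_status", gpu))  # ['rtx 4090', '4090', '24576']
print(was_consumed("gpu_status", gpu, "You have 24576 MB total, so yes."))  # True
print(was_consumed("gpu_status", gpu, "Here's a joke instead."))  # False
```

Substring matching is deliberately crude but cheap: false positives are rare for distinctive facts like model names and VRAM figures, and the training thresholds absorb occasional noise.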
Each consumption event gets written to a JSONL file:
{"query": "can i fit deepseek alongside orchestrator", "neuron": "gpu_status", "consumed": true, "ts": 1741171200}
{"query": "tell me a joke", "neuron": "gpu_status", "consumed": false, "ts": 1741171260}
Phase 3: Training (Every 30 Minutes)
A background task checks whether enough data has accumulated:
- Base model: Retrains after 50+ positive (consumed=true) records
- Per-neuron LoRA: Retrains after 20+ positive records for that specific neuron
- No blocking: Training runs as an async background task
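A sketch of that readiness check, assuming it reads the JSONL records shown earlier and counts consumed=true entries (function and field names beyond those in the record format are hypothetical):

```python
import json
from collections import Counter

BASE_THRESHOLD = 50    # positive records before base-model retraining
NEURON_THRESHOLD = 20  # positive records before a per-neuron LoRA retrains

def training_readiness(jsonl_lines):
    """Count consumed=true records overall and per neuron."""
    per_neuron = Counter()
    for line in jsonl_lines:
        rec = json.loads(line)
        if rec["consumed"]:
            per_neuron[rec["neuron"]] += 1
    total = sum(per_neuron.values())
    return {
        "base": total >= BASE_THRESHOLD,
        "neurons": [n for n, c in per_neuron.items() if c >= NEURON_THRESHOLD],
    }

lines = [
    '{"query": "can i fit deepseek", "neuron": "gpu_status", "consumed": true, "ts": 1}',
    '{"query": "tell me a joke", "neuron": "gpu_status", "consumed": false, "ts": 2}',
]
print(training_readiness(lines))  # {'base': False, 'neurons': []}
```

Counting only positive records keeps the thresholds tied to genuine signal: a neuron that fires constantly but is never consumed contributes nothing toward retraining.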
The training data pairs each query with the neuron whose injected data it consumed. The micro-transformer evaluates each pair and produces a loss value; low loss means the model considers this a natural query-neuron combination.
After training, the prediction loop activates each neuron's LoRA adapter in turn, evaluates the query against it, and returns the top-scoring neurons -- the ones the model is most confident the query will actually need.
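One way that prediction loop could be structured (the function signature and the loss threshold are hypothetical; the stub loss table stands in for a real base-plus-adapter evaluation):

```python
def predict_neurons(query: str, adapters: dict, eval_loss, top_k: int = 3,
                    max_loss: float = 1.0) -> list[str]:
    """Activate each neuron's adapter in turn; low loss = high confidence."""
    scored = []
    for neuron, adapter in adapters.items():
        loss = eval_loss(query, adapter)  # base model + this adapter active
        if loss <= max_loss:              # skip neurons the model rejects outright
            scored.append((loss, neuron))
    scored.sort()                         # lowest loss first
    return [neuron for _, neuron in scored[:top_k]]

# Stub loss table: pretend the trained model scores these pairings
fake_losses = {"gpu_status": 0.2, "model_status": 0.3, "weather": 4.1}
eval_loss = lambda q, adapter: fake_losses[adapter]
adapters = {name: name for name in fake_losses}  # placeholder adapter objects

print(predict_neurons("can i fit a 14b model?", adapters, eval_loss))
# ['gpu_status', 'model_status']
```

Scanning all 33 adapters stays cheap because each evaluation is a millisecond-scale CPU forward pass over a tiny model.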
The Feedback Loop
This creates a self-improving cycle:
- User asks a question
- Pattern matching + learned predictor decide which neurons to fire
- Neuron data gets injected into the context
- LLM generates a response
- Consumption analysis checks what data was actually used
- Results feed back into predictor training data
- Next training cycle improves predictions
- Go to 1
The system gets better at predicting what data you need from observing what data the LLM actually references. It's not learning from human labels or curated datasets — it's learning from its own production behavior.
What This Fixes
Before: The neuron telemetry had an effectiveness metric that was permanently stuck at 0.0. The system knew which neurons fired, but never tracked whether the data was consumed. Wasted tokens were invisible.
After: Every neuron has a real-time effectiveness score. If a neuron fires frequently but the LLM rarely uses its data, that signal feeds back into the predictor to suppress it. If a neuron rarely fires but gets consumed every time it does, the predictor learns to fire it more aggressively.
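A minimal sketch of such an effectiveness score, computed as the fraction of firings whose data was consumed (the class name and API are assumptions, not the actual telemetry interface):

```python
from collections import defaultdict

class EffectivenessTracker:
    """Real-time effectiveness: consumed / fired, per neuron."""
    def __init__(self):
        self.fired = defaultdict(int)
        self.consumed = defaultdict(int)

    def record(self, neuron: str, consumed: bool):
        self.fired[neuron] += 1
        if consumed:
            self.consumed[neuron] += 1

    def effectiveness(self, neuron: str) -> float:
        fired = self.fired[neuron]
        return self.consumed[neuron] / fired if fired else 0.0

tracker = EffectivenessTracker()
for used in (True, True, False, True):
    tracker.record("gpu_status", used)
print(tracker.effectiveness("gpu_status"))  # 0.75
```

A ratio like this is exactly the signal the old telemetry was missing: it separates "fires often" from "fires usefully," which is what the predictor needs to suppress or promote a neuron.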
Numbers
- 84 new tests (prediction, consumption analysis, integration, persistence)
- 33 per-neuron LoRA adapters (one per neuron type)
- ~128 parameters per adapter (vs ~800 for the full base model)
- 0ms latency impact on the response path (fire-and-forget consumption analysis)
- 50 record threshold for base model training
- 20 record threshold for per-neuron adapter training
- 30 minute retrain interval
What's Next
The same consumption-driven training approach can be applied to other pre-selection systems. The tool selection system already uses a micro-transformer for tool prediction -- adding consumption feedback would let it learn which tools actually produce useful results, not just which tools get selected.
The broader principle: every time an AI system gathers context, the response tells you whether that context was valuable. Don't waste that signal.
The per-neuron adaptive training system is available in AitherOS as of today. Training begins automatically after 50 conversations -- no configuration needed.