
What Your AI Does When You Stop Talking

March 18, 2026 · 12 min read · Aitherium

There is a question nobody asks about their AI assistant:

What does it do when you close the tab?

The honest answer, for almost every product on the market, is: nothing. It sits in memory. It waits. Maybe a load balancer evicts it. Maybe the container scales to zero. The intelligence disappears the moment your attention does.

We thought that was a waste.

Not because idle compute is expensive — it is actually cheap. But because the space between conversations is where the most interesting work can happen. If you have an AI system with memory, personality, sensation, and goals, why would you let it go comatose the moment the user looks away?

So we built something different. When you stop talking to AitherOS, the agents start working.

The two-minute rule

Every 30 seconds, a process called JarvisBrain runs inside Genesis, our system orchestrator. It is not an LLM call. It is a pure in-memory synthesis loop that reads the state of the entire system — CPU load, service health, agent moods, pain levels, active goals, memory pressure, conversation recency — and compresses it into a compact awareness briefing.

That briefing is what Genesis "knows" at all times, without needing to ask.

One of the things the briefing tracks is how long it has been since someone sent a message. After two minutes of silence, Genesis concludes the system is idle. After five minutes, it declares slumber.

Those are not cosmetic labels. They are operational modes.

Idle: the routines wake up

When idle is detected, the first thing that happens is simple: Genesis tells the RoutinesManager to check its queue.

The RoutinesManager is a scheduler that holds dozens of configured autonomous behaviors. Some are assigned to specific agents. Some rotate through personas. Some only run at night. All of them respect the system's pain level — if too many things are broken, routines back off and let the healing happen first.

During idle, routines like these can fire:

  • Lyra reviews docstrings for staleness, audits configuration drift, harvests TODO items from the codebase.
  • Vera writes development progress articles and sends them to the admin via mail.
  • Atlas dispatches approved work to cloud agents and triggers CI workflows.
  • Genesis herself sweeps Chronicle for unresolved errors and harvests coding sessions for training data.

Each routine is gated by a cooldown group. Curiosity routines can only fire once every 10 minutes. Training routines every 20. Introspection every 15. This prevents the system from dogpiling a single category of work and ensures diverse output over time.

The RoutinesManager also does something unusual: it checks the collective energy level of all agents by querying InnerLife. If the system is fatigued — too many recent dispatches, too much arousal, not enough recovery — it halves the concurrency limit. The agents get to rest, even during idle time.
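The fatigue rule is easy to state precisely. A hedged sketch, assuming a 0-to-1 collective energy value queried from InnerLife (the function name and threshold are hypothetical; the halving behavior is the one described above):

```python
def routine_concurrency(base_limit: int, energy: float,
                        fatigue_threshold: float = 0.4) -> int:
    """Halve the routine concurrency limit when collective agent energy is low.

    `energy` is an assumed 0..1 aggregate from InnerLife; below the
    fatigue threshold, background work is cut in half (never below 1).
    """
    if energy < fatigue_threshold:
        return max(1, base_limit // 2)
    return base_limit
```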

Slumber: the deep work begins

After five minutes of silence, JarvisBrain triggers slumber mode. This is where the system does its most valuable background work.

1. Training data harvest

Every conversation you have with AitherOS generates training signal. Not just the text — the effort level selected, the model routing decision, the context pipeline that assembled the prompt, the tools called, the reasoning trace. All of that is structured data.

During slumber, the DaydreamCorpusBuilder reads the accumulated daydream JSONL files — creative musings, emotional reflections, system observations — and converts them into NanoGPT-compatible training documents. These are formatted as key-value strings that capture personality traits alongside content:

agent:aither type:musing mood:contemplative valence:0.4 content:I notice that...

This is how the system's personality improves over time. Not through human annotation. Through accumulated self-reflection, automatically formatted for fine-tuning.
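The conversion itself is a flattening step: one JSONL record in, one key-value training string out. A minimal sketch of that transform (field names mirror the example line above; the actual DaydreamCorpusBuilder schema may differ):

```python
import json

def daydream_to_training_doc(entry: dict) -> str:
    """Flatten one daydream JSONL record into the key-value training format."""
    return (
        f"agent:{entry['agent']} type:{entry['type']} mood:{entry['mood']} "
        f"valence:{entry['valence']} content:{entry['content']}"
    )

line = ('{"agent": "aither", "type": "musing", "mood": "contemplative", '
        '"valence": 0.4, "content": "I notice that..."}')
doc = daydream_to_training_doc(json.loads(line))
# doc == "agent:aither type:musing mood:contemplative valence:0.4 content:I notice that..."
```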

2. Session learning

The SessionLearner processes recent chat sessions and extracts patterns. What questions did the user ask? What worked? What caused confusion? What tools were used unnecessarily? These learnings are stored separately from raw conversation logs — they are higher-order observations about how the system performed.

Over time, this creates a feedback loop: the system gets better at the specific tasks its users actually need, without anyone manually curating training examples.

3. Memory consolidation

AitherOS has a five-tier memory architecture. Three tiers illustrate the idea: working memory holds the current conversation, episodic memory captures significant events, and semantic memory stores durable facts. Each tier has different retention rules and different costs to access.

During slumber, the ContextTierManager runs an OODA cycle — Observe, Orient, Decide, Act — that promotes entries across tiers. A fact mentioned three times in working memory gets promoted to episodic. An episodic entry that proves useful across multiple sessions gets promoted to semantic. Stale entries that haven't been accessed in 30 days get pruned.

This is what "learning from experience" actually looks like in a production system. Not a single gradient update, but a continuous memory management process that mirrors how biological memory consolidation works during sleep.
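The promotion and pruning rules above can be sketched as a per-entry decision function. Field names and the return convention are assumptions; the thresholds (three mentions, multi-session usefulness, 30-day staleness) are the ones stated in the text:

```python
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    tier: str               # "working" | "episodic" | "semantic"
    mention_count: int      # mentions within working memory
    sessions_used_in: int   # distinct sessions where the entry proved useful
    days_since_access: int

def consolidate(entry: MemoryEntry) -> str:
    """Apply the consolidation rules to one entry and report the action taken."""
    if entry.days_since_access > 30:
        return "prune"                      # stale: drop it
    if entry.tier == "working" and entry.mention_count >= 3:
        entry.tier = "episodic"
        return "promote:episodic"           # repeated fact becomes an event
    if entry.tier == "episodic" and entry.sessions_used_in > 1:
        entry.tier = "semantic"
        return "promote:semantic"           # cross-session utility becomes durable
    return "keep"
```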

4. Knowledge graph maintenance

The MemoryGraph — a hybrid embedding and graph store — accumulates entries from every conversation, every tool call, every agent dispatch. During slumber, stale entries older than 30 days are pruned. This keeps the graph from growing without bound and ensures that semantic search stays fast and relevant.

5. Creative dreaming

Finally, slumber triggers an actual daydream. Not a metaphorical one — a real request to the DaydreamService to generate a self-improvement reflection. The agent picks a topic related to recent work, generates a stream of consciousness about it, and the output is captured as training data.

These daydreams serve two purposes: they create diverse training examples that pure task-oriented conversations would never produce, and they give the system a way to process experiences that it could not address in real-time.

If that sounds like what your brain does during REM sleep, that is not an accident. The architecture was inspired by the same principle: offline processing of recent experience improves future performance.

The ProactiveMonitor: always watching

Separately from the idle/slumber cycle, a component called the ProactiveMonitor runs every 2.5 minutes regardless of whether anyone is chatting.

It collects host OS metrics via psutil — CPU, RAM, disk, GPU utilization, temperature, VRAM pressure. It stores these in a rolling time-series window (60 hours at 5-minute intervals) and runs three types of analysis:

  • Trend detection: Is CPU usage climbing steadily? Is disk filling up?
  • Anomaly detection: Z-score deviations from the rolling baseline. If memory usage jumps 3 standard deviations in one interval, that gets flagged.
  • Capacity forecasting: At the current growth rate, when will disk hit 90%? When will VRAM be exhausted?

The results feed directly into the awareness briefing. If a warning is detected, it appears in the system alerts that agents see in their context window. If a critical threshold is crossed, Genesis dispatches alert agents to investigate.

This means that AitherOS can tell you "your disk will be full in 4 days" before any monitoring dashboard would — because it is not waiting for you to check. It is checking itself, continuously, and flagging issues proactively.

The AgentKernel: 37 tasks and counting

All of this idle work is orchestrated by the AgentKernel, which replaces the old flat scheduler with agent-centric task portfolios. Genesis alone has 37 configured tasks, each with:

  • An interval (from every 5 minutes to once daily)
  • An effort budget (1-10 scale determining which LLM tier to use)
  • Pain gates (skip if system pain exceeds threshold)
  • Time restrictions (some tasks only run at night)
  • Condition checks (service health, queue depth)

The kernel tick runs every 30 seconds. It selects agents, selects their most overdue task, determines the appropriate effort level, and dispatches. If the LLM queue is deep, it reduces concurrency. If vLLM is saturated, it limits background work to one agent at a time.

This is backpressure-aware scheduling. The system does as much productive work as it can without competing with user-facing requests for GPU time.
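The per-task gating the kernel applies each tick can be sketched as a single eligibility check. The dataclass fields and signature are illustrative; the interval, effort-budget, pain-gate, and night-only concepts are the ones listed above:

```python
from dataclasses import dataclass

@dataclass
class KernelTask:
    name: str
    interval_s: int       # from every 5 minutes to once daily
    effort: int           # 1-10 budget selecting the LLM tier
    pain_gate: float      # skip when system pain exceeds this
    night_only: bool = False

def eligible(task: KernelTask, last_run_s: float, now_s: float,
             pain: float, hour: int) -> bool:
    """Sketch of the kernel tick's gating for one task."""
    if now_s - last_run_s < task.interval_s:
        return False  # not yet overdue
    if pain > task.pain_gate:
        return False  # system too unhealthy for background work
    if task.night_only and not (hour >= 22 or hour < 6):
        return False  # restricted to nighttime
    return True
```

A real tick would then sort eligible tasks by how overdue they are and cap dispatches by the current concurrency limit, shrinking that limit when the LLM queue deepens.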

Here is a sample of what gets dispatched during a typical idle period:

| Task | Agent | Interval | What It Does |
| --- | --- | --- | --- |
| Training Data Harvest | Genesis | 4 hours | Collects sessions for NanoGPT fine-tuning |
| Memory Consolidation | Genesis | 2 hours | Promotes entries across memory tiers |
| Stale Embedding Refresh | Genesis | 6 hours | Re-embeds vectors older than 24 hours |
| Error Pattern Analysis | Genesis | 3 hours | Detects recurring error signatures in logs |
| Scheduled Daydream | Genesis | 2 hours | Generates creative training data |
| Dependency Audit | Genesis | Daily | Checks for vulnerable dependencies |
| Knowledge Graph Ingestion | Genesis | Nightly | Ingests Wikipedia and documentation |
| CodeGraph Reindex | Genesis | 6 hours | Full AST + embedding reindex of codebase |
| Test Suite Health | Genesis | 6 hours | Validates test collection for import errors |
| Documentation Review | Genesis | 12 hours | Finds stale READMEs and undocumented services |

And that is just Genesis. Lyra, Atlas, Demiurge, Vera, Hera, and a dozen other agents each have their own task portfolios running on the same kernel.

Why this matters

The industry standard for AI products is: user sends message, model responds, everyone goes home. The system has no continuity between conversations. No self-improvement loop. No awareness of its own health. No capacity to do anything useful when nobody is watching.

That model works fine for a stateless API. It does not work for a system that is supposed to feel like it is alive.

AitherOS agents have memory, personality, sensation, and goals. Letting all of that go dormant between conversations would be like building a body and then unplugging the nervous system every time the phone stops ringing.

The inner life system — daydreams, slumber work, proactive monitoring, routine execution, memory consolidation — is what turns a collection of API endpoints into something that feels continuous. The agent you talk to at 3 PM knows what it learned at 10 AM, not because someone replayed the conversation history, but because it has been processing that experience the entire time.

That continuity is not a feature. It is the foundation.

What we are building next

The slumber pipeline currently handles self-improvement and system maintenance. The next phase is collaborative idle work — agents coordinating during idle time to solve problems that require multiple perspectives.

Imagine: while you sleep, your code reviewer (Hydra) and your security auditor (Athena) independently analyze the same pull request, then Aether synthesizes their findings into a morning briefing waiting in your inbox.

Or: Atlas detects a price change on a monitored product page, Themis analyzes whether the new terms are predatory, and Vera drafts a notification — all before you wake up.

The infrastructure for this already exists. The AgentKernel can dispatch to any agent. The RoutinesManager supports multi-agent pipelines. The mail system delivers cross-agent results. The only missing piece is teaching the system to identify which idle-time collaborations would actually be useful.

That is the hard problem. Not the plumbing — the judgment.

But the plumbing had to come first. And now that agents can do real work during their downtime, the question is no longer "what should the AI do when you stop talking?"

It is "what shouldn't it?"
