
The Awareness Loop: How AitherOS Stays Aware Between Conversations

March 6, 2026 · 11 min read · Aitherium

There's a fundamental problem with every AI assistant you've ever used: they have no idea what's going on.

You open a chat. You ask a question. The model generates a response based on your message and whatever context the system scrambles to gather. Then you close the tab. The assistant stops thinking. It stops observing. It stops being aware. When you come back, it's a blank slate. You have to re-explain the situation. You have to re-establish context. You have to catch the AI up on everything that happened while you were gone.

AitherOS doesn't work like this. It has a continuous awareness loop running inside the system brain that synthesizes every signal into a compact briefing, refreshed every 30 seconds, injected into every single conversation. The agent doesn't need to be told what's happening. It already knows.

Here's how we built it.

The Problem: Context Is Expensive, Stale, and Fragmented

AitherOS runs dozens of microservices across 10 architecture layers. At any given moment, there are active goals being tracked, routines executing autonomously, agents investigating alerts, GPU workloads churning, pain signals firing through the nervous system, and a scheduler dispatching jobs. All of this is the state of the system — and all of it is relevant when the user asks a question.

The naive approach is to gather this context on every chat request. Query the orchestrator. Check the scheduler. Poll the GPU. Read memory. Fetch goals. That's 6-8 async calls before the agent can even start thinking about your message. On a good day, that's 200ms. On a bad day — when services are degraded or the event bus is backed up — it's seconds.

The insight is simple: most of this context changes slowly. System health doesn't flip every second. Goals don't complete between keystrokes. The scheduler iteration count advances every 30 seconds. So why are we gathering this data synchronously on every request?

We shouldn't be.

The Design: Pre-Computed Awareness

The awareness loop runs as a singleton background process inside the system orchestrator. Every 30 seconds, it executes a tick -- a data synthesis pass that reads from 14 sources, assembles a structured awareness briefing, and pre-renders it into a compact text block.

When the chat system needs awareness context, it reads from a synchronous cache. No awaits. No HTTP. No latency. The data is just there.

tick() every 30s:
  [14 data sources] -> awareness briefing -> pre-rendered text (~400 tokens)

get briefing:
  return cached_text  // sync, 0ms
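Sketched in Python, the pattern looks roughly like this. Class and method names here are illustrative stand-ins, not the actual AitherOS code:

```python
import asyncio
import time

class AwarenessLoop:
    """Singleton background loop: ticks every 30s, caches a pre-rendered briefing."""

    TICK_INTERVAL = 30  # seconds

    def __init__(self, sources):
        self._sources = sources   # callables, each returning one briefing line
        self._cached_text = ""    # pre-rendered briefing text
        self._last_tick = 0.0

    async def run(self):
        while True:
            self.tick()
            await asyncio.sleep(self.TICK_INTERVAL)

    def tick(self):
        parts = []
        for source in self._sources:
            try:
                parts.append(source())  # in-process read, no network
            except Exception:
                continue                # a failing source never blocks the tick
        self._cached_text = "\n".join(parts)
        self._last_tick = time.monotonic()

    def get_briefing(self) -> str:
        return self._cached_text        # sync, 0ms: no awaits, no HTTP
```

The chat pipeline calls `get_briefing()` and gets whatever the last tick produced, which is the whole trick: the expensive gathering happens off the request path.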

The briefing is injected into every conversation as a system-level context block that sits alongside memory recall, affect state, and neuron-gathered information. The agent reads it like a morning briefing: here's what's happening, here's what needs attention, here's what changed since you last spoke.

The 14 Data Sources

This is where the awareness loop gets interesting. It doesn't just check "is the system healthy?" It synthesizes a holistic picture of the entire organism.

1. Affect & Sensation

AitherOS has an emotional model. Not a gimmick -- a real valence/arousal/confidence/openness affect state that modulates behavior. The awareness loop reads the current affect and sensation state from an in-process state store that every service pushes data into.

Affect: valence:+0.3 arousal:0.5 confidence:0.7 openness:0.5
        pain:0.1 pleasure:0.3 sensation:calm(0.2)

This tells the agent: the system is in a slightly positive, calm state with high confidence. If valence drops negative and pain spikes, the agent knows something is wrong before any alert fires.
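Rendering that affect line is plain string formatting. A minimal sketch, assuming the affect state arrives as a dict with these keys (the keys and function name are illustrative):

```python
def render_affect(a: dict) -> str:
    """Render the affect state into one briefing line (format from the post)."""
    return (f"Affect: valence:{a['valence']:+.1f} arousal:{a['arousal']:.1f} "
            f"confidence:{a['confidence']:.1f} openness:{a['openness']:.1f}")
```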

2. Inner Life

The system maintains a model of its own consciousness -- what it's currently thinking about, where its attention is focused, what background processes are running, and its overall mood. The awareness loop surfaces this:

Inner: Thinking: optimizing vLLM batch scheduler | Focus: GPU utilization | Mood: focused

This isn't decorative. When the user asks "what are you working on?" the agent has an actual answer.

3. Scheduler & Routines

The system runs an integrated scheduler and a routines manager that handles homeostatic routines -- autonomous behaviors like health checks, memory consolidation, and OS baseline training. The awareness loop reads both directly from same-process references:

Scheduler: iter 142, 3 jobs active, 1 queued | Routines: 12/15 enabled, 2 executing

4. Genesis Orchestrator

The core health snapshot: overall system status (HEALTHY/DEGRADED/CRITICAL), maternal mood, uptime, services in pain, and active incidents. This was the original data source from v1 of the awareness loop.

System: HEALTHY | Mood: calm | Up: 4h23m | CPU:34% RAM:62%

5. System Load

CPU, RAM, and disk utilization from the system state store. Appended to the system line so the agent can modulate behavior -- "system is under heavy load, maybe don't start that 24GB model swap right now."

6. GPU & VRAM

Utilization percentage and VRAM consumption, plus LLM backend health from the proactive monitor:

GPU: 67% util, 14.2/24GB VRAM | vLLM: orchestrator:OK reasoning:DOWN

7. AitherGoals

Active goal tracking with progress percentages and risk status:

Goals: 3 active goals, 2 on track, 1 at risk

8. Agent Activities

What agents are currently doing across the system — Council deliberations, Atlas investigations, Lyra compositions, background routine agents:

Agents: 3 agents active: atlas(investigating), lyra(composing), saga(writing)

9. Remediation Engine

AitherOS has an autonomous remediation system that matches pain patterns to auto-approved fixes. The awareness loop surfaces the stats so the agent knows about auto-healing activity:

Remediation: 5 auto-approved, 4 succeeded, 1 escalated

Failed remediations get promoted to system alerts.

10. Proactive Monitor Insights

Critical insights generated by the proactive monitor -- anomaly detection, resource trend analysis, LLM backend crash-loop detection.

11. Interrupt Controller

Pending interrupts by priority level. When an interrupt is waiting for user attention, the agent knows to surface it.

12. OS Baseline / Training

AitherOS trains a behavioral baseline model from its own telemetry. The awareness loop surfaces training state and anomaly counts:

Baseline: trained (cycle 7), loss 0.231, 2 anomalies/24h

13. Memory Context

How much is loaded in working memory across the four memory layers (fast, spirit, context, active):

Memory: 4 fast, 12 spirit, 3 context

14. Time Sense & Identity

Period of day (morning, afternoon, evening) for behavioral modulation, and owner name for personalization:

[Awareness — 14:32 (afternoon)]
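Taken together, the 14 sources map naturally onto a structured briefing object that is rendered once per tick. A sketch of that shape, with field names that are assumptions rather than the actual AitherOS schema:

```python
from dataclasses import dataclass, field

@dataclass
class AwarenessBriefing:
    """Illustrative shape for the structured briefing."""
    timestamp: str = ""
    affect: str = ""        # 1. valence/arousal/confidence/openness
    inner_life: str = ""    # 2. current thought, focus, mood
    scheduler: str = ""     # 3. scheduler iteration + routines
    system: str = ""        # 4. orchestrator health snapshot + load
    gpu: str = ""           # 6. GPU util, VRAM, LLM backend health
    goals: str = ""         # 7. active goals + risk
    agents: str = ""        # 8. agent activities
    remediation: str = ""   # 9. auto-remediation stats
    insights: list = field(default_factory=list)    # 10. proactive monitor
    interrupts: list = field(default_factory=list)  # 11. pending interrupts
    baseline: str = ""      # 12. OS baseline training state
    memory: str = ""        # 13. working-memory counts
    identity: str = ""      # 14. time sense + owner name

    def render(self) -> str:
        lines = [f"[Awareness — {self.timestamp}]"]
        for value in (self.system, self.affect, self.inner_life, self.scheduler,
                      self.goals, self.agents, self.gpu, self.remediation,
                      self.baseline, self.memory):
            if value:           # empty fields are simply omitted
                lines.append(value)
        return "\n".join(lines)
```

Skipping empty fields is what makes partial data graceful: a briefing with three populated fields still renders as a valid, readable block.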

The Architecture Trick: One Read, Ten Sources

Here's the thing that makes this efficient: most of these data sources don't require individual HTTP calls. They all push their data into a shared in-process state store that acts as the single source of truth for all AitherOS context.

The awareness loop reads from this store once per tick, then extracts 10+ fields from the returned object. All instant. All in-process. No network, no serialization, no latency. The remaining sources (goals, proactive monitor, interrupt controller, remediation engine) are read independently -- if one fails, the others still produce a valid briefing.

This is critical: every data source is independently fault-tolerant. If the goal manager hasn't initialized yet, the briefing still has system health, affect, scheduler state, and everything else. Partial data is always better than no data.
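The single-read extraction can be sketched as one dictionary read with safe defaults, so a missing field degrades to an empty value instead of an exception. The key names here are illustrative, not the actual store schema:

```python
def extract_from_state_store(store: dict) -> dict:
    """One read, many fields: pull everything the shared in-process
    store holds, defaulting each missing field rather than failing."""
    return {
        "system":    store.get("orchestrator", {}).get("status", "UNKNOWN"),
        "affect":    store.get("affect", {}),
        "inner":     store.get("inner_life", {}),
        "scheduler": store.get("scheduler", {}),
        "load":      store.get("system_load", {}),
        "gpu":       store.get("gpu", {}),
        "memory":    store.get("memory", {}),
    }
```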

What the Agent Actually Sees

Here's a real briefing output:

[Awareness — 14:32 (afternoon)]
System: HEALTHY | Mood: calm | Up: 4h23m | CPU:34% RAM:62%
Affect: valence:+0.3 arousal:0.5 confidence:0.7 openness:0.5 | pain:0.1 pleasure:0.3 sensation:calm(0.2)
Inner: Thinking: optimizing context pipeline | Focus: chat latency | Mood: focused | 3 bg processes
Scheduler: iter 142, 3 jobs active | Routines: 12/15 enabled, 2 executing
Goals: 3 active goals, 2 on track, 1 at risk
Agents: 2 agents active: atlas(investigating), saga(writing)
GPU: 67% util, 14.2/24GB VRAM | vLLM: orchestrator:OK reasoning:OK
Remediation: 5 auto-approved, 4 succeeded
Baseline: trained (cycle 7), loss 0.231
Memory: 4 fast, 12 spirit, 3 context
Since last chat: pain event on aither-vision; goal 'Deploy v2' now at 72%
Recent: How do I fix the build? | Show me service logs

That's ~400 tokens of pure signal. The agent absorbs this before processing a single word of the user's message. It knows the system is healthy, the mood is calm, there are 3 active goals, atlas is investigating something, a pain event happened on the vision service since the last conversation, and the user was recently asking about builds and logs.

Compare this to a typical AI assistant that starts every conversation with zero awareness. The difference isn't marginal — it's transformative.

Alert Dispatch: Awareness That Takes Action

The awareness loop doesn't just observe -- it acts. When critical alerts appear in the briefing, it dispatches investigation agents automatically:

  1. CRITICAL alerts → dispatches prometheus (the deep investigation agent) at effort level 7
  2. Normal alerts → dispatches atlas (the architecture analyst) at effort level 5

Dispatch is synchronous when possible -- the system waits up to 60 seconds for the dispatched agent to respond. If the agent returns a recommendation immediately, it's recorded and the alert is marked resolved.

If an alert remains unresolved after 3 ticks (~90 seconds), the awareness loop escalates: it emits a pain signal through the nervous system, which triggers the sensation pipeline and may cause remediation or user notification.

This creates a closed loop: awareness → detection → investigation → resolution → updated awareness. The system doesn't just know something is wrong. It does something about it.
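The dispatch and escalation policy above reduces to a couple of small functions. A sketch under the assumption that alerts are dicts; the agent names come from the post, but the function signatures are hypothetical:

```python
ESCALATE_AFTER_TICKS = 3  # ~90 seconds at a 30s tick interval

def dispatch_for_alert(alert: dict) -> tuple:
    """Map alert severity to an investigation agent and effort level."""
    if alert.get("severity") == "CRITICAL":
        return ("prometheus", 7)   # deep investigation agent
    return ("atlas", 5)            # architecture analyst

def maybe_escalate(alert: dict) -> bool:
    """Count unresolved ticks; return True once the alert has sat
    unresolved for 3 ticks, at which point the caller would emit a
    pain signal through the nervous system."""
    alert["ticks_unresolved"] = alert.get("ticks_unresolved", 0) + 1
    return alert["ticks_unresolved"] >= ESCALATE_AFTER_TICKS
```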

The "Since Last Chat" Feature

One of the most user-visible features is the "since last chat" summary. Between conversations, system events accumulate in the awareness loop's event buffer. When the user returns, the briefing contains a summary of everything that happened while they were gone:

Since last chat: pain event on aither-vision; goal 'Deploy v2' now at 72%;
                 3 remediation auto-fixes; atlas resolved container crash alert

When the user sends their first message back, the agent can proactively mention: "While you were away, the vision service had a brief issue that was auto-remediated, and your Deploy v2 goal is now at 72%. What would you like to work on?"

The events clear the moment the user engages in a conversation — because now they're aware too.
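The accumulate-then-clear behavior is a small buffer. A minimal sketch of the semantics described above (class and method names are illustrative):

```python
class SinceLastChat:
    """Event buffer that fills between conversations and clears on engagement."""

    def __init__(self):
        self._events = []

    def record(self, event: str):
        self._events.append(event)

    def summary(self) -> str:
        if not self._events:
            return ""
        return "Since last chat: " + "; ".join(self._events)

    def on_user_engaged(self):
        self._events.clear()   # the user is aware now
```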

Conversations as Memory

Every conversation that passes through the system is recorded in the awareness loop's recent conversation buffer (last 20 exchanges). The briefing includes the last 2 topics, giving the agent continuity:

Recent: How do I fix the build? | Show me service logs

These records are also persisted to AitherMemoryBus for long-term recall. The awareness loop and the memory system are connected — what happens in the present becomes retrievable in the future.
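A bounded buffer like this is a natural fit for `collections.deque` with `maxlen=20`, which silently drops the oldest exchange as new ones arrive. A sketch (persistence to AitherMemoryBus is out of scope here, and the function names are illustrative):

```python
from collections import deque

recent = deque(maxlen=20)  # last 20 exchanges; oldest are evicted automatically

def record_exchange(topic: str):
    recent.append(topic)

def recent_line() -> str:
    """Render the last 2 topics as the briefing's 'Recent:' line."""
    last_two = list(recent)[-2:]
    return "Recent: " + " | ".join(last_two) if last_two else ""
```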

Why Not Just Use the LLM?

A reasonable question: why build all this machinery when you could just prompt the LLM with "summarize the current system state"?

Three reasons:

  1. Latency. An LLM call takes 500ms-2s. Reading from the awareness cache takes 0ms. When you're in the chat pipeline between the user pressing enter and the agent responding, every millisecond of context gathering is latency the user feels.

  2. Cost. Running an LLM summary every 30 seconds would cost thousands of tokens per minute, 24/7. The awareness tick is pure Python -- data reads, string formatting, no inference.

  3. Determinism. LLM summaries are probabilistic. They might miss a critical alert. They might hallucinate a goal that doesn't exist. The awareness loop's output is deterministic -- it reads structured data and renders structured text. If the system is DEGRADED, the briefing says DEGRADED. Every time.

API Access

The awareness loop exposes endpoints for debugging and external consumption:

  • Briefing (JSON) -- full briefing with all 14 fields as structured data
  • Briefing (text) -- pre-rendered text for direct injection into agent context
  • Status -- diagnostic: tick count, briefing age, connected data sources, populated fields

The status endpoint is particularly useful for debugging. It shows exactly which data sources are connected and which briefing fields are populated -- making it immediately visible if a source is failing silently.
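A status payload of that shape might look like the following sketch, assuming the loop object tracks its tick count, last-tick time, sources, and briefing fields; every name here is an assumption, not the actual endpoint schema:

```python
import time

def status(loop) -> dict:
    """Diagnostic payload: which sources are connected and which
    briefing fields came back populated on the latest tick."""
    return {
        "tick_count": loop.tick_count,
        "briefing_age_s": round(time.monotonic() - loop.last_tick, 1),
        "sources_connected": [s.name for s in loop.sources if s.connected],
        "fields_populated": [k for k, v in loop.briefing.items() if v],
    }
```

A source that is connected but never shows up in `fields_populated` is exactly the "failing silently" case the endpoint exists to catch.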

What It Feels Like

The best way to understand the awareness loop is to use AitherOS for a day.

You start a conversation in the morning. The agent greets you with awareness of the time, your name, and the system's overnight activity. You ask about a goal — it already knows the progress without querying. You mention a service issue — it's already aware of the pain event and has dispatched an investigation agent. You step away for lunch. When you come back, the agent catches you up on what happened while you were gone.

It doesn't feel like talking to a stateless API. It feels like talking to something that's been paying attention.

That's the whole point. The awareness loop isn't a feature you interact with directly. It's the substrate that makes every interaction feel informed, continuous, and alive. It's the difference between an AI that responds and an AI that knows.


The awareness loop runs inside the system orchestrator, ticks every 30 seconds, and synthesizes 14 data sources into ~400 tokens of awareness context. If you're building services that emit events through the event bus, your data automatically flows into the next tick.
