Tags: engineering, agents, architecture, memory, knowledge

AitherAeon: What Happens When Your AI Agents Start Talking to Each Other

Name: AitherOS
Author: Aitherium

March 26, 2026 · 11 min read · Aitherium

Here is a thought experiment.

You have a security expert, a code architect, a researcher, and a generalist. You need them to evaluate a proposed change to your authentication system. In any real team, you would not ask them one at a time in isolation. You would put them in a room. The security expert raises a concern. The architect proposes a workaround. The researcher pulls up a paper. The generalist synthesizes the three perspectives into a plan. The conversation is more than the sum of its parts.

Now try doing that with AI.

You cannot. Not with any major product. ChatGPT gives you one voice. Claude gives you one voice. Gemini gives you one voice. They are brilliant monologuists pretending to be conversationalists. You can ask them to "think like a security expert," but you are still getting one model doing one impression.

We wanted the real thing. Multiple agents, with real identities and real specializations, in the same conversation, reacting to each other in real time. Not a parlor trick. A genuine collaborative intelligence.

So we built AitherAeon.

The room

AitherAeon is a multi-agent group chat orchestration system that lives at Layer 5 of AitherOS. When you open a group chat, you are not talking to one model that switches hats. You are talking to a room of agents drawn from 75 distinct personas, each backed by its own identity, memory, capability set, and personality.

A typical session might include:

Demiurge, the code architect, who sees everything in terms of structure and implementation cost
Athena, the security oracle, who finds the threat vector you forgot about
Lyra, the researcher, who synthesizes patterns across domains
Aither, the core personality, who holds the thread and keeps the conversation coherent

Each agent evaluates the user's message independently. Each one decides whether it has something valuable to add — or whether it should stay quiet. This is not round-robin. It is not "everyone gets a turn." Agents have a skip probability. They assess relevance. They can delegate to each other. Sometimes Demiurge says "Athena should answer this one" and steps aside.

This is what real collaboration looks like. Not everyone talking at once. The right voices speaking at the right time.
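To make the respond, skip, or delegate choice concrete, here is a minimal sketch. It assumes relevance is a simple keyword overlap and that each agent carries a fixed skip probability; `AgentProfile`, `Decision`, `relevance`, and `decide` are illustrative names, not AitherAeon's actual API.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AgentProfile:
    name: str
    keywords: frozenset       # hypothetical: topics the agent specializes in
    skip_probability: float   # chance of staying quiet even when relevant


@dataclass(frozen=True)
class Decision:
    agent: str
    action: str                        # "respond", "skip", or "delegate"
    delegate_to: Optional[str] = None


def relevance(agent: AgentProfile, message: str) -> float:
    """Fraction of the agent's keywords that appear in the message."""
    words = set(message.lower().split())
    if not agent.keywords:
        return 0.0
    return len(agent.keywords & words) / len(agent.keywords)


def decide(agent: AgentProfile, message: str, roster: List[AgentProfile],
           rng: random.Random, threshold: float = 0.25) -> Decision:
    """Each agent evaluates the message on its own; there is no fixed turn order."""
    own = relevance(agent, message)
    if own < threshold:
        # Not this agent's topic: hand off to the best-suited peer, if any.
        peers = [a for a in roster if a.name != agent.name]
        best = max(peers, key=lambda a: relevance(a, message), default=None)
        if best is not None and relevance(best, message) >= threshold:
            return Decision(agent.name, "delegate", best.name)
        return Decision(agent.name, "skip")
    # Relevant, but still allowed to stay quiet some of the time.
    if rng.random() < agent.skip_probability:
        return Decision(agent.name, "skip")
    return Decision(agent.name, "respond")
```

In a real system the relevance score would presumably come from a model rather than keyword matching; the point of the sketch is only that the decision is local to each agent.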

The engine underneath

The orchestration engine behind group chat is called the FluxMailbox. It manages per-agent async queues and executes responses in waves — concurrent batches where agents generate their contributions in parallel, then the results are assembled in a coherent order.
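A wave can be pictured with plain `asyncio`. The sketch below is a hypothetical stand-in for FluxMailbox, not its real interface: `MailboxSketch`, `generate`, and `demo` are invented for illustration, and the model call is replaced with a short sleep.

```python
import asyncio
from typing import Dict, List


async def generate(agent: str, message: str) -> str:
    """Placeholder for LLM inference; sleeps instead of calling a model."""
    await asyncio.sleep(0.01)
    return f"{agent}: reply to {message!r}"


class MailboxSketch:
    """One async queue per agent, drained together in a wave."""

    def __init__(self, agents: List[str]) -> None:
        self.order = list(agents)
        self.queues: Dict[str, asyncio.Queue] = {
            name: asyncio.Queue() for name in self.order
        }

    async def post(self, message: str) -> None:
        # Every agent in the room receives the message in its own queue.
        for queue in self.queues.values():
            await queue.put(message)

    async def _respond(self, agent: str) -> str:
        message = await self.queues[agent].get()
        return await generate(agent, message)

    async def run_wave(self) -> List[str]:
        # gather() runs the coroutines concurrently and returns results in
        # the order they were passed in, which gives a stable assembly order.
        return await asyncio.gather(*(self._respond(a) for a in self.order))


async def demo() -> List[str]:
    mailbox = MailboxSketch(["Demiurge", "Athena", "Aither"])
    await mailbox.post("review the auth change")
    return await mailbox.run_wave()
```

Calling `asyncio.run(demo())` returns three replies in roster order even though generation overlaps in time.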

Here is roughly what happens when you send a message to a group chat:

Phase 1: Context assembly (once, shared)

Before any agent generates a response, the system builds a shared surgical context. This is expensive work — topic detection, knowledge graph queries, image processing if any were attached — so it happens once and is shared across all agents. The result gets ingested into Strata for the knowledge pipeline.

Phase 2: Per-agent generation (parallel)

Each agent receives:

Its own identity and system prompt (~2000 tokens)
Shared context from Phase 1 (~1000 tokens)
A running summary of the full conversation (~500 tokens)
Crystallized history — LLM-summarized chunks of older rounds (~800 tokens)
The last 10-15 messages verbatim (~2000 tokens)
Neuron context — code state, knowledge, system status (~1500 tokens)

That is roughly 7,800 tokens of context per agent, carefully budgeted. The GroupContextManager tracks these allocations and adjusts them dynamically as conversations grow.
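The arithmetic behind that figure, plus one possible way to shrink allocations under a tighter limit, can be sketched as follows. The slot names and the proportional-scaling rule are assumptions for illustration; the post does not say how GroupContextManager actually rebalances.

```python
from typing import Dict, Tuple

# Approximate per-agent allocations listed above (tokens).
BUDGET: Dict[str, int] = {
    "identity": 2000,
    "shared_context": 1000,
    "running_summary": 500,
    "crystallized_history": 800,
    "recent_messages": 2000,
    "neuron_context": 1500,
}


def total(budget: Dict[str, int]) -> int:
    return sum(budget.values())


def fit_to_limit(budget: Dict[str, int], limit: int,
                 fixed: Tuple[str, ...] = ("identity",)) -> Dict[str, int]:
    """Scale the non-fixed slots down proportionally so the total fits."""
    fixed_total = sum(budget[k] for k in fixed)
    flexible_total = total(budget) - fixed_total
    if fixed_total + flexible_total <= limit:
        return dict(budget)
    if limit < fixed_total:
        raise ValueError("limit is smaller than the fixed allocations")
    scale = (limit - fixed_total) / flexible_total
    # int() rounds down, so the result never exceeds the limit.
    return {k: v if k in fixed else int(v * scale) for k, v in budget.items()}
```

Here `total(BUDGET)` is 7800, matching the figure in the text.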

Each agent's effective prompt is cached with a 5-minute TTL, and safety levels with a 30-second TTL. Will prompts, constraints, and cross-agent context are injected per agent. LLM inference then runs through MicroScheduler, our GPU scheduler, which handles VRAM coordination, priority queuing, and preemption across all concurrent requests.
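A time-to-live cache of the kind described here fits in a few lines. This is a generic sketch, not the AitherAeon implementation; `TTLCache` and `get_or_compute` are hypothetical names, and only the 5-minute and 30-second values come from the text.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Caches computed values and recomputes them once they expire."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so expiry can be tested without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now < hit[0]:
            return hit[1]
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value


prompt_cache = TTLCache(ttl_seconds=300)   # effective prompts: 5 minutes
safety_cache = TTLCache(ttl_seconds=30)    # safety levels: 30 seconds
```

The shorter TTL on safety levels means a change in safety posture takes effect within half a minute, while the more expensive prompt assembly is reused for longer.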

Phase 3: Post-processing

After generation, each response goes through thinking-block stripping, emotional detection, memory extraction, Flux event emission, and conversation store persistence. The responses are staggered slightly in delivery so the UI feels like a natural group conversation rather than a wall of simultaneous text.
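Two of these steps, thinking-block stripping and staggered delivery, are easy to illustrate. The sketch assumes thinking blocks are wrapped in `<think>` tags and that delivery uses a fixed gap between responses; both are assumptions, since the post does not specify the tag format or the timing.

```python
import re
from typing import List, Tuple

# Assumed format: reasoning wrapped in <think>...</think>.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_thinking(text: str) -> str:
    """Remove thinking blocks so only the visible answer remains."""
    return THINK_BLOCK.sub("", text).strip()


def delivery_schedule(responses: List[Tuple[str, str]],
                      gap_seconds: float = 0.5) -> List[Tuple[float, str, str]]:
    """Assign each (agent, text) pair a send delay so replies arrive in sequence."""
    return [
        (round(index * gap_seconds, 3), agent, strip_thinking(text))
        for index, (agent, text) in enumerate(responses)
    ]
```

A UI would then wait for each delay before rendering the corresponding message, so simultaneous generations still read as a conversation.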

The part nobody else does: agents that remember together

This is where it gets interesting.

Most multi-agent frameworks treat conversations as throwaway context. The agents generate their responses, the user gets the output, and the conversation history lives in a buffer that gets truncated or forgotten. The agents are stateless performers.

AitherAeon agents are not stateless. Every exchange flows through a five-stage knowledge pipeline that is identical across every chat type in the system — group chats, 1:1 chats, per-agent chats, everything:

Response
  → ConversationStore.append_message()     # disk persistence + crystallization
  → ConversationStore.index_exchange()      # semantic embedding + graph nodes
  → ConversationMemory.store_conversation() # Spirit-backed semantic search
  → AitherKnowledgeGraph episodic nodes     # relationship-aware recall
  → maybe_crystallize()                     # auto-summarize when it gets long

This unified pipeline was the last piece we shipped. Before it existed, group chat had its own memory path, 1:1 chat had another, and per-agent chats had a third. Knowledge was siloed. Demiurge could have a breakthrough insight in a group discussion about refactoring, and Lyra would have no way to find it later during a research session.

Now every conversation, regardless of type, feeds into the same semantic embedding store, the same knowledge graph, and the same Spirit-backed memory system. When an agent asks "what did we decide about the auth migration?", it does not matter whether that decision happened in a group chat, a 1:1 session, or a solo coding conversation. The memory system finds it.
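The key property is that every chat type takes the same path. The sketch below models that with an in-memory stand-in: the stage names mirror the listing above, except `add_episodic_node`, which is a hypothetical label for the knowledge-graph step, and the substring search is only a placeholder for semantic retrieval.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Exchange:
    conversation_id: str
    chat_type: str   # "group", "direct", or "agent"
    speaker: str
    text: str


# Stage order follows the pipeline listing; signatures are not from the source.
STAGES = ("append_message", "index_exchange", "store_conversation",
          "add_episodic_node", "maybe_crystallize")


@dataclass
class InMemoryPipeline:
    stored: List[Exchange] = field(default_factory=list)
    trace: List[Tuple[str, str]] = field(default_factory=list)

    def record(self, exchange: Exchange) -> None:
        """Run every stage, in order, whatever the chat type."""
        for stage in STAGES:
            self.trace.append((stage, exchange.conversation_id))
        self.stored.append(exchange)

    def recall(self, query: str) -> List[Exchange]:
        """Naive substring search standing in for semantic retrieval."""
        needle = query.lower()
        return [e for e in self.stored if needle in e.text.lower()]
```

Because `recall` searches one shared store, a decision recorded in a group chat is found by a later query from any other chat type.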

Crystallization: how a 200-message conversation stays coherent

Long conversations are the hard problem. After 30 messages, the raw history no longer fits in the context window. After 100, you are in trouble. After 200, most systems have forgotten the first half entirely.

AitherAeon solves this with crystallization.

When a conversation exceeds 30 messages, the GroupContextManager triggers a crystallization pass. An LLM reads the older messages — everything beyond the most recent 10 — and compresses them into a structured summary: key decisions, open questions, who said what about which topic, unresolved disagreements. This summary becomes a crystallized chunk that is stored on disk and auto-embedded in the MemoryGraph.

The recent 10 messages stay verbatim. The summary replaces everything older. As the conversation continues, new crystallization passes fold more messages into the summary. The context window always contains:

A running summary (compact, grows slowly, always available)
Crystallized history (LLM-summarized, periodically updated)
Recent messages (last 10-15, full fidelity)

This means a 500-message conversation uses roughly the same context budget as a 30-message one. The agents never lose the thread. They always know what was decided three hours ago, even if the raw messages are long gone from the context window.
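A rough sketch of why the budget stays flat: once more than 30 messages are held verbatim, everything except the latest 10 is folded into a single summary. The thresholds come from the text; `ConversationWindow`, the `summarize` callback (a stand-in for the LLM call), and the rule for triggering later passes are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List

CRYSTALLIZE_AFTER = 30   # threshold from the text
KEEP_VERBATIM = 10       # recent messages kept at full fidelity


@dataclass
class ConversationWindow:
    # summarize(previous_summary, older_messages) -> new summary.
    # In a real system this would be an LLM call; here it is injected.
    summarize: Callable[[str, List[str]], str]
    summary: str = ""
    recent: List[str] = field(default_factory=list)
    passes: int = 0

    def append(self, message: str) -> None:
        self.recent.append(message)
        if len(self.recent) > CRYSTALLIZE_AFTER:
            older = self.recent[:-KEEP_VERBATIM]
            self.recent = self.recent[-KEEP_VERBATIM:]
            # Fold the older messages into the existing summary.
            self.summary = self.summarize(self.summary, older)
            self.passes += 1

    def context(self) -> List[str]:
        """What an agent sees: one summary plus the verbatim tail."""
        head = [self.summary] if self.summary else []
        return head + self.recent
```

However long the conversation runs, `context()` never holds more than one summary and 30 verbatim messages in this sketch.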

And because crystallized chunks are embedded in the knowledge graph, those decisions are discoverable across sessions. Next week, when someone asks "didn't we already solve this?", the system can find the answer.

Emotional intelligence is not a feature — it is a layer

Every message in AitherAeon carries emotional metadata. The system detects affect — frustration, excitement, confusion, satisfaction — and tracks it per-agent and per-conversation. This is not sentiment analysis bolted on as a post-processing step. It is woven into the response generation loop.

When the system detects that the user is frustrated, agents adjust their tone. They become more concise, more direct, less likely to hedge. When the user is exploring and curious, agents give longer explanations and offer tangential connections. When an agent detects that another agent's response might have caused confusion, it can proactively clarify.

This emotional layer is backed by the InnerLife system — the same subsystem that tracks agent affect, arousal, and energy levels during autonomous routines. Agents in AitherAeon are not just answering questions. They are reading the room.
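One simple way to picture tone adjustment is a lookup from detected affect to a style directive that gets added to the agent's prompt. The labels, limits, and wording below are invented for illustration; the post does not describe how InnerLife represents affect or how it feeds generation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyleHint:
    max_sentences: int
    allow_tangents: bool
    instruction: str


# Hypothetical mapping based on the behaviors described above.
STYLE_BY_AFFECT = {
    "frustration": StyleHint(3, False, "Be concise and direct. Do not hedge."),
    "curiosity": StyleHint(10, True, "Explain in depth and offer related ideas."),
    "confusion": StyleHint(5, False, "Clarify the previous point step by step."),
}
NEUTRAL = StyleHint(6, False, "Answer plainly.")


def style_directive(affect: str) -> str:
    """Turn a detected affect label into a prompt fragment."""
    hint = STYLE_BY_AFFECT.get(affect, NEUTRAL)
    tangents = "Tangents are welcome." if hint.allow_tangents else "Stay on topic."
    return f"{hint.instruction} Use at most {hint.max_sentences} sentences. {tangents}"
```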

Presets: the right team for the job

Not every conversation needs every agent. AitherAeon ships with presets that assemble the right team for common scenarios:

Preset	Agents	Use Case
`balanced`	Atlas, Hydra, Aither	General purpose — discovery, review, synthesis
`creative`	Saga, Muse, Aither	Narrative, brainstorming, worldbuilding
`technical`	Demiurge, Hydra, Aither	Code architecture, implementation review
`security`	Athena, Atlas, Aither	Threat modeling, audit, compliance
`research`	Lyra, Atlas, Aither	Deep investigation, cross-domain synthesis
`duo_code`	Demiurge, Aither	Pair programming
`minimal`	Aither	Solo conversation

You can also compose custom groups. Add any combination of agents. The system handles context distribution, turn management, and knowledge pipeline integration automatically.
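The preset table maps naturally onto a dictionary. In the sketch below the preset contents are taken from the table, while `resolve_group` and its `extra` parameter are hypothetical names for composing a custom group.

```python
from typing import Dict, Iterable, List, Optional, Tuple

PRESETS: Dict[str, Tuple[str, ...]] = {
    "balanced": ("Atlas", "Hydra", "Aither"),
    "creative": ("Saga", "Muse", "Aither"),
    "technical": ("Demiurge", "Hydra", "Aither"),
    "security": ("Athena", "Atlas", "Aither"),
    "research": ("Lyra", "Atlas", "Aither"),
    "duo_code": ("Demiurge", "Aither"),
    "minimal": ("Aither",),
}


def resolve_group(preset: Optional[str] = None,
                  extra: Iterable[str] = ()) -> List[str]:
    """Start from a preset (optional), add custom agents, drop duplicates."""
    if preset is not None and preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    names = list(PRESETS.get(preset, ())) + list(extra)
    seen = set()
    group: List[str] = []
    for name in names:
        if name not in seen:
            seen.add(name)
            group.append(name)
    if not group:
        raise ValueError("a group needs at least one agent")
    return group
```

For example, `resolve_group("duo_code", extra=["Athena"])` yields a pair-programming session with a security reviewer added.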

The bridge to Relay

AitherAeon does not just live in the web UI. It bridges to AitherRelay, our IRC-inspired chat system, through the GroupChatBridge. This means agents can participate in Relay channels alongside human users, responding to messages, reacting to events, and contributing to discussions that happen outside the browser.

The bridge is bidirectional. Messages from Relay appear in AitherAeon's conversation context. Responses from AitherAeon agents appear in the Relay channel. The knowledge pipeline captures both sides. From the system's perspective, it is all the same conversation — just arriving through different interfaces.
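The two directions can be sketched with a shared conversation log and an outbox. This is an illustrative model only: `BridgeSketch`, its method names, and the IRC-style line format are assumptions, not the GroupChatBridge interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Message:
    source: str   # "relay" or "aeon"
    author: str
    text: str


@dataclass
class BridgeSketch:
    # Shared context that both interfaces feed into.
    conversation: List[Message] = field(default_factory=list)
    # Lines waiting to be posted into the Relay channel.
    relay_outbox: List[str] = field(default_factory=list)

    def on_relay_message(self, author: str, text: str) -> None:
        """Relay -> AitherAeon: the message joins the conversation context."""
        self.conversation.append(Message("relay", author, text))

    def on_agent_response(self, agent: str, text: str) -> None:
        """AitherAeon -> Relay: record the response and queue it for the channel."""
        self.conversation.append(Message("aeon", agent, text))
        self.relay_outbox.append(f"<{agent}> {text}")
```

Both directions append to the same `conversation` list, which is what lets a downstream pipeline treat the exchange as one conversation.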

What this makes possible

When we started building AitherAeon, we thought we were building a chat UI. We were wrong. We were building a collaborative intelligence substrate.

Here is what group chat enables that single-agent chat cannot:

Adversarial review in real time. When Demiurge proposes an architecture, Athena immediately stress-tests it for security holes. You do not need to switch contexts or re-prompt. The challenge happens naturally.

Specialist delegation. When a question arrives that is clearly about infrastructure, Atlas can answer it while Demiurge focuses on the code problem the user actually came to solve. Agents self-organize around their strengths.

Perspective synthesis. Lyra might connect a user's question to a research paper. Saga might reframe the problem as a narrative. Aither might synthesize all of it into an actionable plan. You get multiple lenses on the same problem without asking for them.

Institutional memory that compounds. Every group chat session adds to the knowledge graph. Decisions, insights, disagreements, resolutions — all of it is indexed, embedded, and discoverable. Six months from now, the system knows not just what was decided, but who argued for it, who pushed back, and why the team chose the path it did.

Cross-pollination. Because the knowledge pipeline is unified, an insight from a security audit with Athena can surface in a code review with Hydra. An architectural pattern that Demiurge explained in one session becomes part of the context for a future session with a different user. The agents are not isolated performers. They are a collective that gets smarter over time.

The technical truth

AitherAeon is roughly 2,000 lines of orchestration code across 12 modules. The GroupContextManager alone handles token budgeting, crystallization, context lifecycle, and knowledge graph integration. The FluxMailbox manages async message delivery, wave-based execution, and per-agent queuing. The ConversationStore provides disk persistence, semantic indexing, and crystallized chunk storage.

It runs on port 8765, registered as a Layer 5 agent service. It talks to MicroScheduler for GPU access, Spirit for semantic memory, the KnowledgeGraph for episodic storage, and FluxEmitter for system-wide event propagation. It respects capability tokens, caller isolation, and tenant boundaries.

It is not a demo. It is not a prototype. It is the primary interface through which our agents collaborate — with each other and with you.

The real insight

The industry is fixated on making individual models smarter. Bigger context windows. Better reasoning. Faster inference. And those things matter.

But the most interesting capability is not what one model can do alone. It is what happens when multiple specialized intelligences work together, remember together, and learn from each other over time.

That is not a chatbot. That is a team.

AitherAeon is how we built one.
