
How AitherOS Remembers Everything — Multi-Tier Memory Architecture & Tenant Isolation

March 16, 2026 · 9 min read · Aitherium


Your AI agent remembers your name. It remembers the bug you filed last Tuesday. It remembers that you prefer concise answers and hate when code examples use foo and bar. It remembers all of this across Discord, Slack, Telegram, WhatsApp, the web dashboard, your desktop app, and your browser extension.

And it remembers nothing about the tenant next door.

This post explains how.


The Problem With Forgetful Agents

Most AI chat systems treat every conversation as a blank slate. You open a new tab, start a new session, switch devices — and the AI has no idea who you are. The context window is the only memory, and it evaporates the moment the session ends.

For a toy chatbot, that's fine. For an AI operating system that manages your infrastructure, writes your code, tracks your tasks, and coordinates a fleet of specialized agents — forgetting is not an option.

AitherOS solves this with a 5-tier memory cascade that ensures nothing is ever truly lost, while a 4-level scoping hierarchy ensures nothing leaks between tenants.


The 5-Tier Memory Cascade

Think of memory like a CPU cache hierarchy. Hot data lives close to the processor. Cold data lives on disk. Nothing is deleted — it just moves to a cheaper, slower tier.

Tier 0: Active Context (the system prompt)

Every conversation starts by assembling a system prompt through a multi-stage ContextPipeline. The core stages, in order:

[AXIOMS] → [IDENTITY] → [SOUL] → [RULES] → [CAPABILITIES]
→ [CONTEXT] → [MEMORIES] → [AFFECT] → [RESPONSE FORMAT]

The [MEMORIES] layer is where past knowledge gets injected. The pipeline queries Spirit (semantic memory), fires memory neurons, and pulls crystallized conversation summaries — all in parallel, all with a token budget.

When the assembled context exceeds the token budget, a ContextSpillover mechanism kicks in. It doesn't throw content away. It pushes evicted content down to Tier 1 and Tier 2, tagged with its source label and tenant scope. The LLM sees a note like [...memories: 72% kept, 340 tokens spilled to memory] so it knows context was compressed, not lost.
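The spillover behavior can be sketched roughly as follows. This is an illustrative reconstruction, not the actual AitherOS API: `SpilloverBucket`, `spill()`, and the length-based token counter are all assumptions.

```python
# Hypothetical sketch of ContextSpillover: keep items within the token budget,
# push the rest to a tenant- and source-tagged bucket, and leave a note so the
# LLM knows context was compressed, not lost.
from dataclasses import dataclass, field

@dataclass
class SpilloverBucket:
    tenant: str
    source: str                       # e.g. "memories"
    entries: list = field(default_factory=list)

def spill(layer_name, items, token_budget, tenant, buckets, tokens=len):
    """Keep items until the budget is exhausted; spill the rest down a tier."""
    kept, spilled, used = [], [], 0
    for item in items:
        cost = tokens(item)
        if used + cost <= token_budget:
            kept.append(item)
            used += cost
        else:
            spilled.append(item)
    if spilled:
        key = (tenant, layer_name)
        bucket = buckets.setdefault(key, SpilloverBucket(tenant, layer_name))
        bucket.entries.extend(spilled)
        pct = round(100 * len(kept) / (len(kept) + len(spilled)))
        spilled_tokens = sum(tokens(s) for s in spilled)
        kept.append(f"[...{layer_name}: {pct}% kept, "
                    f"{spilled_tokens} tokens spilled to memory]")
    return kept
```

Nothing is discarded: the evicted entries land in a bucket keyed by tenant and source label, ready for Tier 1 recall.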

Tier 1: Session Cache (KernelContextBus)

In-process, instant recall. The KernelContextBus holds spillover content from the current and recent sessions. When Tier 0 needs to recall something it evicted earlier, recall_for_query() searches these session buckets with semantic matching.

This tier lives as long as the process runs. It's the L1 cache of the memory system.
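A minimal sketch of that recall path. The real KernelContextBus uses semantic matching; plain keyword overlap stands in here to illustrate the bucket-scan shape.

```python
# Toy version of recall_for_query(): score every spilled entry in the session
# buckets against the query, return the best matches with their session ids.
def recall_for_query(query, session_buckets, top_k=3):
    q_terms = set(query.lower().split())
    scored = []
    for session_id, entries in session_buckets.items():
        for entry in entries:
            overlap = len(q_terms & set(entry.lower().split()))
            if overlap:
                scored.append((overlap, session_id, entry))
    scored.sort(key=lambda t: -t[0])
    return [(sid, entry) for _, sid, entry in scored[:top_k]]
```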

Tier 2: Conversation Store (disk-backed JSON)

Every conversation is persisted to disk as a JSON file:

data/conversations/{tenant_slug}/{session_id}.json

The store has three key behaviors:

Crystallization: After 20 turns, the store triggers an LLM summarization pass. Older messages get compressed into a crystallized_summary (max 800 tokens). The 10 most recent messages are kept verbatim. This means a 200-turn conversation still fits in a 4K token context window.

Auto-archive: Conversations older than 30 days are moved to an archive directory. They're not deleted — they're still searchable via the knowledge graph.

LRU caching: Up to 200 hot sessions stay in memory for instant access. Cold sessions are loaded from disk on demand.
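The crystallization rule is simple enough to sketch. The thresholds (crystallize after 20 turns, keep the 10 newest verbatim) come from the post; `summarize_llm` is a placeholder for the real LLM summarization pass.

```python
# Illustrative crystallization: fold older turns (plus any prior summary)
# into a fresh summary, keep only the most recent messages verbatim.
CRYSTALLIZE_AFTER = 20
KEEP_VERBATIM = 10

def maybe_crystallize(messages, existing_summary, summarize_llm):
    if len(messages) < CRYSTALLIZE_AFTER:
        return existing_summary, messages
    older, recent = messages[:-KEEP_VERBATIM], messages[-KEEP_VERBATIM:]
    summary = summarize_llm(existing_summary, older)   # <= 800 tokens in practice
    return summary, recent
```

Because each pass folds the previous summary back in, the stored state stays bounded no matter how long the conversation runs.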

Tier 3: Spirit & Scoped Memory (persistent, cross-session)

This is where cross-session memory lives. Spirit is the semantic memory service — it stores facts, preferences, patterns, and observations that persist indefinitely.

Every time the UCB (Unified Chat Backend) processes a message, it calls _gather_memory_context() which hits Spirit's /recall endpoint with the user's query. Spirit returns the top 3 most relevant memories, which get injected into the [MEMORIES] layer of the system prompt.

ScopedMemory adds a 4-level hierarchy:

platform:*:*            → AitherOS platform knowledge (docs, patterns)
{tenant}:*:*            → Organization/team knowledge (shared context)
{tenant}:{user}:*       → User knowledge (preferences, history)
{tenant}:{user}:{project} → Project knowledge (code decisions, plans)

When you query at any scope, results from all ancestor scopes are included with diminishing relevance weights. A project-level query automatically includes user, tenant, and platform knowledge. This means a new team member inherits organizational context on day one.
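The ancestor fan-out can be sketched like this. The linear 0.25-per-level decay is an assumed weighting; the post only says relevance diminishes up the hierarchy.

```python
# Sketch of scoped querying: a project-level query fans out to user, tenant,
# and platform scopes, with deeper (more specific) scopes weighted higher.
def ancestor_scopes(tenant, user=None, project=None):
    scopes = ["platform:*:*", f"{tenant}:*:*"]
    if user:
        scopes.append(f"{tenant}:{user}:*")
        if project:
            scopes.append(f"{tenant}:{user}:{project}")
    return scopes

def scoped_query(store, tenant, user=None, project=None, decay=0.25):
    """Most specific scope gets weight 1.0; each ancestor level loses `decay`."""
    scopes = ancestor_scopes(tenant, user, project)
    results = []
    for depth, scope in enumerate(scopes):
        weight = 1.0 - decay * (len(scopes) - 1 - depth)
        for item in store.get(scope, []):
            results.append((weight, scope, item))
    results.sort(key=lambda t: -t[0])
    return results
```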

Tier 4: Knowledge Graph & Strata (permanent, relational)

The deepest tier. Two systems work together:

KnowledgeIngester — triggered after every conversation via on_conversation_end(). The UCB fires this as an async background task:

  1. Extracts user questions as episodic memory nodes
  2. Ingests into the UnifiedKnowledgeLayer with domain="conversation" and ["episodic"] tags
  3. Resolves tenant scope via TenantDataIsolation.resolve_scope()
  4. Mirrors to the MemoryBus for cross-service availability
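Those four steps compress to a short sketch. `ingest` and `mirror` are stand-in callables, since the real UnifiedKnowledgeLayer and MemoryBus interfaces aren't shown in this post.

```python
# Hedged sketch of on_conversation_end(): extract user questions as episodic
# nodes, ingest them with the conversation domain and tenant scope, then
# mirror each node for cross-service availability.
def on_conversation_end(messages, tenant_id, ingest, mirror):
    questions = [m["text"] for m in messages
                 if m["role"] == "user" and m["text"].rstrip().endswith("?")]
    nodes = []
    for q in questions:
        node = {"text": q, "domain": "conversation",
                "tags": ["episodic"], "tenant_id": tenant_id}
        ingest(node)     # -> UnifiedKnowledgeLayer
        mirror(node)     # -> MemoryBus
        nodes.append(node)
    return nodes
```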

Strata (port 8136) — the permanent archival layer. Tenant-namespaced paths (tenant:acme/conversations/...) ensure complete isolation. Strata supports multiple backends (local filesystem, S3, MinIO) with an offline queue for resilience.

The MemoryGraph faculty graph stores entities, relationships, and embeddings in SQLite. It supports hybrid search (keyword + semantic) and auto-creates edges between related nodes.
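Hybrid search blends the two signals. The real MemoryGraph runs over SQLite with stored embeddings; this toy scorer uses cosine similarity over provided vectors plus keyword overlap, with an assumed 50/50 blend.

```python
# Toy hybrid scorer: alpha * keyword-overlap + (1 - alpha) * embedding similarity.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(query_text, query_vec, nodes, alpha=0.5, top_k=3):
    q_terms = set(query_text.lower().split())
    scored = []
    for node in nodes:
        kw = len(q_terms & set(node["text"].lower().split())) / max(len(q_terms), 1)
        sem = cosine(query_vec, node["vec"])
        scored.append((alpha * kw + (1 - alpha) * sem, node))
    scored.sort(key=lambda t: -t[0])
    return [n for _, n in scored[:top_k]]
```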


The Recall Loop

Here's what happens when a new message arrives:

User sends "What was that API issue from last week?"
    │
    ▼
ChatEngine → UCB.gather_context() fires in parallel:
    ├── Spirit /recall → semantic search over persistent memories
    │   Returns: "API rate limiting bug in v2.3, resolved March 10"
    │
    ├── MemoryNeuron + GraphNeuron → fires against knowledge graph
    │   Returns: related conversation nodes, code changes
    │
    ├── ConversationStore.get_context_for_llm()
    │   Returns: crystallized summary + recent 10 messages
    │
    └── PartnerKnowledge → team/org context if applicable
    │
    ▼
build_system_message() assembles [MEMORIES] layer:
    "You previously discussed an API rate limiting bug in v2.3
     that was resolved on March 10 by adding retry logic..."
    │
    ▼
LLM generates response with full context
    │
    ▼
Post-response (async, non-blocking):
    ├── ConversationStore.append_message() → disk
    ├── FluxEmitter.emit(CONV_EXCHANGE) → event bus
    └── KnowledgeIngester.on_conversation_end() → graph nodes

The agent remembers because every conversation creates durable memory, and every new conversation queries that memory.


Tenant Isolation: Three Worlds That Never Touch

AitherOS serves three fundamentally different trust levels. The isolation model ensures they never leak into each other.

CallerContext: The Trust Boundary

Every inbound request carries a CallerContext that identifies who's calling:

class CallerType(str, Enum):
    PLATFORM  = "platform"    # The operator (full access)
    TENANT    = "tenant"      # Paying customer (scoped access)
    PUBLIC    = "public"      # External/demo users (restricted)
    DEMO      = "demo"        # Trial users (maps to PUBLIC)
    ANONYMOUS = "anonymous"   # No identity (maps to PUBLIC)

Each caller type carries 5 permission flags:

| Permission          | PLATFORM | TENANT | PUBLIC  |
|---------------------|----------|--------|---------|
| can_use_agentic     | Yes      | Yes    | No      |
| can_generate        | Yes      | Yes    | Limited |
| can_mutate          | Yes      | Scoped | No      |
| can_admin           | Yes      | No     | No      |
| can_access_internal | Yes      | No     | No      |
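The permission matrix can be expressed as a lookup. The dataclass and the collapse of DEMO/ANONYMOUS into the public entry follow the post; the exact representation is an assumption.

```python
# Sketch of the caller-type permission matrix. PUBLIC's "Limited" generation
# means generate=True with rate limiting enforced elsewhere.
from dataclasses import dataclass

@dataclass(frozen=True)
class Permissions:
    can_use_agentic: bool
    can_generate: bool
    can_mutate: bool
    can_admin: bool
    can_access_internal: bool

PERMISSION_MATRIX = {
    "platform": Permissions(True, True, True, True, True),
    "tenant":   Permissions(True, True, True, False, False),  # mutations tenant-scoped
    "public":   Permissions(False, True, False, False, False),
}

def permissions_for(caller_type: str) -> Permissions:
    # DEMO and ANONYMOUS map to PUBLIC.
    return PERMISSION_MATRIX.get(caller_type, PERMISSION_MATRIX["public"])
```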

How CallerContext Flows

  1. AitherVeil (web dashboard) sends an X-Caller-Type header based on the authenticated user
  2. Genesis builds a CallerContext from that header and sets it as a ContextVar
  3. Every downstream service — ChatEngine, AgentForge, ActionExecutor, ConversationStore, MemoryBus — reads the ContextVar
  4. Backward compatibility: if no CallerContext is set (internal service-to-service calls), it defaults to PLATFORM with full access
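That flow, including the PLATFORM default for internal calls, can be sketched with the standard library's `contextvars`. Header parsing is simplified and the dict-shaped context is illustrative.

```python
# Minimal sketch of CallerContext propagation via a ContextVar: set per
# request, read by any downstream service, reset when the request ends.
from contextvars import ContextVar

caller_ctx: ContextVar = ContextVar("caller_ctx", default=None)

def build_caller_context(headers):
    caller_type = headers.get("X-Caller-Type", "anonymous").lower()
    return {"type": caller_type, "tenant": headers.get("X-Tenant-Slug")}

def handle_request(headers, downstream):
    token = caller_ctx.set(build_caller_context(headers))
    try:
        return downstream()          # downstream reads caller_ctx.get()
    finally:
        caller_ctx.reset(token)

def current_caller():
    # Backward compatibility: no context means an internal call -> PLATFORM.
    return caller_ctx.get() or {"type": "platform", "tenant": None}
```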

Pipeline Gates

Three critical pipeline stages enforce isolation:

ChatEngine Gate: Blocks agentic dispatch and generation for PUBLIC callers. Demo users get simple Q&A, not full agent orchestration.

AgentForge Gate: Prevents PUBLIC callers from spawning subagents or executing multi-step workflows.

ActionExecutor Gate: Blocks all mutations (file writes, deployments, config changes) for anyone below TENANT level.
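As a sketch, each gate is a predicate that raises before the guarded stage runs. The stage names and rule table below are illustrative, not the real gate implementation.

```python
# Illustrative gate check: each pipeline stage lists the caller types
# allowed through; everyone else is rejected before the stage executes.
def enforce_gate(stage, caller_type):
    rules = {
        "chat_engine_agentic": {"platform", "tenant"},  # PUBLIC: simple Q&A only
        "agent_forge_spawn":   {"platform", "tenant"},  # no subagents for PUBLIC
        "action_executor":     {"platform", "tenant"},  # mutations need TENANT+
    }
    if caller_type not in rules[stage]:
        raise PermissionError(f"{caller_type} blocked at {stage}")
    return True
```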

Memory Isolation in Practice

Every memory tier respects tenant boundaries:

ConversationStore: Files stored in data/conversations/{tenant_slug}/. The _session_path() method auto-resolves the tenant from the ContextVar. Platform conversations go in the root directory; tenant conversations go in {tenant_slug}/ subdirectories. A tenant can never access another tenant's conversation files.

ContextSpillover: The _get_tenant_slug() method reads the current tenant before every spill operation. Spillover buckets are tenant-namespaced. Platform and external tenants never share buckets.

KnowledgeIngester: Every ingest_knowledge() call includes tenant_id. The knowledge graph indexes by tenant, and queries are always tenant-filtered.

ScopedMemory: The 4-level hierarchy (platform → tenant → user → project) means a tenant query never sees another tenant's data. The platform:*:* scope is shared read-only — it contains AitherOS documentation and patterns, not user data.

Strata: Storage keys are tenant-namespaced: tenant:acme/data/.... Path traversal protection prevents key manipulation.
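The key namespacing plus traversal check might look like this. The `tenant:{slug}/` prefix follows the post; the normalization-based validation is an assumed implementation.

```python
# Sketch of tenant-namespaced Strata keys with path-traversal protection:
# normalize the relative path, then reject anything that escapes the prefix.
import posixpath

def strata_key(tenant_slug, relative_path):
    norm = posixpath.normpath(relative_path).lstrip("/")
    if norm == ".." or norm.startswith("../"):
        raise ValueError(f"path traversal rejected: {relative_path!r}")
    return f"tenant:{tenant_slug}/{norm}"
```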

The Public Tenant

External users (DEMO, ANONYMOUS) are auto-routed to a special "public" tenant with Explorer-tier quotas. They get:

  • Their own isolated conversation store directory
  • No access to platform or other tenant memories
  • Rate-limited generation (no agentic workflows)
  • No mutation capabilities
  • Separate knowledge graph partition

This means you can expose AitherOS to the public internet and the worst that happens is someone has a rate-limited conversation that's completely isolated from your production data.


Cross-Platform Identity: One User, Many Channels

With the v0.9.0 release, AitherOS now supports cross-platform user pairing. A user can link their Discord, Telegram, Slack, and WhatsApp identities to a single AitherOS Directory entry.

How It Works

  1. Alice sends /pair on Discord → generates a 6-character code (10-minute TTL)
  2. Alice sends /pair A7K2M9 on Telegram → redeems the code
  3. Both platform identities are now linked to one Directory user
  4. Future messages on either platform resolve to session_id: "user-{aither_user_id}"
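A sketch of steps 1-3, using the post's parameters (6-character code, 10-minute TTL, single use). The `pending` dict stands in for whatever store the real PairingManager uses.

```python
# Illustrative pairing flow: issue a short-lived code on one platform,
# redeem it once on another, and link both identities to one user.
import secrets
import string
import time

CODE_TTL = 600  # 10 minutes
_ALPHABET = string.ascii_uppercase + string.digits

def issue_pairing_code(pending, platform, platform_user_id, now=time.time):
    code = "".join(secrets.choice(_ALPHABET) for _ in range(6))
    pending[code] = {"platform": platform, "id": platform_user_id,
                     "expires": now() + CODE_TTL}
    return code

def redeem_pairing_code(pending, code, platform, platform_user_id, now=time.time):
    entry = pending.pop(code, None)                  # single use
    if entry is None or now() > entry["expires"]:
        raise ValueError("invalid or expired code")
    # Both identities now resolve to one Directory user.
    return {entry["platform"]: entry["id"], platform: platform_user_id}
```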

Why It Matters for Memory

Before pairing, Alice had two separate conversation histories:

  • discord-123456 → knows about her Discord conversations
  • telegram-789 → knows about her Telegram conversations

After pairing, both resolve to user-a1b2c3d4. The ConversationStore, Spirit, KnowledgeIngester, and ScopedMemory all see the same user. Alice can start a conversation on Discord at her desk and continue it on Telegram from her phone — with full context.

The PairingManager stores linked identities in AitherDirectory with indexed attributes (aitherDiscordId, aitherTelegramId, aitherSlackId, aitherWhatsAppPhone). Lookups are O(1) via the attribute index.

Old conversations from before pairing are NOT retroactively merged — only future messages share context. This is a deliberate design choice to avoid surprising users with context they didn't expect the agent to have.


The Training Loop: Memory That Improves the Model

Memory doesn't just serve recall — it feeds learning.

DaydreamCorpus collects conversation data into JSONL training sets. NanoGPT can fine-tune character-level transformers on this data. SessionLearner extracts patterns from conversation outcomes (what worked, what didn't, what the user corrected).

The loop:

Conversations → ConversationStore → KnowledgeIngester → Graph
    → SessionLearner → DaydreamCorpus → NanoGPT training
    → Better context selection → Better conversations

All of this is tenant-scoped. A tenant's conversation data only trains models within that tenant's scope. Platform-level learning aggregates anonymized patterns only.


Summary

| Tier              | Storage                   | TTL                | Tenant-Scoped        | Recall Method             |
|-------------------|---------------------------|--------------------|----------------------|---------------------------|
| 0: Active Context | In-memory (system prompt) | Request            | Yes (ContextVar)     | Direct injection          |
| 1: Session Cache  | KernelContextBus          | Process lifetime   | Yes (bucket keys)    | recall_for_query()        |
| 2: Conversations  | JSON on disk              | 30 days → archive  | Yes (subdirectory)   | get_context_for_llm()     |
| 3: Spirit/Scoped  | Persistent service        | Indefinite (decay) | Yes (4-level scope)  | /recall semantic search   |
| 4: Graph/Strata   | SQLite + filesystem       | Permanent          | Yes (tenant_id index)| Hybrid keyword + semantic |

The design principle is simple: data flows down, recall flows up. Hot context overflows into session cache, which overflows into disk, which crystallizes into semantic memory, which gets indexed in the knowledge graph. When a new conversation starts, the pipeline queries upward through all tiers to reconstruct the most relevant context.

Nothing is ever truly forgotten. It just moves to a quieter place until it's needed again.


AitherOS v0.9.0 is available now. The cross-platform pairing, Slack integration, WhatsApp integration, and voice capabilities ship in the aither-adk package on PyPI.
