Someone Tried to Generate an Image on Our Demo. It Worked Exactly as Designed.
Published by Aitherium — March 25, 2026
At 7:34 PM tonight, someone opened our public demo at demo.aitherium.com and typed:
Generate an image of a cyberpunk cityscape at sunset
The intent classifier nailed it in 3 milliseconds. Category: vision. Chain: iris → creative_engine. Effort: 3. The system knew exactly what to do — route the request through the Canvas fast-path to ComfyUI, generate an SDXL image in about 15 seconds, and return a base64 blob.
Then the security gate killed it.
[SECURITY] Blocked image generation for external caller
The user got a clear message explaining that image generation is a platform-only feature. No timeout, no "connection lost", no cryptic error. A sub-100ms response explaining exactly what happened and why.
This is the story of how that security gate works, and the five layers of isolation that sit beneath it.
The Problem: One Machine, Many Callers
AitherOS runs on a single workstation. An RTX 5090 with 32GB of VRAM, 128GB of RAM, a Ryzen 9 9950X3D. It's a serious machine, but it's still one machine. And it's exposed to the internet through a Cloudflare tunnel.
That means two very different classes of users hit the same hardware:
- Me — the platform owner, running locally, full access to everything
- Everyone else — demo users, potential customers, curious visitors, bots
If both classes get the same permissions, a single viral tweet about the demo could burn through my GPU allocation in hours. Someone could queue up 500 image generation requests and lock ComfyUI for the rest of the day. Or worse — someone could trigger agentic workflows that spawn subagents, execute code in sandboxes, or write files to disk.
The traditional answer is "put it behind a login." But we wanted the demo to be frictionless. No sign-up. No API key. Just talk to the AI and see what it can do. That means the security has to be invisible — present for every request, but only blocking the things that actually matter.
Layer 1: CallerIsolation
Every request that enters Genesis (our system orchestrator on port 8001) goes through caller classification before anything else happens. This is not middleware you can skip. It's baked into the request processing pipeline.
The CallerType Hierarchy
```python
class CallerType(str, Enum):
    PLATFORM = "platform"      # Local operator — full access
    TENANT = "tenant"          # SaaS customer — plan-gated
    DEMO = "demo"              # Demo key holder — limited
    PUBLIC = "public"          # External 3rd-party — restricted
    ANONYMOUS = "anonymous"    # No identity — most restricted
```
Five levels. Each one maps to a permission matrix:
```python
_CALLER_PERMISSIONS = {
    CallerType.PLATFORM: {
        "can_agentic": True, "can_forge": True,
        "can_mutate": True, "can_execute": True,
        "can_generate": True, "can_multi_agent": True,
    },
    CallerType.PUBLIC: {
        "can_agentic": True, "can_forge": False,
        "can_mutate": False, "can_execute": False,
        "can_generate": False, "can_multi_agent": False,
    },
    CallerType.ANONYMOUS: {
        "can_agentic": False, "can_forge": False,
        "can_mutate": False, "can_execute": False,
        "can_generate": False, "can_multi_agent": False,
    },
}
```
PUBLIC callers can chat. They get the full context pipeline, the orchestrator model, even tool-augmented responses. But they can't generate images, spawn subagents, execute code, create documents, or trigger multi-agent workflows. ANONYMOUS callers can't even enter the agentic loop.
Fail-Closed Classification
The critical design decision: non-local requests can never be classified as PLATFORM.
[CALLER] Non-local request resolved to PLATFORM — downgrading to ANONYMOUS
That log line is the fail-safe. If a request arrives from an IP outside RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) but somehow has PLATFORM-level headers, it gets forcibly downgraded. You can't spoof your way to platform access from the public internet.
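The fail-closed check can be sketched in a few lines with the standard library's `ipaddress` module. The function names and string caller types here are illustrative, not the actual Genesis implementation:

```python
import ipaddress

# Private ranges from RFC 1918, plus loopback for local development.
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("127.0.0.0/8"),
]

def is_local(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in PRIVATE_NETS)

def resolve_caller_type(claimed: str, source_ip: str) -> str:
    # Fail closed: a PLATFORM claim from a non-local address is never honored.
    if claimed == "platform" and not is_local(source_ip):
        return "anonymous"
    return claimed
```

The important property is the direction of the default: an unrecognized or spoofed claim degrades to the most restricted tier, never upward.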
The Veil frontend sends an X-Caller-Type header with each request. Genesis reads it, validates the source IP, and builds a CallerContext object that propagates through every service call via Python's contextvars. Every downstream service — ChatEngine, AgentForge, ActionExecutor, Strata — can read the caller context without it being passed as a parameter. It's ambient. And it's immutable once set.
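The ambient-propagation pattern looks roughly like this. The `CallerContext` fields shown are assumptions for illustration, not the real schema:

```python
import contextvars
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the context is immutable once built
class CallerContext:
    caller_type: str
    can_generate: bool

_caller_ctx: contextvars.ContextVar[CallerContext] = contextvars.ContextVar("caller_ctx")

def set_caller(ctx: CallerContext) -> None:
    # Called once, at the top of the request pipeline.
    _caller_ctx.set(ctx)

def current_caller() -> CallerContext:
    # Any downstream service reads the caller without it being
    # threaded through every function signature.
    return _caller_ctx.get()
```

Because `ContextVar` values are scoped per task, concurrent requests in the same asyncio event loop each see their own caller context.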
What This Looks Like in Practice
When the cyberpunk image request arrived:
- Veil forwarded it through the Cloudflare tunnel (external IP)
- Genesis detected non-local origin → downgraded to ANONYMOUS
- IntentClassifier correctly identified `vision` intent (3ms)
- ChatEngine checked `caller.can_generate` → False
- Security gate returned an immediate, clear response
The image generation pipeline was never invoked. Canvas never saw the request. ComfyUI never loaded a model. The GPU never context-switched. The entire rejection happened in the orchestrator's Python process, in under 100 milliseconds.
Layer 2: Multi-Tenant Isolation
CallerIsolation handles the "who are you" question. Multi-tenancy handles "where does your data go."
Per-Tenant Everything
Every tenant in AitherOS gets:
| Resource | Isolation Method |
|---|---|
| Database | Separate PostgreSQL database (aither_{slug}) |
| Storage | Strata prefix (tenants/{slug}/) |
| Events | FluxBus namespace ({slug}:) |
| Secrets | AitherSecrets namespace (tenants/{slug}) |
| Context | ContextVar propagation (no cross-tenant leakage) |
Public demo users are automatically routed to a PUBLIC tenant. Their conversations, their context, their generated artifacts — everything lives in a sandboxed namespace. If a tenant requests aither://warm/outputs/image.png, Strata silently rewrites it to aither://warm/tenants/{slug}/outputs/image.png. The tenant never sees the rewrite.
Plan-Gated Capabilities
Tenants get different permission levels based on their subscription plan:
PlanTier.EXPLORER: effort_cap=6 # Orchestrator model only
PlanTier.BUILDER: effort_cap=6 # Same model, more context
PlanTier.STARTER: effort_cap=6 # $49/mo — basic agentic
PlanTier.GROWTH: effort_cap=8 # $99/mo — reasoning unlocked
PlanTier.PROFESSIONAL: effort_cap=10 # $199/mo — full reasoning + agentic
PlanTier.ENTERPRISE: effort_cap=-1 # Custom — uncapped
The effort cap controls which models the tenant can access. Efforts 1-6 route to the orchestrator (fast, cheap). Efforts 7-8 unlock the reasoning model (slower, smarter). Efforts 9-10 get the full reasoning pipeline with graph context, RLM workspace, and multi-agent coordination.
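Put together, the routing rule is a clamp followed by a range check. A sketch, with illustrative tier names (the actual routing code is not shown in the article):

```python
def route_effort(effort: int, effort_cap: int) -> str:
    """Clamp effort to the plan's cap, then pick a model tier by range."""
    if effort_cap != -1:                  # -1 = uncapped (ENTERPRISE)
        effort = min(effort, effort_cap)
    if effort <= 6:
        return "orchestrator"             # fast, cheap
    if effort <= 8:
        return "reasoning"                # slower, smarter
    return "reasoning_full"               # graph context + RLM + multi-agent
```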
Image generation (can_generate) only unlocks at GROWTH tier and above. Document generation follows the same gate. These aren't arbitrary restrictions — they map directly to GPU cost. A single SDXL image generation takes 15 seconds of exclusive GPU time. At scale, that's the most expensive operation in the system.
Layer 3: Strata — Tiered Storage with Tenant Walls
Strata is our unified filesystem service. Every piece of data in AitherOS — models, images, training data, conversation logs, agent artifacts — flows through Strata. It runs on port 8136 and presents a virtual filesystem with four tiers:
HOT (NVMe, <100ms) — Active models, embeddings, tokenizers
WARM (SSD, <50ms) — Outputs, renders, training data, workspaces
COLD (NAS, <500ms) — Archives, backups, historical data
CACHE (Redis, <5ms) — Ephemeral temp files, downloads
Why Strata Matters for Security
Every service in AitherOS writes through Strata. When Canvas generates an image, it goes to Strata. When a conversation is stored, it goes to Strata. When an agent creates an artifact, it goes to Strata.
This means Strata is the single enforcement point for storage isolation. If tenant A's data accidentally ends up in tenant B's namespace, that's a Strata bug — and only a Strata bug. No other service can write to raw disk paths. The virtual filesystem is the only interface.
Strata also feeds the training pipeline. IDE sessions from Claude Code, Cursor, Copilot, and Gemini are ingested through /api/v1/ingest/ide-session. Conversation exchanges flow through FluxEmitter events into the KnowledgeIngester, which indexes them in faculty graphs. All of this respects tenant boundaries. A tenant's conversations never leak into another tenant's knowledge graph.
Automatic Tier Migration
Data moves between tiers based on access patterns:
- WARM files untouched for 30 days migrate to COLD (with zstd compression)
- CACHE files expire after 24 hours
- HOT tier has a 100GB cap — when full, least-recently-used models get evicted to WARM
This isn't just about performance. COLD tier data on NAS storage is physically separate from the NVMe pool that serves active requests. An attacker who somehow compromises the hot path doesn't automatically get access to archived data.
Layer 4: AitherLockbox — Five-Layer Encrypted Vault
Lockbox is where the truly sensitive stuff lives. Agent personas, system prompts (we call them "wills"), authorization policies, cryptographic keys, and static configuration that should never be readable without explicit unlocking.
The Five Layers
Layer 1: Hidden Location. The lockbox directory isn't called "lockbox" or "secrets." It's a SHA-256 hash of the hardware ID, buried in a .cache/ directory. No symlinks point to it. It's in .gitignore. You'd have to know what you're looking for to find it.
Layer 2: Encryption at Rest. Every file inside the lockbox is encrypted with Fernet (AES-128-CBC + HMAC-SHA256). The encryption key is derived from three inputs:
```python
kdf = PBKDF2HMAC(
    algorithm=hashes.SHA256(),
    length=32,
    salt=salt,            # 32 random bytes, unique per lockbox
    iterations=600000,    # OWASP 2023 recommendation
)
key = base64.urlsafe_b64encode(
    kdf.derive(passphrase + hardware_id)
)
```
600,000 iterations of PBKDF2. The salt is 32 bytes of cryptographic randomness. The key material includes the hardware ID, so even if someone steals the encrypted files and knows the passphrase, they can't decrypt them on a different machine.
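The same derivation can be reproduced with only the standard library, since `hashlib.pbkdf2_hmac` is equivalent to the `cryptography` package's `PBKDF2HMAC` with SHA-256. A self-contained sketch with illustrative passphrase and hardware-ID values:

```python
import base64
import hashlib
import os

salt = os.urandom(32)                     # unique per lockbox
key_material = hashlib.pbkdf2_hmac(
    "sha256",
    b"passphrase" + b"hardware-id",       # key input includes the hardware ID
    salt,
    600_000,                              # OWASP-recommended iteration count
    dklen=32,
)
# 32 bytes, urlsafe-base64-encoded, is exactly the key format Fernet expects.
fernet_key = base64.urlsafe_b64encode(key_material)
```

Change any one of the three inputs (passphrase, hardware ID, salt) and the derived key is completely different, which is what binds the lockbox to the machine.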
Layer 3: Integrity Verification. A SHA-256 manifest tracks every file in the lockbox. The manifest itself is encrypted. On every access, the lockbox verifies the manifest hash against the stored value. If a single byte has been modified, tampered, or corrupted — the lockbox detects it and logs a TAMPER_DETECTED audit event.
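Manifest verification is conceptually simple: hash every file and compare against the stored digests. A sketch (the manifest format and `TAMPER_DETECTED` handling are assumptions based on the article):

```python
import hashlib

def file_digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_manifest(manifest: dict[str, str], files: dict[str, bytes]) -> list[str]:
    """Return the names of files whose contents no longer match the manifest."""
    tampered = []
    for name, expected in manifest.items():
        actual = file_digest(files.get(name, b""))
        if actual != expected:
            tampered.append(name)   # real lockbox would log TAMPER_DETECTED here
    return tampered
```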
Layer 4: Access Control. The lockbox requires a passphrase to unlock. After unlocking, it auto-locks after 30 minutes of inactivity. Five failed unlock attempts trigger a 15-minute lockout. Every operation — unlock, lock, store, retrieve, delete — is written to a monthly audit log.
Layer 5: Memory Protection. Decrypted data is held in memory only. It's never written to temp files. When the lockbox locks, the in-memory cache is cleared immediately. The decrypted content exists for exactly as long as it's needed, and not one second longer.
Hardware-Bound Key Recovery
Docker containers get rebuilt. Hardware IDs change. This creates a key management problem: how do you decrypt data when the key material has changed?
Lockbox uses a recovery cascade:
- Try the stable hardware ID (persisted to the data volume, survives rebuilds)
- Try the `AITHER_MASTER_KEY`-derived ID (env var, set once)
- Try the volatile hardware ID (platform-specific, last resort)
- If all fail: wipe stale data rather than leaving unrecoverable encrypted blobs
The stable hardware ID is the critical piece. It's written to a file on the Docker data volume on first boot and persists across container rebuilds. As long as the data volume survives, the lockbox survives.
Layer 5: AitherSecrets — The Credential Vault
AitherSecrets (port 8111) is the centralized credential store. Every API key, OAuth token, database password, and service certificate in the system lives here. Not in environment variables. Not in .env files. Not in Docker secrets. In the vault.
Service Identity
When a service boots, AitherSecrets issues it an Ed25519 keypair. The private key is stored encrypted in the vault. The public key is distributed for verification. Every inter-service HTTP request is signed with the caller's Ed25519 private key and verified by the recipient's middleware.
This makes service impersonation cryptographically infeasible. If an attacker compromises one container and tries to make requests as a different service, the signature check fails, and the receiving service rejects the request before it ever reaches the application layer.
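The sign-and-verify round trip looks like this with the `cryptography` package (an assumption: the article doesn't specify the library, and the header names and payload layout here are illustrative):

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Issued at boot, one keypair per service.
service_key = Ed25519PrivateKey.generate()
public_key = service_key.public_key()     # distributed for verification

payload = b"POST /api/v1/ingest body-hash=abc123"
signature = service_key.sign(payload)

# Recipient middleware: verify() raises InvalidSignature on any mismatch.
public_key.verify(signature, payload)     # legitimate request passes

try:
    public_key.verify(signature, b"tampered payload")
except InvalidSignature:
    rejected = True                       # impersonation attempt bounced
```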
Secret Types and Access Levels
```python
class AccessLevel(str, Enum):
    PUBLIC = "public"            # Any service can read
    INTERNAL = "internal"        # AitherOS services only
    RESTRICTED = "restricted"    # Named services only
    ADMIN = "admin"              # Admin endpoints only
```
An API key for a third-party service might be INTERNAL — any AitherOS service can read it. A tenant's OAuth refresh token is RESTRICTED to the specific services that need it. The vault master key is ADMIN — accessible only through the admin API.
Every secret access is logged. Every secret has a TTL cache (5-minute default). Expired cache entries are re-fetched from the vault. The cache prevents the vault from becoming a bottleneck (AitherOS has 200+ services making frequent credential lookups), while the short TTL ensures rotated credentials propagate quickly.
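A toy version of that TTL cache, matching the 5-minute default (the real AitherSecrets client API is not shown in the article; this is a sketch):

```python
import time

class SecretCache:
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, name: str, fetch):
        """Serve a fresh cached value, or call `fetch` (the vault) and cache it."""
        now = time.monotonic()
        entry = self._store.get(name)
        if entry and now - entry[0] < self.ttl:
            return entry[1]               # fresh: no vault round trip
        value = fetch(name)               # stale or missing: hit the vault
        self._store[name] = (now, value)
        return value
```

With 200+ services polling credentials, the cache absorbs the read load while the short TTL bounds how long a rotated credential can keep being served.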
Key Rotation
Ed25519 service keys rotate automatically every 90 days. The rotation is transparent — the new key is issued, the old key remains valid for a grace period, and services re-register their public keys. No downtime. No configuration changes.
Layer 6: Encrypted Backups
All of the above is worthless if a disk failure wipes everything. AitherRecover (port 8139) handles disaster recovery with a three-layer backup strategy.
What Gets Backed Up
The critical data set includes 30+ paths:
| Data | Why It Matters |
|---|---|
| `vault.enc` + `.vault_salt` | All API keys, OAuth tokens, service credentials |
| `keys/` directory | 16 Ed25519 service signing keypairs |
| `directory.db` | LDAP directory — users, agents, tenants, services, certificates |
| `rbac.db` | Access control rules, group memberships |
| PostgreSQL dumps | All relational data (conversation history, tenant data, training state) |
| Faculty graphs | Knowledge graphs, memory graphs, context data |
| Agent identities | 16 YAML files defining agent personas and capabilities |
Per-User Encryption
User backups are encrypted with per-user Fernet keys stored in Lockbox:
```python
async def create_user_backup(self, user_id: str):
    # Get user-specific encryption key from Lockbox
    key = await self._get_user_backup_key(user_id)
    cipher = Fernet(key)

    # Gather data footprint
    footprint = await self._collect_user_data(user_id)

    # Encrypt the entire footprint
    encrypted = cipher.encrypt(
        json.dumps(footprint).encode('utf-8')
    )

    # Upload to private GitHub repo
    await self._github_upload(
        f"data/{backup_id}/footprint.enc",
        base64.b64encode(encrypted),
    )
```
Each user's backup is encrypted with a different key. If one key is compromised, other users' backups remain secure. The keys themselves live in Lockbox, which is encrypted with the hardware-bound master key. It's encryption all the way down.
Dual Storage Strategy
Backups go to two places:
- Local filesystem — incremental backups with manifest-based deduplication
- Private GitHub repo — encrypted, chunked into 40MB files for the GitHub API
The GitHub backup is the disaster recovery path. If the machine catches fire, you clone the backup repo, run the bootstrap script, point it at the repo, and everything restores. The PostgreSQL sidecar runs pg_dumpall every 15 minutes, so you lose at most 15 minutes of data.
Health Monitoring
The backup system monitors itself:
```json
{
  "status": "healthy",
  "age_hours": 2.3,
  "staleness_threshold_hours": 12,
  "critical_files_present": {
    "vault.enc": true,
    ".vault_salt": true,
    "directory.db": true,
    "rbac.db": true,
    "signing_keys": true
  },
  "postgres": {
    "latest_dump": "dump_20260325_193800.sql",
    "age_minutes": 12.4,
    "healthy": true
  }
}
```
If the latest backup is older than 12 hours, or if any critical file is missing from the manifest, the health endpoint returns warning or critical. The proactive monitor picks this up and raises an interrupt.
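That decision logic fits in one function. The threshold is the article's; the status names mirror the health payload above, and the function itself is a sketch:

```python
def backup_status(age_hours: float,
                  critical_present: dict[str, bool],
                  staleness_threshold_hours: float = 12.0) -> str:
    """Missing critical files outrank staleness; fresh + complete is healthy."""
    if not all(critical_present.values()):
        return "critical"    # a critical file is absent from the manifest
    if age_hours > staleness_threshold_hours:
        return "warning"     # backup exists but is stale
    return "healthy"
```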
How It All Connects
Here's the full request flow for that cyberpunk image generation attempt:
User (external IP) → Cloudflare Tunnel → Veil (port 3000)
↓ X-Caller-Type: "public"
Genesis (port 8001)
↓ build_caller_from_request()
↓ Detect: non-local IP → downgrade to ANONYMOUS
↓ CallerContext set in ContextVar
↓
IntentClassifier (3ms)
↓ type=vision, effort=3, chain=[iris, creative_engine]
↓
ChatEngine
↓ Check caller.can_generate → False
↓ Return: "Image generation is a platform-only feature"
↓ (100ms total, GPU never touched)
Compare that to the same request from localhost:
Me (192.168.x.x) → Veil (port 3000)
↓ X-Caller-Type: "platform"
Genesis (port 8001)
↓ build_caller_from_request()
↓ Detect: local IP → PLATFORM
↓
IntentClassifier (3ms)
↓ type=vision, effort=3
↓
ChatEngine
↓ Check caller.can_generate → True
↓ Direct Canvas fast-path
↓
Canvas (port 8108) → MicroScheduler (VRAM slot) → ComfyUI (port 8188)
↓ SDXL generation (~15s)
↓ Base64 image returned
↓
Strata: aither://warm/renders/{session}/{image}.png
↓ (tenant=platform, no prefix rewrite)
Same hardware. Same code. Same pipeline. Different permissions. The security isn't a wall around the system — it's woven into every layer of the request path.
The Philosophy: Defense in Depth, Not Defense in Front
Most AI systems bolt authentication onto the API gateway and call it done. If you have an API key, you're in. If you don't, you're out.
That's a single point of failure. One leaked key, one misconfigured header, one logic bug in the auth middleware — and everything is exposed.
AitherOS takes the opposite approach. Every layer enforces its own security independently:
- CallerIsolation gates capabilities at the request level
- Multi-tenancy isolates data at the storage level
- Strata enforces tenant boundaries at the filesystem level
- Lockbox encrypts sensitive configuration with hardware-bound keys
- AitherSecrets manages credentials with per-service Ed25519 signing
- AitherRecover encrypts backups with per-user keys
If CallerIsolation fails, the tenant boundary still holds. If the tenant boundary fails, Strata's prefix isolation still holds. If Strata is compromised, the Lockbox encryption still holds. If the Lockbox key leaks, the backup encryption uses different per-user keys.
No single layer failure exposes the entire system. That's the point.
What We Shipped Tonight
The image generation block was already working. What wasn't working was the user experience. Before tonight's fix, when the security gate blocked a generation request, the system set a flag and fell through to the normal LLM chat path. The orchestrator tried to respond with text, often timed out, and the user saw "Connection lost — the backend stopped responding."
Now the security gate returns an immediate, clear response:
Image generation is a platform-only feature and isn't available for external sessions. You're chatting with a live demo — text, code, and analysis are fully functional, but GPU-intensive workloads like image generation are restricted to the owner's local environment.
Sub-100 milliseconds. No GPU involvement. No timeout. No confusion.
Security that works silently is good. Security that explains itself is better.