Someone Tried to Generate an Image on Our Demo. It Worked Exactly as Designed.
Published by Aitherium — March 25, 2026
At 7:34 PM tonight, someone opened our public demo at demo.aitherium.com and typed:
Generate an image of a cyberpunk cityscape at sunset
The intent classifier nailed it in 3 milliseconds. Category: vision. Chain: iris → creative_engine. Effort: 3. The system knew exactly what to do — route the request through the Canvas fast-path to ComfyUI, generate an SDXL image in about 15 seconds, and return a base64 blob.
Then the security gate killed it.
[SECURITY] Blocked image generation for external caller
The user got a clear message explaining that image generation is a platform-only feature. No timeout, no "connection lost", no cryptic error. A sub-100ms response explaining exactly what happened and why.
This is the story of how that security gate works, and the five layers of isolation that sit beneath it.
The Problem: One Machine, Many Callers
AitherOS runs on a single workstation. An RTX 5090 with 32GB of VRAM, 128GB of RAM, a Ryzen 9 9950X3D. It's a serious machine, but it's still one machine. And it's exposed to the internet through a Cloudflare tunnel.
That means two very different classes of users hit the same hardware:
- Me — the platform owner, running locally, full access to everything
- Everyone else — demo users, potential customers, curious visitors, bots
If both classes get the same permissions, a single viral tweet about the demo could burn through my GPU allocation in hours. Someone could queue up 500 image generation requests and lock ComfyUI for the rest of the day. Or worse — someone could trigger agentic workflows that spawn subagents, execute code in sandboxes, or write files to disk.
The traditional answer is "put it behind a login." But we wanted the demo to be frictionless. No sign-up. No API key. Just talk to the AI and see what it can do. That means the security has to be invisible — present for every request, but only blocking the things that actually matter.
Layer 1: CallerIsolation
Every request that enters Genesis (our system orchestrator on port 8001) goes through caller classification before anything else happens. This is not middleware you can skip. It's baked into the request processing pipeline.
The CallerType Hierarchy
```python
class CallerType(str, Enum):
    PLATFORM = "platform"      # Local operator — full access
    TENANT = "tenant"          # SaaS customer — plan-gated
    DEMO = "demo"              # Demo key holder — limited
    PUBLIC = "public"          # External 3rd-party — restricted
    ANONYMOUS = "anonymous"    # No identity — most restricted
```
Five levels. Each one maps to a permission matrix:
```python
_CALLER_PERMISSIONS = {
    CallerType.PLATFORM: {
        "can_agentic": True, "can_forge": True,
        "can_mutate": True, "can_execute": True,
        "can_generate": True, "can_multi_agent": True,
    },
    CallerType.PUBLIC: {
        "can_agentic": True, "can_forge": False,
        "can_mutate": False, "can_execute": False,
        "can_generate": False, "can_multi_agent": False,
    },
    CallerType.ANONYMOUS: {
        "can_agentic": False, "can_forge": False,
        "can_mutate": False, "can_execute": False,
        "can_generate": False, "can_multi_agent": False,
    },
}
```
PUBLIC callers can chat. They get the full context pipeline, the orchestrator model, even tool-augmented responses. But they can't generate images, spawn subagents, execute code, create documents, or trigger multi-agent workflows. ANONYMOUS callers can't even enter the agentic loop.
Fail-Closed Classification
The critical design decision: non-local requests can never be classified as PLATFORM.
[CALLER] Non-local request resolved to PLATFORM — downgrading to ANONYMOUS
That log line is the fail-safe. If a request arrives from an IP outside RFC 1918 private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) but somehow has PLATFORM-level headers, it gets forcibly downgraded. You can't spoof your way to platform access from the public internet.
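The fail-closed check can be sketched in a few lines with the standard library's `ipaddress` module. The function names and string caller types here are illustrative, not the actual Genesis implementation:

```python
import ipaddress

# Private ranges from RFC 1918, plus loopback for local development.
PRIVATE_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("127.0.0.0/8"),
]

def is_local(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in PRIVATE_NETS)

def resolve_caller_type(claimed: str, source_ip: str) -> str:
    # Fail closed: a PLATFORM claim from a non-local address is never honored.
    if claimed == "platform" and not is_local(source_ip):
        return "anonymous"
    return claimed
```

The important property is the direction of the default: an unrecognized or spoofed claim degrades to the most restricted tier, never upward.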
The Veil frontend sends an X-Caller-Type header with each request. Genesis reads it, validates the source IP, and builds a CallerContext object that propagates through every service call via Python's contextvars. Every downstream service — ChatEngine, AgentForge, ActionExecutor, Strata — can read the caller context without it being passed as a parameter. It's ambient. And it's immutable once set.
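The ambient-propagation pattern looks roughly like this. The `CallerContext` fields shown are assumptions for illustration, not the real schema:

```python
import contextvars
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the context is immutable once built
class CallerContext:
    caller_type: str
    can_generate: bool

_caller_ctx: contextvars.ContextVar[CallerContext] = contextvars.ContextVar("caller_ctx")

def set_caller(ctx: CallerContext) -> None:
    # Called once, at the top of the request pipeline.
    _caller_ctx.set(ctx)

def current_caller() -> CallerContext:
    # Any downstream service reads the caller without it being
    # threaded through every function signature.
    return _caller_ctx.get()
```

Because `ContextVar` values are scoped per task, concurrent requests in the same asyncio event loop each see their own caller context.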
What This Looks Like in Practice
When the cyberpunk image request arrived:
- Veil forwarded it through the Cloudflare tunnel (external IP)
- Genesis detected non-local origin → downgraded to ANONYMOUS
- IntentClassifier correctly identified `vision` intent (3ms)
- ChatEngine checked `caller.can_generate` → False
- Security gate returned an immediate, clear response
The image generation pipeline was never invoked. Canvas never saw the request. ComfyUI never loaded a model. The GPU never context-switched. The entire rejection happened in the orchestrator's Python process, in under 100 milliseconds.
Layer 2: Multi-Tenant Isolation
CallerIsolation handles the "who are you" question. Multi-tenancy handles "where does your data go."
Per-Tenant Everything
Every tenant in AitherOS gets:
| Resource | Isolation Method |
|---|---|
| Database | Separate PostgreSQL database (aither_{slug}) |
| Storage | Strata prefix (tenants/{slug}/) |
| Events | FluxBus namespace ({slug}:) |
| Secrets | AitherSecrets namespace (tenants/{slug}) |
| Context | ContextVar propagation (no cross-tenant leakage) |
Public demo users are automatically routed to a PUBLIC tenant. Their conversations, their context, their generated artifacts — everything lives in a sandboxed namespace. If a tenant requests aither://warm/outputs/image.png, Strata silently rewrites it to aither://warm/tenants/{slug}/outputs/image.png. The tenant never sees the rewrite.
Plan-Gated Capabilities
Tenants get different permission levels based on their subscription plan:
PlanTier.EXPLORER: effort_cap=6 # Orchestrator model only
PlanTier.BUILDER: effort_cap=6 # Same model, more context
PlanTier.STARTER: effort_cap=6 # $49/mo — basic agentic
PlanTier.GROWTH: effort_cap=8 # $99/mo — reasoning unlocked
PlanTier.PROFESSIONAL: effort_cap=10 # $199/mo — full reasoning + agentic
PlanTier.ENTERPRISE: effort_cap=-1 # Custom — uncapped
The effort cap controls which models the tenant can access. Efforts 1-6 route to the orchestrator (fast, cheap). Efforts 7-8 unlock the reasoning model (slower, smarter). Efforts 9-10 get the full reasoning pipeline with graph context, RLM workspace, and multi-agent coordination.
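Put together, the routing rule is a clamp followed by a range check. A sketch, with illustrative tier names (the actual routing code is not shown in the article):

```python
def route_effort(effort: int, effort_cap: int) -> str:
    """Clamp effort to the plan's cap, then pick a model tier by range."""
    if effort_cap != -1:                  # -1 = uncapped (ENTERPRISE)
        effort = min(effort, effort_cap)
    if effort <= 6:
        return "orchestrator"             # fast, cheap
    if effort <= 8:
        return "reasoning"                # slower, smarter
    return "reasoning_full"               # graph context + RLM + multi-agent
```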
Image generation (can_generate) only unlocks at GROWTH tier and above. Document generation follows the same gate. These aren't arbitrary restrictions — they map directly to GPU cost. A single SDXL image generation takes 15 seconds of exclusive GPU time. At scale, that's the most expensive operation in the system.
Layer 3: Strata — Tiered Storage with Tenant Walls
Strata is our unified filesystem service. Every piece of data in AitherOS — models, images, training data, conversation logs, agent artifacts — flows through Strata. It runs on port 8136 and presents a virtual filesystem with four tiers:
HOT (NVMe, <100ms) — Active models, embeddings, tokenizers
WARM (SSD, <50ms) — Outputs, renders, training data, workspaces
COLD (NAS, <500ms) — Archives, backups, historical data
CACHE (Redis, <5ms) — Ephemeral temp files, downloads
Why Strata Matters for Security
Every service in AitherOS writes through Strata. When Canvas generates an image, it goes to Strata. When a conversation is stored, it goes to Strata. When an agent creates an artifact, it goes to Strata.
This means Strata is the single enforcement point for storage isolation. If tenant A's data accidentally ends up in tenant B's namespace, that's a Strata bug — and only a Strata bug. No other service can write to raw disk paths. The virtual filesystem is the only interface.
Strata also feeds the training pipeline. IDE sessions from Claude Code, Cursor, Copilot, and Gemini are ingested through /api/v1/ingest/ide-session. Conversation exchanges flow through FluxEmitter events into the KnowledgeIngester, which indexes them in faculty graphs. All of this respects tenant boundaries. A tenant's conversations never leak into another tenant's knowledge graph.
Automatic Tier Migration
Data moves between tiers based on access patterns:
- WARM files untouched for 30 days migrate to COLD (with zstd compression)
- CACHE files expire after 24 hours
- HOT tier has a 100GB cap — when full, least-recently-used models get evicted to WARM
This isn't just about performance. COLD tier data on NAS storage is physically separate from the NVMe pool that serves active requests. An attacker who somehow compromises the hot path doesn't automatically get access to archived data.
Layer 4: AitherLockbox — Five-Layer Encrypted Vault
Lockbox is where the truly sensitive stuff lives. Agent personas, system prompts (we call them "wills"), authorization policies, cryptographic keys, and static configuration that should never be readable without explicit unlocking.
The Five Layers
Layer 1: Hidden Location. The lockbox directory isn't called "lockbox" or "secrets." It's a SHA-256 hash of the hardware ID, buried in a .cache/ directory. No symlinks point to it. It's in .gitignore. You'd have to know what you're looking for to find it.
Layer 2: Encryption at Rest. Every file inside the lockbox is encrypted with Fernet (AES-128-CBC + HMAC-SHA256). The encryption key is derived from three inputs:
```python
kdf = PBKDF2HMAC(
    algorithm=hashes.SHA256(),
    length=32,
    salt=salt,            # 32 random bytes, unique per lockbox
    iterations=600000,    # OWASP 2023 recommendation
)
key = base64.urlsafe_b64encode(
    kdf.derive(passphrase + hardware_id)
)
```
600,000 iterations of PBKDF2. The salt is 32 bytes of cryptographic randomness. The key material includes the hardware ID, so even if someone steals the encrypted files and knows the passphrase, they can't decrypt them on a different machine.
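The same derivation can be reproduced with only the standard library, since `hashlib.pbkdf2_hmac` is equivalent to the `cryptography` package's `PBKDF2HMAC` with SHA-256. A self-contained sketch with illustrative passphrase and hardware-ID values:

```python
import base64
import hashlib
import os

salt = os.urandom(32)                     # unique per lockbox
key_material = hashlib.pbkdf2_hmac(
    "sha256",
    b"passphrase" + b"hardware-id",       # key input includes the hardware ID
    salt,
    600_000,                              # OWASP-recommended iteration count
    dklen=32,
)
# 32 bytes, urlsafe-base64-encoded, is exactly the key format Fernet expects.
fernet_key = base64.urlsafe_b64encode(key_material)
```

Change any one of the three inputs (passphrase, hardware ID, salt) and the derived key is completely different, which is what binds the lockbox to the machine.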
Layer 3: Integrity Verification. A SHA-256 manifest tracks every file in the lockbox. The manifest itself is encrypted. On every access, the lockbox verifies the manifest hash against the stored value. If a single byte has been modified, tampered, or corrupted — the lockbox detects it and logs a TAMPER_DETECTED audit event.
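Manifest verification is conceptually simple: hash every file and compare against the stored digests. A sketch (the manifest format and `TAMPER_DETECTED` handling are assumptions based on the article):

```python
import hashlib

def file_digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_manifest(manifest: dict[str, str], files: dict[str, bytes]) -> list[str]:
    """Return the names of files whose contents no longer match the manifest."""
    tampered = []
    for name, expected in manifest.items():
        actual = file_digest(files.get(name, b""))
        if actual != expected:
            tampered.append(name)   # real lockbox would log TAMPER_DETECTED here
    return tampered
```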
Layer 4: Access Control. The lockbox requires a passphrase to unlock. After unlocking, it auto-locks after 30 minutes of inactivity. Five failed unlock attempts trigger a 15-minute lockout. Every operation — unlock, lock, store, retrieve, delete — is written to a monthly audit log.
Layer 5: Memory Protection. Decrypted data is held in memory only. It's never written to temp files. When the lockbox locks, the in-memory cache is cleared immediately. The decrypted content exists for exactly as long as it's needed, and not one second longer.
Hardware-Bound Key Recovery
Docker containers get rebuilt. Hardware IDs change. This creates a key management problem: how do you decrypt data when the key material has changed?
Lockbox uses a recovery cascade:
- Try the stable hardware ID (persisted to the data volume, survives rebuilds)
- Try the `AITHER_MASTER_KEY`-derived ID (env var, set once)
- Try the volatile hardware ID (platform-specific, last resort)
- If all fail: wipe stale data rather than leaving unrecoverable encrypted blobs
The stable hardware ID is the critical piece. It's written to a file on the Docker data volume on first boot and persists across container rebuilds. As long as the data volume survives, the lockbox survives.
Layer 5: AitherSecrets — The Credential Vault
AitherSecrets (port 8111) is the centralized credential store. Every API key, OAuth token, database password, and service certificate in the system lives here. Not in environment variables. Not in .env files. Not in Docker secrets. In the vault.
Service Identity
When a service boots, AitherSecrets issues it an Ed25519 keypair. The private key is stored encrypted in the vault. The public key is distributed for verification. Every inter-service HTTP request is signed with the caller's Ed25519 private key and verified by the recipient's middleware.
This makes service impersonation cryptographically infeasible. If an attacker compromises one container and tries to make requests as a different service, the signature check fails, and the receiving service rejects the request before it ever reaches the application layer.
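The sign-and-verify round trip looks like this with the `cryptography` package (an assumption: the article doesn't specify the library, and the header names and payload layout here are illustrative):

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Issued at boot, one keypair per service.
service_key = Ed25519PrivateKey.generate()
public_key = service_key.public_key()     # distributed for verification

payload = b"POST /api/v1/ingest body-hash=abc123"
signature = service_key.sign(payload)

# Recipient middleware: verify() raises InvalidSignature on any mismatch.
public_key.verify(signature, payload)     # legitimate request passes

try:
    public_key.verify(signature, b"tampered payload")
except InvalidSignature:
    rejected = True                       # impersonation attempt bounced
```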
Secret Types and Access Levels
```python
class AccessLevel(str, Enum):
    PUBLIC = "public"            # Any service can read
    INTERNAL = "internal"        # AitherOS services only
    RESTRICTED = "restricted"    # Named services only
    ADMIN = "admin"              # Admin endpoints only
```
An API key for a third-party service might be INTERNAL — any AitherOS service can read it. A tenant's OAuth refresh token is RESTRICTED to the specific services that need it. The vault master key is ADMIN — accessible only through the admin API.
Every secret access is logged. Every secret has a TTL cache (5-minute default). Expired cache entries are re-fetched from the vault. The cache prevents the vault from becoming a bottleneck (AitherOS has 200+ services making frequent credential lookups), while the short TTL ensures rotated credentials propagate quickly.
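A toy version of that TTL cache, matching the 5-minute default (the real AitherSecrets client API is not shown in the article; this is a sketch):

```python
import time

class SecretCache:
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, name: str, fetch):
        """Serve a fresh cached value, or call `fetch` (the vault) and cache it."""
        now = time.monotonic()
        entry = self._store.get(name)
        if entry and now - entry[0] < self.ttl:
            return entry[1]               # fresh: no vault round trip
        value = fetch(name)               # stale or missing: hit the vault
        self._store[name] = (now, value)
        return value
```

With 200+ services polling credentials, the cache absorbs the read load while the short TTL bounds how long a rotated credential can keep being served.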
Key Rotation
Ed25519 service keys rotate automatically every 90 days. The rotation is transparent — the new key is issued, the old key remains valid for a grace period, and services re-register their public keys. No downtime. No configuration changes.
Layer 6: Encrypted Backups
All of the above is worthless if a disk failure wipes everything. AitherRecover (port 8139) handles disaster recovery with a three-layer backup strategy.
What Gets Backed Up
The critical data set includes 30+ paths:
| Data | Why It Matters |
|---|---|
| `vault.enc` + `.vault_salt` | All API keys, OAuth tokens, service credentials |
| `keys/` directory | 16 Ed25519 service signing keypairs |
| `directory.db` | LDAP directory — users, agents, tenants, services, certificates |
| `rbac.db` | Access control rules, group memberships |
| PostgreSQL dumps | All relational data (conversation history, tenant data, training state) |
| Faculty graphs | Knowledge graphs, memory graphs, context data |
| Agent identities | 16 YAML files defining agent personas and capabilities |
Per-User Encryption
User backups are encrypted with per-user Fernet keys stored in Lockbox:
```python
async def create_user_backup(self, user_id: str):
    # Get user-specific encryption key from Lockbox
    key = await self._get_user_backup_key(user_id)
    cipher = Fernet(key)

    # Gather data footprint
    footprint = await self._collect_user_data(user_id)

    # Encrypt the entire footprint
    encrypted = cipher.encrypt(
        json.dumps(footprint).encode('utf-8')
    )

    # Upload to private GitHub repo
    await self._github_upload(
        f"data/{backup_id}/footprint.enc",
        base64.b64encode(encrypted),
    )
```
Each user's backup is encrypted with a different key. If one key is compromised, other users' backups remain secure. The keys themselves live in Lockbox, which is encrypted with the hardware-bound master key. It's encryption all the way down.
Dual Storage Strategy
Backups go to two places:
- Local filesystem — incremental backups with manifest-based deduplication
- Private GitHub repo — encrypted, chunked into 40MB files for the GitHub API
The GitHub backup is the disaster recovery path. If the machine catches fire, you clone the backup repo, run the bootstrap script, point it at the repo, and everything restores. The PostgreSQL sidecar runs pg_dumpall every 15 minutes, so you lose at most 15 minutes of data.
Health Monitoring
The backup system monitors itself:
```json
{
  "status": "healthy",
  "age_hours": 2.3,
  "staleness_threshold_hours": 12,
  "critical_files_present": {
    "vault.enc": true,
    ".vault_salt": true,
    "directory.db": true,
    "rbac.db": true,
    "signing_keys": true
  },
  "postgres": {
    "latest_dump": "dump_20260325_193800.sql",
    "age_minutes": 12.4,
    "healthy": true
  }
}
```
If the latest backup is older than 12 hours, or if any critical file is missing from the manifest, the health endpoint returns warning or critical. The proactive monitor picks this up and raises an interrupt.
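That decision logic fits in one function. The threshold is the article's; the status names mirror the health payload above, and the function itself is a sketch:

```python
def backup_status(age_hours: float,
                  critical_present: dict[str, bool],
                  staleness_threshold_hours: float = 12.0) -> str:
    """Missing critical files outrank staleness; fresh + complete is healthy."""
    if not all(critical_present.values()):
        return "critical"    # a critical file is absent from the manifest
    if age_hours > staleness_threshold_hours:
        return "warning"     # backup exists but is stale
    return "healthy"
```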
How It All Connects
Here's the full request flow for that cyberpunk image generation attempt:
User (external IP) → Cloudflare Tunnel → Veil (port 3000)
↓ X-Caller-Type: "public"
Genesis (port 8001)
↓ build_caller_from_request()
↓ Detect: non-local IP → downgrade to ANONYMOUS
↓ CallerContext set in ContextVar
↓
IntentClassifier (3ms)
↓ type=vision, effort=3, chain=[iris, creative_engine]
↓
ChatEngine
↓ Check caller.can_generate → False
↓ Return: "Image generation is a platform-only feature"
↓ (100ms total, GPU never touched)
Compare that to the same request from localhost:
Me (192.168.x.x) → Veil (port 3000)
↓ X-Caller-Type: "platform"
Genesis (port 8001)
↓ build_caller_from_request()
↓ Detect: local IP → PLATFORM
↓
IntentClassifier (3ms)
↓ type=vision, effort=3
↓
ChatEngine
↓ Check caller.can_generate → True
↓ Direct Canvas fast-path
↓
Canvas (port 8108) → MicroScheduler (VRAM slot) → ComfyUI (port 8188)
↓ SDXL generation (~15s)
↓ Base64 image returned
↓
Strata: aither://warm/renders/{session}/{image}.png
↓ (tenant=platform, no prefix rewrite)
Same hardware. Same code. Same pipeline. Different permissions. The security isn't a wall around the system — it's woven into every layer of the request path.
The Philosophy: Defense in Depth, Not Defense in Front
Most AI systems bolt authentication onto the API gateway and call it done. If you have an API key, you're in. If you don't, you're out.
That's a single point of failure. One leaked key, one misconfigured header, one logic bug in the auth middleware — and everything is exposed.
AitherOS takes the opposite approach. Every layer enforces its own security independently:
- CallerIsolation gates capabilities at the request level
- Multi-tenancy isolates data at the storage level
- Strata enforces tenant boundaries at the filesystem level
- Lockbox encrypts sensitive configuration with hardware-bound keys
- AitherSecrets manages credentials with per-service Ed25519 signing
- AitherRecover encrypts backups with per-user keys
If CallerIsolation fails, the tenant boundary still holds. If the tenant boundary fails, Strata's prefix isolation still holds. If Strata is compromised, the Lockbox encryption still holds. If the Lockbox key leaks, the backup encryption uses different per-user keys.
No single layer failure exposes the entire system. That's the point.
What We Shipped Tonight
The image generation block was already working. What wasn't working was the user experience. Before tonight's fix, when the security gate blocked a generation request, the system set a flag and fell through to the normal LLM chat path. The orchestrator tried to respond with text, often timed out, and the user saw "Connection lost — the backend stopped responding."
Now the security gate returns an immediate, clear response:
Image generation is a platform-only feature and isn't available for external sessions. You're chatting with a live demo — text, code, and analysis are fully functional, but GPU-intensive workloads like image generation are restricted to the owner's local environment.
Sub-100 milliseconds. No GPU involvement. No timeout. No confusion.
Security that works silently is good. Security that explains itself is better.