You Are Your Own Best Tenant

Name: AitherOS
Author: Aitherium

April 6, 2026 · 11 min read · Aitherium

The Tenant I Didn't Expect

I built AitherOS's multi-tenant system because the architecture demanded it. Demo visitors, partner agents, isolated workloads — they'd all need separate environments. I wrote the technical migration story, the sovereign multitenancy deep dive, and the memory isolation walkthrough. Those posts explain how it works.

This post is about something I didn't expect: the first person who actually benefited from multi-tenancy was me.

Not because I onboarded anyone else. Because I realized that "tenant" doesn't mean "user." A tenant is an isolated runtime — its own memory, secrets, configs, event streams, and telemetry. And it turns out that's exactly what a solo developer needs to stop treating their single machine like a single environment.

The mental model shift is small, but the payoff is enormous. Every tenant gets its own brain configuration, its own personality, its own feature set, its own deployment ring. On one machine. No VMs, no cloud accounts, no resource duplication. Just config boundaries enforced at the application layer.

I built a multi-tenant system for isolation. The most valuable tenant turned out to be my own.

Anatomy of a Tenant Environment

Every tenant in AitherOS is a TenantContext — a dataclass that travels through every async request chain in the system. Here's the core of it, from lib/core/AitherTenant.py:

@dataclass
class TenantContext:
    tenant_id: str = PLATFORM_TENANT_ID
    slug: str = PLATFORM_TENANT_SLUG
    plan_tier: str = PlanTier.PLATFORM
    display_name: str = "Platform"
    db_name: str = ""
    config_overrides: Dict[str, Any] = field(default_factory=dict)
    created_at: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self):
        if not self.db_name:
            self.db_name = f"aither_{self.slug.replace('-', '_')}"

The real power is in the computed properties:

@property
def strata_prefix(self) -> str:
    """Prefix for Strata storage paths: tenants/{slug}/"""
    if self.is_platform:
        return ""
    return f"tenants/{self.slug}/"

@property
def flux_prefix(self) -> str:
    """Prefix for FluxBus tenant-scoped channels."""
    if self.is_platform:
        return ""
    return f"{self.slug}:"

@property
def secrets_namespace(self) -> str:
    """Namespace for tenant-scoped secrets."""
    if self.is_platform:
        return ""
    return f"tenants/{self.slug}"

Three properties, three isolation boundaries. Strata (telemetry and storage) gets path-prefixed. FluxBus (the event system) gets channel-prefixed. Secrets get namespace-scoped. The platform tenant — that's me, the operator — gets empty prefixes, meaning full access with zero overhead. Every other tenant sees only its own slice.
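To make the three boundaries concrete, here is a minimal, self-contained sketch of how the prefixes could be applied to names a service is about to touch. It is not the code from AitherTenant.py: TenantSketch, scoped_names, the slug-based is_platform check, and the "platform" slug value are all assumptions made for illustration.

```python
from dataclasses import dataclass

PLATFORM_TENANT_SLUG = "platform"  # assumed value; the post does not show the constant


@dataclass
class TenantSketch:
    """Cut-down stand-in for TenantContext with only the fields needed here."""

    slug: str = PLATFORM_TENANT_SLUG

    @property
    def is_platform(self) -> bool:
        # Assumption: the platform tenant is recognized by its slug.
        return self.slug == PLATFORM_TENANT_SLUG

    @property
    def strata_prefix(self) -> str:
        return "" if self.is_platform else f"tenants/{self.slug}/"

    @property
    def flux_prefix(self) -> str:
        return "" if self.is_platform else f"{self.slug}:"

    @property
    def secrets_namespace(self) -> str:
        return "" if self.is_platform else f"tenants/{self.slug}"


def scoped_names(tenant: TenantSketch, path: str, channel: str, secret: str) -> dict:
    """Apply all three prefixes to caller-supplied names (hypothetical helper)."""
    namespace = tenant.secrets_namespace
    return {
        "strata_path": f"{tenant.strata_prefix}{path}",
        "flux_channel": f"{tenant.flux_prefix}{channel}",
        "secret_key": f"{namespace}/{secret}" if namespace else secret,
    }
```

With this sketch, a dev-lab tenant asking for logs/run.json is pointed at tenants/dev-lab/logs/run.json, while the platform tenant gets the bare path back unchanged.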

This isn't user management. It's environment management. And once you see it that way, you start creating tenants for yourself.
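The post does not show how the context is carried through an async request chain. One common way to do that in Python is the standard-library contextvars module, sketched below. The variable name, the handler functions, and the "platform" default are hypothetical; only the idea that each request sees its own tenant comes from the text above.

```python
import asyncio
import contextvars

# Hypothetical context variable; the default stands in for the platform tenant.
current_tenant_slug = contextvars.ContextVar("current_tenant_slug", default="platform")


async def deep_in_the_stack() -> str:
    # Code far from the request entry point reads the tenant without
    # having it passed down as an argument.
    return current_tenant_slug.get()


async def handle_request(slug: str) -> str:
    # Each task created by asyncio.gather runs in its own copy of the
    # context, so this set() is invisible to concurrent requests.
    current_tenant_slug.set(slug)
    await asyncio.sleep(0)  # yield so the two requests interleave
    return await deep_in_the_stack()


async def main() -> list:
    return await asyncio.gather(
        handle_request("dev-lab"),
        handle_request("staging"),
    )
```

Running asyncio.run(main()) returns each request's own slug in order, and the value outside the tasks stays at the default.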

Dev, Staging, Prod — Without Three Machines

Here's what I actually run on my development machine:

platform — the operator tenant. Full access, uncapped effort, all tools. This is my primary workspace.
dev-lab — experimental changes. New context pipeline stages, untested spirit overlays, bleeding-edge model stacks.
staging — what demo.aitherium.com runs. Mirrors production config, but I can inspect it locally.
canary — a restricted tenant with starter-tier limits, for testing what the constrained experience actually feels like.

Each gets isolated memory. Teachings I give to dev-lab don't bleed into staging. Secrets I set for canary don't leak to platform. FluxBus events from one tenant's agents can't trigger another tenant's automations.
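The event half of that claim follows directly from channel prefixing. The sketch below uses a tiny in-memory bus to show why an event published under one tenant's prefix never reaches a subscriber registered under another's. TinyBus is an illustrative stand-in, not the FluxBus API, and the channel name is made up.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class TinyBus:
    """In-memory stand-in for an event bus; not the FluxBus API."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, flux_prefix: str, channel: str, handler: Callable[[dict], None]) -> None:
        # The full channel name is the tenant prefix plus the logical channel.
        self._subs[f"{flux_prefix}{channel}"].append(handler)

    def publish(self, flux_prefix: str, channel: str, event: dict) -> int:
        """Deliver to handlers on the prefixed channel; return how many ran."""
        handlers = self._subs.get(f"{flux_prefix}{channel}", [])
        for handler in handlers:
            handler(event)
        return len(handlers)


bus = TinyBus()
staging_seen: List[dict] = []
bus.subscribe("staging:", "teaching.added", staging_seen.append)

# Published under dev-lab's prefix, so the staging subscriber never fires.
delivered_from_dev = bus.publish("dev-lab:", "teaching.added", {"text": "experimental"})
```

Because "dev-lab:teaching.added" and "staging:teaching.added" are different keys, isolation falls out of the naming scheme rather than from per-handler access checks.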

AitherOS's rings.yaml formalizes this with deployment rings. From config/rings.yaml:

rings:
  dev:
    id: 0
    name: "Development"
    description: "Local development ring — all changes land here first"
    environment: development
    branch: develop
    auto_deploy: true
    approval_required: false
    deployment_target: local-docker
    rollback_on_failure: true

  staging:
    id: 1
    name: "Staging"
    description: "Pre-production staging ring — demo.aitherium.com"
    environment: staging
    branch: staging
    auto_deploy: false
    approval_required: false
    deployment_target: docker-remote

  prod:
    id: 2
    name: "Production"
    description: "Live production ring — stable release"
    environment: production
    branch: main
    auto_deploy: false
    approval_required: true

Ring 0 auto-deploys from develop. Ring 1 requires a manual promote. Ring 2 requires approval. Pair each ring with a tenant, and you have three complete environments — with different configs, different branches, and different safety gates — on one machine.

Different Brains for Different Jobs

This is where it gets interesting. AitherOS has 12 named model stacks in config/model-stacks.yaml, each defining how effort levels 1-10 map to backends and models. Here's cloud-offload, the production default:

cloud-offload:
  description: "Orchestrator on GPU, reasoning on cloud"
  requires_gpu: true
  vram_estimate_gb: 14
  cloud_pool: [vastai_deepseek, gemini, claude, openai]
  effort_to_tier:
    1: fast
    2: fast
    3: fast
    4: balanced
    5: balanced
    6: balanced
    7: deep
    8: deep
    9: agentic
    10: ultra
  tier_backends:
    fast:
      backend: vllm
      model: aither-orchestrator
    balanced:
      backend: vllm
      model: aither-orchestrator
    deep:
      backend: vllm
      model: aither-orchestrator
    ultra:
      backend: vllm
      model: aither-orchestrator

Single orchestrator on GPU, everything routed through one model, reasoning overflow to cloud. Simple. Low VRAM. Now compare that to hyperscaler:

hyperscaler:
  description: "TQ 3.5-bit Qwen3.5-35B — 3400+ tok/s, 1M context"
  requires_gpu: true
  vram_estimate_gb: 19
  vllm_tq_env:
    VLLM_TQ_MODEL: "Qwen/Qwen3.5-35B-A3B-AWQ"
    VLLM_TQ_MAX_LEN: "1048576"
    VLLM_TQ_SERVED_NAME: "aither-hyperscaler"
  tier_backends:
    fast:
      backend: vllm
      model: aither-hyperscaler
    deep:
      backend: vllm
      model: aither-hyperscaler
    ultra:
      backend: vllm
      model: aither-hyperscaler

A 35B-parameter model with a 1M-token context window (VLLM_TQ_MAX_LEN is 1,048,576) and, per the stack description, 3400+ tokens per second. Same effort routing, completely different brain.

Now: assign cloud-offload to your production tenant and hyperscaler to your benchmark tenant. Send the same prompts to both. Compare quality, latency, and token cost in Strata telemetry. You're A/B testing model stacks — not with a feature flag service and a month of planning, but with two tenants and a config override.

I do this constantly. elastic-hybrid runs Nemotron on CPU for reasoning while keeping the GPU free. ollama-only runs everything on CPU when I need the GPU for ComfyUI. Each tenant can activate a different stack, and switching is a single API call: POST /model-stacks/switch.
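The effort-to-backend lookup that a stack describes can be sketched in a few lines. The dictionary below copies the cloud-offload excerpt; the resolve function is a hypothetical helper, not AitherOS code. Note that the excerpt maps effort 9 to an agentic tier but lists no agentic entry under tier_backends, so the sketch returns None there and leaves that case to the caller rather than guessing where it goes.

```python
from typing import Any, Dict, Optional

# Values copied from the cloud-offload excerpt above.
CLOUD_OFFLOAD: Dict[str, Any] = {
    "effort_to_tier": {
        1: "fast", 2: "fast", 3: "fast",
        4: "balanced", 5: "balanced", 6: "balanced",
        7: "deep", 8: "deep",
        9: "agentic",
        10: "ultra",
    },
    "tier_backends": {
        "fast": {"backend": "vllm", "model": "aither-orchestrator"},
        "balanced": {"backend": "vllm", "model": "aither-orchestrator"},
        "deep": {"backend": "vllm", "model": "aither-orchestrator"},
        "ultra": {"backend": "vllm", "model": "aither-orchestrator"},
    },
}


def resolve(stack: Dict[str, Any], effort: int) -> Optional[Dict[str, str]]:
    """Map an effort level to a tier and its backend, or None if unlisted."""
    if not 1 <= effort <= 10:
        raise ValueError(f"effort must be between 1 and 10, got {effort}")
    tier = stack["effort_to_tier"][effort]
    target = stack["tier_backends"].get(tier)
    if target is None:
        return None  # tier has no backend in the excerpt (for example, agentic)
    return {"tier": tier, **target}
```

Comparing two stacks then amounts to calling resolve with the same effort against two different dictionaries, one per tenant.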

One System, Many Spirits

AitherOS has a personality overlay system called AitherSpirit. A spirit.md file defines voice, values, and style — injected between [IDENTITY] and [RULES] in the system prompt so it shapes personality without touching safety.

The SpiritLoader discovers spirit files in a specific order, from lib/core/SoulLoader.py:

1. ~/.aither/spirit.md              (global user spirit)
2. ~/.aither/spirits/{agent_name}.md (per-agent)
3. config/spirits/{agent_name}.md    (project-level per-agent)
4. config/spirits/default.md         (project default)
5. ~/.openclaw/workspace/SPIRIT.md   (auto-detect OpenClaw)

Blocked patterns prevent prompt injection: you can't sneak [AXIOMS] or "ignore previous instructions" through a spirit file. The overlay customizes; it can't override.
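A rough sketch of ordered discovery plus a blocked-pattern check is shown below. The five paths follow the list above and the two blocked strings come from the paragraph above. Everything else is an assumption: the function names, the first-match-wins behavior, the case-insensitive substring check, and the choice to skip an unsafe file rather than reject the load are not confirmed by the post.

```python
from pathlib import Path
from typing import List, Optional

# The two patterns named in the post; the real blocklist is likely longer.
BLOCKED_PATTERNS = ("[axioms]", "ignore previous instructions")


def candidate_paths(agent_name: str, home: Path, project: Path) -> List[Path]:
    """Return the five locations in the documented discovery order."""
    return [
        home / ".aither" / "spirit.md",
        home / ".aither" / "spirits" / f"{agent_name}.md",
        project / "config" / "spirits" / f"{agent_name}.md",
        project / "config" / "spirits" / "default.md",
        home / ".openclaw" / "workspace" / "SPIRIT.md",
    ]


def is_safe(text: str) -> bool:
    """Reject text containing any blocked pattern (case-insensitive)."""
    lowered = text.lower()
    return not any(pattern in lowered for pattern in BLOCKED_PATTERNS)


def load_spirit(agent_name: str, home: Path, project: Path) -> Optional[str]:
    """Return the first existing, safe spirit file, or None if there is none."""
    for path in candidate_paths(agent_name, home, project):
        if path.is_file():
            text = path.read_text(encoding="utf-8")
            if is_safe(text):
                return text
    return None
```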

The TenantSpiritManager takes this further by creating a separate SpiritEngine per tenant. From lib/core/TenantSpiritManager.py:

class TenantSpiritManager:
    def get_engine(self, tenant):
        # Platform tenant uses the global engine (backwards compatible)
        if tenant.is_platform:
            return self._platform_engine

        # Non-platform tenants get isolated engines
        # with tenant-scoped storage at SPIRIT_DIR/tenants/{slug}
        slug = tenant.slug
        ...

Platform tenant gets the global spirit engine. Every other tenant gets its own, with storage at SPIRIT_DIR/tenants/{slug}.
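The excerpt elides the non-platform branch, so the following is only one plausible shape for it: a lazily filled cache with one engine per slug, each rooted at SPIRIT_DIR/tenants/{slug}. EngineSketch, SpiritManagerSketch, and the caching strategy are hypothetical; the post confirms only the per-tenant engine and the storage location.

```python
from pathlib import Path
from typing import Dict


class EngineSketch:
    """Stand-in for SpiritEngine; it only remembers where it stores data."""

    def __init__(self, storage_dir: Path) -> None:
        self.storage_dir = storage_dir


class SpiritManagerSketch:
    """Hypothetical per-tenant engine registry, not TenantSpiritManager itself."""

    def __init__(self, spirit_dir: Path) -> None:
        self._spirit_dir = spirit_dir
        self._platform_engine = EngineSketch(spirit_dir)
        self._engines: Dict[str, EngineSketch] = {}

    def get_engine(self, slug: str, is_platform: bool) -> EngineSketch:
        if is_platform:
            return self._platform_engine
        # Assumption: engines are created on first use and reused afterwards.
        if slug not in self._engines:
            self._engines[slug] = EngineSketch(self._spirit_dir / "tenants" / slug)
        return self._engines[slug]
```

Creating engines on first use keeps startup cheap when most tenants are idle, which suits a machine where only one or two tenants are active at a time.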

In practice, I use this to run three different personalities:

Work spirit: Concise, technical, action-oriented. Minimal preamble, code-first.
Creative spirit: Expansive, exploratory. Longer responses, more analogies, willing to riff.
Teaching spirit: Step-by-step, checks understanding, includes context that the work spirit would skip.

Same system, same hardware, same model. Different tenants, different spirits, different experiences. I switch between them by switching tenant context, not by rewriting prompts.

Feature Flags, Built Into the OS

Most teams bolt feature flags onto their application. In AitherOS, feature gating is a property of the tenant's plan tier.

PLAN_TIER_EFFORT_CAPS: Dict[str, int] = {
    PlanTier.EXPLORER:      6,   # Chat only — orchestrator
    PlanTier.BUILDER:       6,   # Builder — orchestrator
    PlanTier.STARTER:       6,   # Starter — orchestrator only
    PlanTier.GROWTH:        8,   # Unlocks deep reasoning (effort 7-8)
    PlanTier.PROFESSIONAL: 10,   # Full reasoning + agentic
    PlanTier.ENTERPRISE:   -1,   # Uncapped
    PlanTier.PLATFORM:     -1,   # Platform operator — uncapped
}

Explorer can't trigger reasoning models. Growth unlocks effort 7-8 (local deep reasoning). Professional gets effort 9-10 (cloud reasoning, agentic dispatch). Platform is uncapped.
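Applying the cap table is a small clamp. The sketch below uses plain strings in place of the PlanTier members; "explorer" appears in the post, while the other string values are assumed to follow the same lowercase pattern. Whether AitherOS clamps an over-limit request or rejects it is not stated, so clamping here is also an assumption.

```python
from typing import Dict

# Caps copied from the table above; string keys stand in for PlanTier members.
PLAN_TIER_EFFORT_CAPS: Dict[str, int] = {
    "explorer": 6,
    "builder": 6,
    "starter": 6,
    "growth": 8,
    "professional": 10,
    "enterprise": -1,  # -1 means uncapped
    "platform": -1,
}

MAX_EFFORT = 10


def effective_effort(plan_tier: str, requested: int) -> int:
    """Clamp a requested effort level to the tier's cap (hypothetical helper)."""
    if not 1 <= requested <= MAX_EFFORT:
        raise ValueError(f"effort must be between 1 and {MAX_EFFORT}, got {requested}")
    cap = PLAN_TIER_EFFORT_CAPS[plan_tier]
    if cap == -1:
        return requested
    return min(requested, cap)
```

Under these assumptions an Explorer tenant asking for effort 9 is served at effort 6, which keeps it on the orchestrator tiers.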

The TenantPackageManager adds per-tenant package enablements with HMAC-signed entitlement tokens. Packages — tools, integrations, premium features — can be gated by tier or explicitly enabled/disabled per tenant.
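The post does not describe the token format, so the sketch below shows only the general technique of HMAC-signing an entitlement payload with Python's standard library. The payload fields, the payload.signature layout, and both function names are assumptions for illustration and are not the TenantPackageManager implementation.

```python
import base64
import hashlib
import hmac
import json
from typing import List, Optional


def sign_entitlement(secret: bytes, tenant_slug: str, packages: List[str]) -> str:
    """Return 'payload.signature', both URL-safe base64 (assumed format)."""
    payload = json.dumps(
        {"tenant": tenant_slug, "packages": sorted(packages)},
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode("ascii")
        + "."
        + base64.urlsafe_b64encode(signature).decode("ascii")
    )


def verify_entitlement(secret: bytes, token: str) -> Optional[dict]:
    """Return the decoded payload if the signature matches, else None."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong number of parts or malformed base64
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking information through comparison timing.
    if not hmac.compare_digest(signature, expected):
        return None
    return json.loads(payload)
```

A token signed for one tenant cannot be reused for another: swapping in a different payload invalidates the signature unless the signing secret is known.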

I use this to test what different restriction levels actually feel like. My constrained-test tenant runs at Explorer tier: effort capped at 6, no reasoning, limited tools. My full-power tenant runs at Platform tier: everything unlocked. When I'm building onboarding flows or testing UX, I switch to constrained-test and immediately see what breaks when the ceiling is lower.

This is the kind of testing you normally need a staging environment, a test account, and a feature flag service to do. I need a tenant with plan_tier: "explorer".

Every Codebase Gets Its Own Agent

When a new tenant is provisioned, the TenantOnboarder runs a four-phase pipeline. From lib/core/TenantOnboarder.py:

class OnboardingPhase(str, Enum):
    PROVISION = "provision"    # Database, entitlements, Strata dirs, admin user
    INGEST = "ingest"          # Repos, docs, code, media → faculty graphs
    SYNTHESIZE = "synthesize"  # Cross-link, warm graph namespace
    VERIFY = "verify"          # Health check all subsystems for completeness

Data sources can be git repos, code directories, doc directories, media, configs, or scripts:

class DataSourceType(str, Enum):
    GIT_REPO = "git_repo"
    CODE_DIR = "code_dir"
    DOC_DIR = "doc_dir"
    MEDIA_DIR = "media_dir"
    CONFIG_DIR = "config_dir"
    SCRIPT_DIR = "script_dir"

Each tenant gets its own CodeGraph index, its own faculty graphs, its own memory. The agents serving that tenant know that codebase — not a merged view of everything on the machine.
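As a rough illustration of the four-phase flow, the sketch below runs one handler per phase in declaration order and stops at the first failure. The enum mirrors the excerpt above; run_onboarding and the boolean handler contract are hypothetical and say nothing about how TenantOnboarder actually reports errors.

```python
from enum import Enum
from typing import Callable, Dict, List


class OnboardingPhase(str, Enum):
    PROVISION = "provision"
    INGEST = "ingest"
    SYNTHESIZE = "synthesize"
    VERIFY = "verify"


def run_onboarding(handlers: Dict[OnboardingPhase, Callable[[], bool]]) -> List[OnboardingPhase]:
    """Run phases in order; return the phases that completed successfully."""
    completed: List[OnboardingPhase] = []
    for phase in OnboardingPhase:  # Enum iteration follows definition order
        if not handlers[phase]():
            break  # later phases depend on earlier ones, so stop here
        completed.append(phase)
    return completed
```

Stopping early matters because each phase builds on the last: there is nothing to synthesize if ingestion failed, and nothing meaningful to verify.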

I use this for project isolation. My main AitherOS tenant has the full 62,000-chunk CodeGraph of the platform. But I also have tenants for side projects — a Python library here, a client project there. Each tenant's agents understand their own code, their own patterns, their own architecture. When I ask "how does authentication work?" the answer depends on which tenant I'm talking to.

This is what IDE workspace configs aspire to be, except the isolation goes all the way down through memory, context, tool access, and LLM routing.

Tenant-Based Deployment Rings

Combine tenants with rings.yaml and Docker compose profiles, and you get a full deployment ring system on one machine.

The pattern looks like this:

Ring 0 (dev): Tenant dev-lab. Branch develop. Auto-deploys. No approval needed. This is where I push breaking changes. The tenant runs the hyperscaler model stack because I want the maximum context window for debugging.
Ring 1 (staging): Tenant staging. Branch staging. Manual promotion from Ring 0. Runs the cloud-offload stack to match the production config. demo.aitherium.com serves from this ring.
Ring 2 (prod): Tenant platform. Branch main. Requires approval. Runs cloud-offload. The stable environment I actually work in day-to-day.

Promotion gates are real: Ring 0 → Ring 1 requires health checks passing. Ring 1 → Ring 2 requires approval. Rollback on failure is automatic.

dev:
  auto_deploy: true
  approval_required: false
  rollback_on_failure: true

staging:
  auto_deploy: false
  approval_required: false
  rollback_on_failure: true

prod:
  auto_deploy: false
  approval_required: true

Not three clusters. Not three cloud accounts. Three tenants with three configs on one machine. The develop branch lands in Ring 0 automatically. If I like what I see, I promote to Ring 1. If staging holds for a day, I promote to Ring 2. The same Docker containers serve all three — only the tenant context changes what they do.
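The promotion gates can be expressed as a small predicate over the ring config. The flags below are copied from the rings.yaml excerpt; can_promote is a hypothetical helper. The post says health checks gate Ring 0 → Ring 1 and approval gates Ring 1 → Ring 2. Requiring healthy checks for every promotion, and allowing only one-ring-at-a-time moves, are assumptions of this sketch.

```python
from typing import Any, Dict

# Flags copied from the rings.yaml excerpt.
RINGS: Dict[str, Dict[str, Any]] = {
    "dev": {"id": 0, "auto_deploy": True, "approval_required": False},
    "staging": {"id": 1, "auto_deploy": False, "approval_required": False},
    "prod": {"id": 2, "auto_deploy": False, "approval_required": True},
}


def can_promote(source: str, target: str, *, health_ok: bool, approved: bool = False) -> bool:
    """Decide whether a build may move from one ring to the next."""
    src, dst = RINGS[source], RINGS[target]
    if dst["id"] != src["id"] + 1:
        return False  # assumption: no skipping rings and no demotion via promote
    if not health_ok:
        return False  # assumption: health checks gate every promotion
    if dst["approval_required"] and not approved:
        return False
    return True
```

A healthy dev build can move to staging without sign-off, while the same build is held at staging until someone approves the move to prod.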

The 1-Person, N-Tenant Pattern

Here's the synthesis. Multi-tenancy is usually framed as a way to serve multiple users on shared infrastructure. But the same isolation primitives — scoped storage, scoped secrets, scoped events, scoped config — solve a problem that every solo developer has: you can't test production behavior from a development environment, because by definition they're different environments.

Tenants collapse that gap. A dev-lab tenant and a staging tenant on the same machine give you genuine environment isolation — not simulated, not mocked, actually separate memory graphs and event streams and secret namespaces — without the overhead of maintaining separate infrastructure.

What VMware promised with virtual machines, multi-tenancy delivers at the application layer. No hypervisor. No resource duplication. No per-environment billing. Just a TenantContext dataclass that every service in the stack respects.

I have seven tenants on my development machine right now. Zero external users. Every one of them pays for itself in bugs caught, configs validated, and model stacks compared before anything touches production.

I built a multi-tenant system for isolation. The most valuable tenant turned out to be my own.
