engineering · infrastructure · deployment · founder · self-hosted

Deploy on the Fly, No AWS Required: How I Ship Code to a Live AI Platform From My Desktop

March 13, 2026 · 14 min read · David Parkhurst

I'm writing this while Genesis — the orchestrator at the heart of AitherOS — is restarting. I just docker cp'd three files into it and bounced the container. The website at demo.aitherium.com dropped for about four seconds. It's back. The tunnel reconnected. Nobody noticed.

This is how I work every day. I develop on a running production platform, deploying code while it serves traffic, updating services mid-conversation, and bringing containers up and down constantly — all from a single desktop, with no cloud provider, no managed Kubernetes, and no ops team.

People assume this kind of live-deployment workflow requires AWS or GCP behind it. It doesn't. Here's the entire stack.

The Machine

One workstation. That's it.

  • CPU / RAM: AMD CPU, 96GB DDR5
  • GPU: NVIDIA RTX 5090 (32GB VRAM)
  • OS: Windows 11 + WSL2 (32GB allocation)
  • Runtime: Docker Desktop → 128 containers defined, ~40-65 running at any time

No rack servers. No cloud VMs. The same machine I write code on is the machine that serves the website, runs the LLMs, processes agent tasks, and hosts the tunnel endpoint.

The Architecture in 30 Seconds

AitherOS is 202 services across 12 architectural layers — from GPU scheduling at L6 to a Next.js dashboard at L10. Eight foundational services (Pulse, Secrets, Chronicle, Strata, vLLM, and more) start with any docker compose up. Everything else is organized into 30 profiles: core, intelligence, agents, perception, memory, security, training, gpu, and more.

# Start the core platform (~25 containers)
docker compose -f docker-compose.aitheros.yml --profile core up -d

# Add agents when I need them
docker compose -f docker-compose.aitheros.yml --profile core --profile agents up -d

# Rebuild just one service
docker compose -f docker-compose.aitheros.yml up -d --build aither-genesis

Services communicate over a single Docker bridge network (aither-network) using Docker DNS. When AITHER_DOCKER_MODE=true, every service resolves URLs like http://aither-genesis:8001 automatically via AitherPorts. No service discovery daemon. No infrastructure mesh like Istio or Consul. Docker's built-in DNS does the job. (AitherOS has its own application-level mesh at Layer 8.5 for node deployment and edge networking — but that's an OS feature, not an infrastructure dependency.)
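The AITHER_DOCKER_MODE switch described above can be sketched in a few lines. This is a hypothetical reconstruction of what an AitherPorts-style resolver might look like — the real module isn't shown in this post, and the port map below is illustrative (only aither-genesis:8001 appears above):

```python
import os

# Illustrative port map in the spirit of AitherPorts. Only aither-genesis:8001
# is confirmed by the post; the rest are placeholder entries.
SERVICE_PORTS = {
    "aither-genesis": 8001,
    "aither-veil": 3000,
}

def resolve_url(service: str) -> str:
    """Return a service's base URL: the Docker DNS name when running
    in-container (AITHER_DOCKER_MODE=true), localhost otherwise."""
    in_docker = os.environ.get("AITHER_DOCKER_MODE") == "true"
    host = service if in_docker else "localhost"
    return f"http://{host}:{SERVICE_PORTS[service]}"

os.environ["AITHER_DOCKER_MODE"] = "true"
print(resolve_url("aither-genesis"))  # http://aither-genesis:8001
```

The same code path works on the host and in a container; only the environment variable changes, which is what makes Docker's built-in DNS sufficient.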

How Code Gets to Production

Here's the part that makes infrastructure people nervous: I deploy by copying files into running containers.

The docker cp Pattern

Genesis — the central orchestrator — bakes its code into the Docker image at build time. Bind-mounting the source directory doesn't work because Docker Desktop's 9P file-sharing protocol adds 5-50ms latency per syscall, which freezes Genesis's asyncio event loop. So the image contains the code, and when I need to update it mid-session:

# Edit files in VS Code on Windows
# Then push them into the running container
docker cp "./AitherOS/lib/core/AgentKernel.py" aitheros-genesis:/app/AitherOS/lib/core/AgentKernel.py
docker cp "./AitherOS/lib/automation/SchedulerLoop.py" aitheros-genesis:/app/AitherOS/lib/automation/SchedulerLoop.py
docker restart aitheros-genesis

Three files updated. Container restarts in ~4 seconds. The Cloudflare tunnel reconnects. Done.
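The copy-and-restart pattern above is mechanical enough to script. Here's a minimal sketch of a helper that builds the docker cp and restart command lines for a list of changed files — a hypothetical convenience, not part of AitherOS, assuming (as in the commands above) that container paths mirror the host layout under /app:

```python
import shlex

CONTAINER = "aitheros-genesis"  # container name used in the commands above

def cp_commands(files, container=CONTAINER, app_root="/app"):
    """Build docker cp lines for each changed file, followed by a restart.
    Host paths like ./AitherOS/... map to /app/AitherOS/... in the container."""
    cmds = [
        f"docker cp {shlex.quote(f)} {container}:{app_root}/{f.removeprefix('./')}"
        for f in files
    ]
    cmds.append(f"docker restart {container}")
    return cmds

for line in cp_commands(["./AitherOS/lib/core/AgentKernel.py"]):
    print(line)
```

Pipe the output to sh (or exec the commands via subprocess) and a multi-file hot patch becomes one invocation.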

For lighter services — the ones that don't run a FastAPI app with 50+ routers and a heavy event loop — I use read-only bind mounts:

volumes:
  - ./AitherOS/services:/app/services:ro
  - ./AitherOS/lib:/app/lib:ro
  - ./AitherOS/config:/app/config:ro

Change a file on the host, the container sees it immediately. Some services pick up changes on the next request; others need a restart. Either way, there's no build step for routine changes.
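"Picks up changes on the next request" can be as simple as an mtime check. This is a sketch of one way a bind-mounted service could do it — AitherOS's actual reload mechanism isn't shown in the post, so the helper below is illustrative:

```python
import importlib.util
import os
import tempfile

_CACHE = {}  # path -> (mtime, module)

def load(path):
    """Load a module from a file path, re-importing it whenever the
    file's mtime changes (e.g. after an edit through a bind mount)."""
    mtime = os.path.getmtime(path)
    entry = _CACHE.get(path)
    if entry is None or entry[0] != mtime:
        spec = importlib.util.spec_from_file_location("hot_handler", path)
        mod = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(mod)
        _CACHE[path] = (mtime, mod)
    return _CACHE[path][1]

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "handler.py")
    open(p, "w").write("VERSION = 1\n")
    print(load(p).VERSION)  # 1
    m1 = os.path.getmtime(p)
    open(p, "w").write("VERSION = 2\n")
    os.utime(p, (m1 + 1, m1 + 1))  # bump mtime explicitly for coarse filesystems
    print(load(p).VERSION)  # 2
```

A service doing this per request sees edits immediately; a service that imports its modules once at startup is the kind that needs the restart mentioned above.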

When I Actually Rebuild

For structural changes — new dependencies, Dockerfile modifications, new service definitions — I rebuild the image:

docker compose -f docker-compose.aitheros.yml up -d --build aither-genesis

First build takes 5-10 minutes (base image). After that, ~10 seconds per service thanks to a shared multi-target Dockerfile with BuildKit layer caching.
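The shared multi-target Dockerfile works roughly like this — a sketch of the shape, since the real file isn't shown (stage names, base image, and commands here are assumptions):

```dockerfile
# Illustrative multi-target layout: one expensive base stage, many cheap
# per-service stages on top. Names and paths are hypothetical.
FROM python:3.12-slim AS base
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt   # cached unless deps change

FROM base AS genesis
COPY AitherOS/ /app/AitherOS/
CMD ["python", "-m", "AitherOS.services.genesis"]

FROM base AS veil-api
COPY AitherOS/services/ /app/services/
CMD ["python", "-m", "services.veil"]
```

Because every service stage builds FROM the same base, BuildKit only re-runs the dependency install when requirements.txt changes — which is why subsequent per-service builds drop to ~10 seconds.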

The Tunnel: Cloudflare Does the Heavy Lifting

The website at demo.aitherium.com is served by AitherVeil (Next.js 16) running inside Docker on my desktop. It reaches the internet through a single Cloudflare Tunnel:

aitheros-tunnel:
  image: cloudflare/cloudflared:latest
  # Dynamic: 'tunnel run' if token exists, else quick tunnel to Veil
  command: tunnel ${CLOUDFLARE_TUNNEL_TOKEN:+run} ${CLOUDFLARE_TUNNEL_TOKEN:---url http://aither-veil:3000}
  environment:
    TUNNEL_TOKEN: "${CLOUDFLARE_TUNNEL_TOKEN:-}"
  depends_on:
    aither-veil:
      condition: service_healthy

That's it. Cloudflare terminates TLS, handles DDoS protection, and routes traffic through their edge network to my machine. No static IP needed. No port forwarding. No nginx config on a VPS.

When I restart a service, the tunnel briefly disconnects and reconnects. Cloudflare's edge returns a branded maintenance page from a Worker during the gap — users see "be right back" instead of a connection error. For API consumers hitting /api/*, the Worker returns a proper JSON 503 so automated clients can retry gracefully.
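On the client side, "retry gracefully" against that JSON 503 is a small loop. Here's a minimal sketch of a retrying API client with exponential backoff — not code from AitherOS, and the gateway URL below is just the one named in this post:

```python
import json
import time

def call_with_retry(fetch, url, attempts=4, base_delay=0.5):
    """Retry a request when the maintenance Worker answers 503,
    backing off exponentially between attempts.
    `fetch` is any callable returning (status_code, body_text)."""
    for attempt in range(attempts):
        status, body = fetch(url)
        if status != 503:
            return status, body
        time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return status, body  # still 503 after all attempts

# Simulated endpoint: down during a container restart, then healthy.
responses = iter([(503, '{"error":"maintenance"}'),
                  (503, '{"error":"maintenance"}'),
                  (200, '{"ok":true}')])
status, body = call_with_retry(lambda url: next(responses),
                               "https://gateway.aitherium.com/api/health",
                               base_delay=0.01)
print(status, json.loads(body))  # 200 {'ok': True}
```

A 4-second restart window fits comfortably inside a few backoff steps, so automated consumers never see a hard failure.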

The Subdomain Map

All of these resolve to containers on my desktop:

  • demo.aitherium.com → AitherVeil dashboard (port 3000)
  • gateway.aitherium.com → API gateway (Cloudflare Worker → port 8185)
  • mcp.aitherium.com → MCP tool gateway (bypasses WAF bot challenges)
  • irc.aitherium.com → WebSocket chat relay

The main aitherium.com runs on GitHub Pages — completely static, completely separate. If my machine is off, the marketing site stays up.

No AWS. Here's What I Use Instead.

Let me map the typical cloud services to what I'm actually running:

  • EC2 / ECS → Docker Desktop on my workstation ($0)
  • Route 53 → Cloudflare DNS, free tier ($0)
  • ALB / CloudFront → Cloudflare Tunnel + Workers, free tier ($0)
  • RDS → PostgreSQL in Docker ($0)
  • ElastiCache → Redis in Docker ($0)
  • SageMaker → vLLM + Ollama on local GPU ($0)
  • Secrets Manager → AitherSecrets (self-hosted vault, port 8111) ($0)
  • CloudWatch → Chronicle (self-hosted logging, port 8121) ($0)
  • S3 → Docker named volumes (25 of them) ($0)
  • API Gateway → Cloudflare Worker + AitherOS Gateway ($0)
  • Cognito → AitherIdentity + Cloudflare Zero Trust SSO ($0)
  • SQS / EventBridge → FluxEmitter (hybrid event bus: in-process + HTTP cross-container) ($0)

Total monthly cloud bill: $0.

The hardware cost is real — a workstation with a 5090 isn't cheap. But it's a one-time capital expense, and the GPU pays for itself in inference costs within weeks. Running Nemotron-8B and DeepSeek-R1:14b locally, I'd be spending thousands per month on API calls for equivalent throughput.

The Three-Wave Boot Sequence

When I power on and start the stack, services come up in waves:

Wave 0 (immediate): Pulse (heartbeat), Chronicle (logs), Secrets (vault), Redis, PostgreSQL, vLLM. These have zero dependencies — they just start.

Wave 1 (~25 seconds): Core platform — Genesis, AitherVeil, Node, Watch, SecurityCore. These wait for Pulse and Chronicle to report healthy.

Wave 2 (~55 seconds): Everything else — agents, cognitive services, automation. These wait for Genesis to report healthy.

The whole stack is up in about a minute. Each service has health checks that Compose monitors — no service starts before its dependencies are confirmed alive.
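The wave gating is expressed in Compose itself: health checks on the Wave 0 services and service_healthy conditions on everything downstream. A minimal sketch (the Pulse health endpoint and port here are assumptions; the depends_on shape matches the tunnel snippet earlier):

```yaml
aither-pulse:                # Wave 0: no dependencies, just starts
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:8000/health"]  # endpoint assumed
    interval: 10s            # the fast polling used for gate services
    timeout: 3s
    retries: 5

aither-genesis:              # Wave 1: gated on Wave 0 health
  depends_on:
    aither-pulse:
      condition: service_healthy
```

The waves aren't a separate orchestration script — they fall out of the dependency graph Compose already evaluates.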

npm start          # Core stack (~25 containers)
npm run start:full # Everything (~65 containers)

Ring-Based Deployment (When It Matters)

For actual releases — not my daily hack-and-restart cycle — there's a proper promotion pipeline:

develop → staging → main
Ring 0 (local) → Ring 1 (demo.aitherium.com) → Ring 2 (production release)
  • Ring 0 (dev): My desktop. Auto-deploys on every push to develop. This is where the docker cp workflow lives.
  • Ring 1 (staging): Same machine, but gated. Requires passing health checks, unit tests (pytest + Pester), and lint before promotion. Serves demo.aitherium.com.
  • Ring 2 (prod): Tagged release. Images pushed to ghcr.io/aitherium. Requires 15 minutes of staging stability + smoke tests + manual approval.
npm run promote:staging  # dev → staging
npm run promote:prod     # staging → prod (requires approval)

Most days I live in Ring 0. The website people see is Ring 1. Ring 2 is for milestones.

Inference: Two GPUs Worth of Models on One Card

The RTX 5090's 32GB VRAM runs a dual-model setup:

  • vLLM primary (port 8120): Nemotron-Orchestrator-8B — handles chat, tool routing, and agent orchestration. 40% VRAM.
  • vLLM reasoning (port 8176): DeepSeek-R1:14b — always-on reasoning model for complex tasks. 45% VRAM.
  • Ollama (CPU, port 11434): llama3.2:3b — fast reflex model for simple queries. Zero VRAM.

MicroScheduler (port 8150) coordinates all LLM traffic — enforcing VRAM budgets, priority queues, and concurrency limits. Every LLM call in the entire system routes through MicroScheduler. No service calls vLLM directly.
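The "everything routes through MicroScheduler" rule is, at its core, a single gate that every call must pass. Here's a toy asyncio version of that idea — MicroScheduler's real API isn't shown in the post, so the class, names, and limit below are illustrative:

```python
import asyncio

class LLMScheduler:
    """Toy central gate for LLM calls: a semaphore enforces the
    concurrency budget; callers queue instead of hitting vLLM directly."""

    def __init__(self, max_concurrent=2):
        self._gate = asyncio.Semaphore(max_concurrent)
        self._active = 0
        self.peak = 0  # highest observed concurrency, for the demo below

    async def submit(self, call):
        async with self._gate:          # waits here when the budget is spent
            self._active += 1
            self.peak = max(self.peak, self._active)
            try:
                return await call()
            finally:
                self._active -= 1

async def main():
    sched = LLMScheduler(max_concurrent=2)

    async def fake_llm_call():
        await asyncio.sleep(0.01)       # stand-in for an inference request
        return "ok"

    results = await asyncio.gather(*[sched.submit(fake_llm_call) for _ in range(6)])
    print(results.count("ok"), "calls completed, peak concurrency:", sched.peak)

asyncio.run(main())
```

Six callers, but never more than two in flight — the same shape as a VRAM budget: the limit lives in one place, so no individual service can oversubscribe the GPU.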

An effort-based router picks the model automatically: effort 1-2 goes to the 3B CPU model, 3-6 to the 8B orchestrator, 7-10 to the 14B reasoning model. The user never chooses. The system just picks the right tool.
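The effort-to-model mapping described above is a three-way branch. A minimal sketch (the thresholds and model names are exactly the ones stated; the function itself is illustrative, not AitherOS code):

```python
def pick_model(effort: int) -> str:
    """Route by effort score: 1-2 → 3B CPU reflex model,
    3-6 → 8B orchestrator, 7-10 → 14B reasoning model."""
    if not 1 <= effort <= 10:
        raise ValueError("effort must be between 1 and 10")
    if effort <= 2:
        return "llama3.2:3b"               # Ollama, CPU, zero VRAM
    if effort <= 6:
        return "Nemotron-Orchestrator-8B"  # vLLM primary
    return "DeepSeek-R1:14b"               # vLLM reasoning

print(pick_model(5))  # Nemotron-Orchestrator-8B
```

The point of the design is that the branch is invisible to users: callers express difficulty, not model choice.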

What Goes Wrong (And How It Recovers)

This setup is not bulletproof. Things break regularly:

Docker Desktop memory pressure: 65 containers fighting for 32GB of WSL2 memory. I've learned which services can be profiled out when I need headroom. --profile gpu stays off unless I'm doing GPU work.

Tunnel drops: When I restart AitherVeil, the Cloudflare tunnel disconnects for 3-5 seconds. The Cloudflare Worker catches this and serves a maintenance page. Not ideal, not catastrophic.

9P filesystem hell: Docker Desktop's file sharing between Windows and Linux containers is genuinely terrible. 84GB of model cache in named volumes because bind mounts cause 9P protocol hangs. Postgres uses bind mounts to survive docker compose down -v, but they're on an NTFS path, so I watch for symlink issues.

Service ordering races: Sometimes a service starts before its dependency is truly ready, even with health checks. The three-wave boot sequence with fast polling (10-second intervals for gate services) catches most of these, but occasionally I have to restart a straggler.
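The polling gate that catches most of these races is a simple wait-until-healthy loop. A sketch of the idea — the real boot scripts aren't shown, so this helper is illustrative (`probe` stands in for an HTTP health check):

```python
import time

def wait_healthy(probe, timeout=60.0, interval=10.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll `probe` (any zero-arg callable returning bool) until it
    reports healthy or the timeout expires. The 10-second default
    interval matches the gate-service polling mentioned above."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)

# Simulated dependency that becomes healthy on the third poll.
states = iter([False, False, True])
ok = wait_healthy(lambda: next(states), timeout=5, interval=0, sleep=lambda s: None)
print(ok)  # True
```

Injecting `sleep` and `clock` keeps the loop testable; in production they'd be the real time functions and the probe would hit the service's /health endpoint.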

None of this is unsolvable. It's just the tax you pay for running a distributed system on one machine. The tradeoff is total control and zero monthly cost.

The Daily Workflow

Here's what a typical development day looks like:

  1. Morning: npm start — core stack comes up in ~60 seconds.
  2. Code: Edit in VS Code with Claude Code + Copilot + Serena (semantic code intelligence) + AitherNode MCP (250+ tools from the OS itself).
  3. Deploy: docker cp changed files into Genesis, restart. Or just save and let bind mounts propagate.
  4. Test: Hit the live website. Run pytest. Check Chronicle logs. Fix, repeat.
  5. Ship: When it's stable, npm run promote:staging pushes to the demo site.
  6. Write: Blog posts like this one get written in the Veil blog editor at /blog/editor, published through the same pipeline.

I don't have a staging server. I don't have a CI/CD cluster. I don't have a DevOps team. The machine I'm typing on is the entire platform — development, staging, and production.

Why This Works (For Now)

This setup works because of three things:

  1. Cloudflare Tunnel eliminates the networking problem. No static IP, no port forwarding, no firewall rules. Encrypted tunnel from my desktop to Cloudflare's edge. Free.

  2. Docker Compose is underrated. 5,233+ lines of compose YAML replace what most companies use Kubernetes for. Named volumes, health checks, dependency ordering, resource limits, GPU passthrough, profile-based selective startup — it all works.

  3. The GPU changes the economics. Local inference with a 32GB VRAM card means I don't need cloud GPU instances. The models run on my desk. The latency is lower than API calls. The cost is zero per token.

This won't scale to thousands of users. When it needs to, AitherOS has a bare-metal deployment path (Rocky Linux 9 + Podman + systemd) and a container registry (ghcr.io/aitherium) ready to go. But right now, one machine serving a live AI platform while I develop on it — that's the sweet spot.

The Point

The cloud is a tool, not a requirement. AWS is great when you need elastic scale, multi-region redundancy, or managed services at enterprise volume. But for a solo founder building and shipping an AI platform? A desktop with a good GPU, Docker Compose, and a Cloudflare tunnel is the entire infrastructure.

I deploy code to a live website by copying files into containers and restarting them. The website drops for four seconds. It comes back. I keep coding.

No AWS bill. No Terraform state files. No Kubernetes manifests. Just code, containers, and a tunnel to the internet.

That's the setup. It's messy, it's opinionated, and it works every single day.
