Deploy Edge Nodes With One Command: Comet's Compose-Over-SSH Pipeline
We had a working edge deployment file — .DEPLOYMENT/compose/docker-compose.edge.yml — that deploys cloudflared, Ollama, and AitherNode in standalone mode. Copy it to a machine, fill in a .env, run docker compose up -d, done. Tunnel HA happens automatically because Cloudflare treats multiple connectors with the same token as redundant paths.
The problem was getting it there. The deployment was manual. Copy files over, SSH in, set environment variables, start the stack, check health. And our deployment service — AitherComet — had all the primitives for remote deployment (SSH exec, SCP, Docker orchestration) but the edge node path was broken. It queried the Mesh registry for target nodes, found nothing (the remote machine isn't registered yet — that's the whole point), fell back to localhost, and ran Docker commands on the primary machine instead of the target.
So we fixed the pipeline and made it a single command.
What Changed
Three things were wrong. All three are fixed now.
Bug 1: SSH Targets Fell Through to Localhost
When you called deploy_edge_node("192.168.1.50"), the CometClient correctly packed the SSH connection info (host, port, user, key path) into node_selector and sent it to the engine. But CometEngine.deploy() passed that selector to get_mesh_nodes(), which queries the AitherMesh registry and filters by node labels. SSH connection info isn't a label. Nothing matched. The fallback kicked in:
nodes = await self.get_mesh_nodes(spec.node_selector)
if not nodes:
    nodes = [{"id": "local", "address": "localhost"}]  # <-- always hit this
The fix: detect when node_selector contains a host key and treat it as a direct SSH target instead of a mesh query:
ssh_target = None
if spec.node_selector and "host" in spec.node_selector:
    ssh_target = {
        "id": f"remote-{spec.node_selector['host']}",
        "address": spec.node_selector["host"],
        "ssh_port": int(spec.node_selector.get("ssh_port", 22)),
        "ssh_user": spec.node_selector.get("ssh_user", "root"),
        "ssh_key_path": spec.node_selector.get("ssh_key_path", ""),
    }
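The full selection order can be sketched as a standalone function. This is a minimal illustration, not the actual CometEngine code: `resolve_nodes`, `ssh_target_from_selector`, and the `mesh_nodes` argument (standing in for whatever `get_mesh_nodes()` returned) are hypothetical names.

```python
from typing import Optional

LOCAL_NODE = {"id": "local", "address": "localhost"}


def ssh_target_from_selector(selector: Optional[dict]) -> Optional[dict]:
    """Return a direct SSH target if the selector names a host, else None."""
    if not selector or "host" not in selector:
        return None
    return {
        "id": f"remote-{selector['host']}",
        "address": selector["host"],
        "ssh_port": int(selector.get("ssh_port", 22)),
        "ssh_user": selector.get("ssh_user", "root"),
        "ssh_key_path": selector.get("ssh_key_path", ""),
    }


def resolve_nodes(selector: Optional[dict], mesh_nodes: list) -> list:
    """Pick targets: explicit SSH host first, then mesh matches, then localhost."""
    target = ssh_target_from_selector(selector)
    if target is not None:
        return [target]  # never consult the mesh or fall back for SSH targets
    return list(mesh_nodes) or [LOCAL_NODE]
```

The point of the ordering is that an explicit host short-circuits both the mesh query and the localhost fallback, so an unregistered machine can no longer be mistaken for the primary.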
Bug 2: Individual Docker Containers Instead of Compose Stack
The old path ran individual docker run commands for each container. This meant manually wiring up networks, volumes, health checks, and GPU device reservations — all of which .DEPLOYMENT/compose/docker-compose.edge.yml already defines correctly.
The new _deploy_edge_stack() method does compose-over-SSH:
- Locates .DEPLOYMENT/compose/docker-compose.edge.yml from the repo root
- Generates a .env file with the right variables (tunnel token, identity, LLM URL)
- SCPs both files to /opt/aitheros-edge/ on the remote machine
- Runs docker compose up -d via SSH
- Health-checks AitherNode at localhost:8090/health
The remote machine needs only Docker, Docker Compose, and SSH access, plus NVIDIA drivers if you want GPU-accelerated Ollama inference.
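The five steps map onto a handful of ssh and scp invocations. The sketch below only builds the argument lists and executes nothing. The function names, the local `.env.edge` filename, the `mkdir -p` step, and the use of `curl` for the remote health check are assumptions for illustration, not a description of what `_deploy_edge_stack()` actually runs.

```python
import shlex

REMOTE_DIR = "/opt/aitheros-edge"
COMPOSE_FILE = ".DEPLOYMENT/compose/docker-compose.edge.yml"


def render_env(values: dict) -> str:
    """Render KEY=value lines for the generated .env file."""
    return "".join(f"{key}={value}\n" for key, value in values.items())


def build_commands(host, user="root", port=22, key_path="", env_path=".env.edge"):
    """Return one argv list per deployment step, in execution order."""
    key = ["-i", key_path] if key_path else []
    dest = f"{user}@{host}"
    ssh = ["ssh", "-p", str(port), *key, dest]  # ssh takes a lowercase -p
    scp = ["scp", "-P", str(port), *key]        # scp takes an uppercase -P
    remote_dir = shlex.quote(REMOTE_DIR)
    compose = f"cd {remote_dir} && docker compose -f docker-compose.edge.yml up -d"
    return [
        ssh + [f"mkdir -p {remote_dir}"],
        scp + [COMPOSE_FILE, f"{dest}:{REMOTE_DIR}/docker-compose.edge.yml"],
        scp + [env_path, f"{dest}:{REMOTE_DIR}/.env"],
        ssh + [compose],
        # Assumes curl is installed on the remote machine.
        ssh + ["curl -fsS http://localhost:8090/health"],
    ]
```

Each list can be handed to `subprocess.run(cmd, check=True)` in order. Keeping the commands as lists rather than shell strings avoids quoting problems on the local side.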
Bug 3: remote_llm_url Was Silently Dropped
The MCP tool and Genesis request model both had a remote_llm_url field for configuring LLM fallback routing: if the edge Ollama doesn't have a model, requests route to the primary cluster's MicroScheduler. But the Genesis router never passed the field through to CometClient. The fix was one line, but until it landed, edge nodes couldn't fall back to the primary for inference.
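One way to guard against this class of bug is to forward the whole request rather than naming each field by hand. The sketch below is hypothetical: `EdgeNodeRequest` and `client_kwargs` are illustrative names, the defaults are placeholders, and the real Genesis request model may look different.

```python
from dataclasses import asdict, dataclass


@dataclass
class EdgeNodeRequest:
    """Illustrative request model; field names follow the API examples."""
    host: str
    ssh_port: int = 22
    ssh_user: str = "root"
    ssh_key_path: str = ""
    identity: str = "genesis"  # placeholder default
    tunnel_token: str = ""
    remote_llm_url: str = ""


def client_kwargs(request: EdgeNodeRequest) -> dict:
    """Forward every field, so a newly added one cannot be dropped silently."""
    return asdict(request)
```

A router written this way would call `comet.deploy_edge_node(**client_kwargs(request))`, at the cost of requiring the client signature to stay in step with the model.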
How to Use It
Via MCP Tool (Claude Code, Cursor, etc.)
deploy_edge_node("192.168.1.50", tunnel_token="ey...", identity="atlas")
Full parameters:
deploy_edge_node(
    host="10.0.0.5",
    ssh_port=22,
    ssh_user="ubuntu",
    ssh_key_path="/home/user/.ssh/edge_key",
    identity="genesis",
    tunnel_token="",  # Auto-resolved from primary .env if empty
    services="node,ollama",
    remote_llm_url="http://primary-host:8150"
)
Via Genesis API
curl -X POST http://localhost:8001/deploy/edge-node \
  -H "Content-Type: application/json" \
  -d '{
    "host": "192.168.1.50",
    "ssh_user": "ubuntu",
    "ssh_key_path": "/home/user/.ssh/edge_key",
    "tunnel_token": "",
    "identity": "genesis",
    "remote_llm_url": "http://primary:8150"
  }'
Via CometClient (Python)
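import asyncio

from services.mesh.AitherComet import get_comet_client

async def main():
    comet = get_comet_client()
    result = await comet.deploy_edge_node(
        host="10.0.0.5",
        ssh_user="ubuntu",
        ssh_key_path="/path/to/key",
        tunnel_token="",  # auto-resolves from env
        remote_llm_url="http://primary:8150",
    )
    print(result)  # deployment result returned by Comet

asyncio.run(main())  # deploy_edge_node is awaited, so it needs an event loop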
Tunnel Token Auto-Resolution
If you don't pass a tunnel_token, Comet tries to resolve it automatically:
- Check spec.env_vars["CLOUDFLARE_TUNNEL_TOKEN"] (explicit)
- Check os.environ["CLOUDFLARE_TUNNEL_TOKEN"] (process env)
- Read the repo root .env file and parse the CLOUDFLARE_TUNNEL_TOKEN= line
This means if you're running Comet on the primary machine (which already has the token in .env), edge deployments automatically get the same token — creating multi-connector HA without you needing to copy it manually.
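The lookup order can be sketched in a few lines. This is an approximation of the behavior described above, not the body of `_resolve_tunnel_token()`: the function signatures, the quote stripping, and the comment handling are assumptions.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

TOKEN_KEY = "CLOUDFLARE_TUNNEL_TOKEN"


def parse_env_file(text: str, key: str) -> str:
    """Return the value of `key` from dotenv-style text, or '' if absent."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not assignments
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip("'\"")  # tolerate quoted values
    return ""


def resolve_tunnel_token(
    env_vars: Mapping[str, str],
    environ: Optional[Mapping[str, str]] = None,
    env_file: Optional[str] = None,
) -> str:
    """Try explicit vars, then the process environment, then the .env file."""
    environ = os.environ if environ is None else environ
    token = env_vars.get(TOKEN_KEY) or environ.get(TOKEN_KEY, "")
    if token:
        return token
    if env_file is not None and Path(env_file).is_file():
        return parse_env_file(Path(env_file).read_text(), TOKEN_KEY)
    return ""
```

An empty return value means no token was found anywhere. A caller would likely want to fail the deployment at that point rather than start cloudflared without credentials.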
What Gets Deployed
The edge stack is defined in .DEPLOYMENT/compose/docker-compose.edge.yml at the repo root. It deploys three services:
| Service | Image | Purpose |
|---|---|---|
| cloudflared | cloudflare/cloudflared:latest | Tunnel connector — same token = automatic HA with primary |
| ollama | ollama/ollama:latest | Local LLM inference with GPU passthrough |
| aithernode | Built from AitherOS/ | Standalone MCP server — 30+ tools, filesystem, git, vision, generation |
Optional: add --profile dashboard for AitherVeil web UI failover.
Pre-Built Image Support
If the remote machine doesn't have the AitherOS source tree (no build context), set the AITHER_EDGE_IMAGE environment variable to a pre-built image URL. Comet generates a compose override that replaces the build: directive:
export AITHER_EDGE_IMAGE=ghcr.io/your-org/standalone-agent:latest
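A generated override might look like the output of the sketch below. The exact file Comet writes is not shown in this post, so treat this as an assumption: it uses the `aithernode` service name from the table above and Docker Compose's `!reset` YAML tag to clear the inherited `build:` key, which requires a Compose release that supports that tag.

```python
import os


def edge_image_override(image: str, service: str = "aithernode") -> str:
    """Return override YAML that swaps the service's build for a pulled image."""
    if not image:
        return ""  # no override needed when building from source
    return (
        "services:\n"
        f"  {service}:\n"
        f"    image: {image}\n"
        "    build: !reset null\n"  # drop the build section from the base file
    )


override = edge_image_override(os.environ.get("AITHER_EDGE_IMAGE", ""))
```

The resulting text would be saved next to the base file and passed as a second `-f` argument to `docker compose`, so the override merges on top of docker-compose.edge.yml.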
How Tunnel HA Works
This is a Cloudflare feature, not something we built. When two cloudflared processes run with the same tunnel token on different machines, Cloudflare's edge network sees them as two "connectors" for the same tunnel. Traffic is automatically:
- Served by any healthy connector (Cloudflare documents replicas as an availability feature, not as configurable load balancing)
- Failed over when one connector goes down
- Routed geographically to the nearest healthy connector
No DNS changes. No config changes. No load balancer to manage. You just run the same token on another machine.
Primary Machine:   cloudflared (token X) ──┐
                                           ├── CF Tunnel ── CF Edge ── Users
Secondary Machine: cloudflared (token X) ──┘
The edge.aitherium.com route in tunnel-routes.yaml points to the edge AitherNode. When both connectors are healthy, Cloudflare handles routing. When the primary goes down, the edge takes over transparently.
Remote Machine Requirements
Before deploying, make sure the target machine has:
- Docker (20.10+) and Docker Compose (v2)
- SSH server accessible from the primary machine
- NVIDIA drivers (if you want GPU inference via Ollama)
- Disk space: ~10GB for images, more for models
The deployment creates /opt/aitheros-edge/ on the remote machine with the compose file and generated .env. All persistent data lives in Docker volumes (ollama_data, node_data).
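The version requirements are easy to check before deploying. The sketch below is a hypothetical preflight helper, not part of Comet. It assumes the version strings come from `docker version --format '{{.Server.Version}}'` and `docker compose version --short` run on the target.

```python
import re


def parse_version(text: str) -> tuple:
    """Extract the first major.minor pair found in a version string."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if not match:
        raise ValueError(f"no version found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_requirements(docker_version: str, compose_version: str) -> bool:
    """Check for Docker 20.10+ and Docker Compose v2 or newer."""
    docker_ok = parse_version(docker_version) >= (20, 10)
    compose_ok = parse_version(compose_version)[0] >= 2
    return docker_ok and compose_ok
```

Comparing tuples of integers avoids the string-comparison trap where "20.9" sorts after "20.10".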
Checking Deployment Status
After deploying, track progress via the deployment ID returned by the API:
get_deployment_status("comet-edge-node-1711446000000-1")
Or check the remote machine directly:
ssh user@edge-host "docker ps --filter label=com.docker.compose.project=aitheros-edge"
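Beyond listing containers, the AitherNode health endpoint can be polled until the stack settles. This is a minimal sketch with hypothetical names (`http_ok`, `wait_healthy`) and arbitrary retry defaults. Because the endpoint is localhost:8090 on the edge machine, it has to run there or through an SSH port forward.

```python
import time
import urllib.error
import urllib.request


def http_ok(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, or HTTP error status


def wait_healthy(url, attempts=10, delay=3.0, probe=http_ok, sleep=time.sleep) -> bool:
    """Probe the URL up to `attempts` times, pausing `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no pause after the final failed attempt
    return False
```

For example, `wait_healthy("http://localhost:8090/health")` gives the containers roughly half a minute to come up with these defaults. The `probe` and `sleep` parameters exist so the loop can be tested without a network.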
Files Changed
| File | What |
|---|---|
| AitherOS/services/mesh/AitherComet.py | SSH target detection, _scp_file(), _deploy_edge_stack(), _resolve_tunnel_token(), remote_llm_url support |
| AitherOS/apps/AitherGenesis/routers/deploy.py | Pass remote_llm_url through to CometClient |
| .DEPLOYMENT/compose/docker-compose.edge.yml | Reference: the canonical edge compose file (unchanged) |
| AitherOS/apps/AitherNode/tools/mcp/mcp_infrastructure.py | MCP tool already had all params (no changes needed) |