
Tags: engineering, security, infrastructure, vpn, wireguard

Zero-Disk VPN: How We Run WireGuard Inside Docker With No Key Files

Name: AitherOS
Author: Aitherium

March 9, 2026 · 12 min read · Aitherium

There's a moment in every infrastructure project where you realize you've been storing secrets wrong.

For us it was staring at a Docker volume mount containing server_private.key — a plaintext WireGuard private key sitting on disk, owned by root, readable by anyone who can docker exec into the container. We had an entire secrets vault with encryption at rest, access control, audit logging, and automatic rotation. And next to it, a 44-byte file containing the most important key in the network stack.

So we fixed it. AitherTunnel now runs a full WireGuard VPN with zero cryptographic material on disk. Every key, every peer config, every piece of state lives in the AitherSecrets vault. The /data/tunnel/ volume is empty. ls returns nothing.

Here's how that works.

Why WireGuard in Docker Is Annoying

WireGuard is a kernel module. It operates at Layer 3 of the network stack, creating virtual network interfaces that encrypt traffic with ChaCha20-Poly1305 using the Noise protocol framework. It's fast, it's simple, and it has a tiny attack surface compared to OpenVPN or IPsec.

It's also designed to be configured by root.

Creating a WireGuard interface requires ip link add type wireguard — a kernel netlink operation that needs CAP_NET_ADMIN. Setting the listen port and private key requires wg set — another privileged operation. Configuring IP forwarding and NAT masquerade requires iptables — yep, privileged.

Docker containers don't run as root by default. Our containers definitely don't — every AitherOS service runs as UID 1000 (aither), with the entrypoint using gosu to drop privileges from the initial root. This is correct security practice for 96 out of 203 services. For a VPN service, it means nothing works.

The fix is a two-part bypass:

# Docker Compose service definition
aither-tunnel-service:
  user: root
  cap_add:
    - NET_ADMIN
    - SYS_MODULE
  sysctls:
    - net.ipv4.ip_forward=1

The container runs as root with an environment variable that tells the entrypoint to skip the usual privilege drop. CAP_NET_ADMIN grants network interface management. SYS_MODULE allows WireGuard kernel module loading if needed. And net.ipv4.ip_forward=1 enables IP forwarding at the kernel level so the VPN can route traffic.

This is the same pattern we use for GPU services — some workloads legitimately need root, and pretending otherwise just leads to broken workarounds.

The Bootstrap Sequence

When AitherTunnel starts, before it opens the FastAPI listener, it runs through a WireGuard bootstrap sequence:

Step 1: Hydrate Keys from Vault

vault_key = await _vault_get("TUNNEL_WG_SERVER_PRIVATE_KEY")
if vault_key:
    _wg_server_private = vault_key
    # Derive public key from the private key
    proc = await asyncio.create_subprocess_exec(
        "wg", "pubkey",
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE
    )
    stdout, _ = await proc.communicate(_wg_server_private.encode())
    _wg_server_public = stdout.decode().strip()

On first boot, neither key exists in the vault. So we generate them:

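proc = await asyncio.create_subprocess_exec(
    "wg", "genkey",
    stdout=asyncio.subprocess.PIPE
)
stdout, _ = await proc.communicate()
_wg_server_private = stdout.decode().strip()

# Derive the public key the same way as on the vault-hit path
proc = await asyncio.create_subprocess_exec(
    "wg", "pubkey",
    stdin=asyncio.subprocess.PIPE,
    stdout=asyncio.subprocess.PIPE
)
stdout, _ = await proc.communicate(_wg_server_private.encode())
_wg_server_public = stdout.decode().strip()

# Store both keys in the vault
await _vault_store("TUNNEL_WG_SERVER_PRIVATE_KEY", _wg_server_private)
await _vault_store("TUNNEL_WG_SERVER_PUBLIC_KEY", _wg_server_public)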

On every subsequent boot, the keys come from the vault. Same keypair, stable public key, no file I/O. The container can be destroyed and recreated without losing its identity.
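Because the vault value is handed straight to wg, a cheap sanity check before use can catch a damaged value early. The sketch below is an assumption on our part, not code from AitherTunnel: the helper name is_valid_wg_key is hypothetical, and it only checks that the string is standard base64 decoding to exactly 32 bytes.

```python
import base64
import binascii


def is_valid_wg_key(value: str) -> bool:
    """True if `value` is standard base64 that decodes to exactly 32 bytes."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        # (for example a space where a '+' used to be).
        raw = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False
    return len(raw) == 32
```

A caller could run this on the vault value before passing it to wg pubkey and refuse to start if it returns False.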

Step 2: Create the Interface

async def _wg_bootstrap_interface():
    iface = os.getenv("AITHER_WG_INTERFACE", "wg-aither0")
    port = os.getenv("AITHER_WG_PORT", "51820")
    subnet = os.getenv("AITHER_WG_SUBNET", "10.66.0.0/24")
    server_ip = subnet.rsplit(".", 1)[0] + ".1"
    prefix = subnet.split("/", 1)[1]  # prefix length, e.g. "24"

    # Create the WireGuard interface
    await _run("ip", "link", "add", "dev", iface, "type", "wireguard")

    # Pipe the private key to wg over stdin (never touches disk)
    proc = await asyncio.create_subprocess_exec(
        "wg", "set", iface,
        "listen-port", port,
        "private-key", "/dev/stdin",
        stdin=asyncio.subprocess.PIPE
    )
    await proc.communicate(_wg_server_private.encode())

    # Assign IP and bring up
    await _run("ip", "addr", "add", f"{server_ip}/{prefix}", "dev", iface)
    await _run("ip", "link", "set", iface, "up")

    # NAT masquerade for VPN clients
    await _run("iptables", "-t", "nat", "-A", "POSTROUTING",
               "-s", subnet, "-o", "eth0", "-j", "MASQUERADE")
    await _run("iptables", "-A", "FORWARD",
               "-i", iface, "-j", "ACCEPT")
    await _run("iptables", "-A", "FORWARD",
               "-o", iface, "-m", "state",
               "--state", "RELATED,ESTABLISHED", "-j", "ACCEPT")

Notice the private key is piped through /dev/stdin. It never exists as a file, not even temporarily. The wg tool reads it from a file descriptor and hands it to the kernel when it configures the interface, and from that point forward the interface's copy of the key lives in kernel memory.

The result is a WireGuard interface wg-aither0 listening on UDP 51820, with server IP 10.66.0.1/24, ready to accept peer connections:

3: wg-aither0: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1420 qdisc noqueue state UNKNOWN
    link/none
    inet 10.66.0.1/24 scope global wg-aither0
       valid_lft forever preferred_lft forever
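The address arithmetic in Step 2 splits the subnet string by hand, which works for 10.66.0.0/24 but not for every subnet. As a hedged alternative, here is a minimal sketch using Python's standard ipaddress module. The function name server_address is hypothetical and is not part of the AitherTunnel code.

```python
import ipaddress


def server_address(subnet: str) -> tuple[str, int]:
    """Return (first usable host, prefix length) for a CIDR subnet."""
    # strict=True raises ValueError if host bits are set (e.g. 10.66.0.5/24).
    network = ipaddress.ip_network(subnet, strict=True)
    first_host = next(iter(network.hosts()))
    return str(first_host), network.prefixlen
```

For 10.66.0.0/24 this gives the same 10.66.0.1 and 24 as the string version. For a subnet such as 192.168.4.128/25 it gives 192.168.4.129, where the string split would produce an address outside the subnet.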

Step 3: Restore Peers

If there are existing peers in the vault, they're restored during startup:

peers_json = await _vault_get("TUNNEL_WG_PEERS")
if peers_json:
    peers = json.loads(peers_json)
    for peer_id, peer in peers.items():
        if peer.get("active"):
            await _apply_wg_peer(peer)

Each peer gets added to the interface with wg set ... peer <pubkey> allowed-ips <ip>/32. The VPN is fully operational before the HTTP server starts accepting requests.
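The post does not show _apply_wg_peer, so the following is only a sketch of how the restore step might turn vault JSON into wg set commands. The function names are hypothetical. The public_key, assigned_ip and active fields are taken from the snippets in this post.

```python
import json


def active_peers(peers_json: str) -> list[dict]:
    """Parse the vault JSON and keep only peers flagged as active."""
    peers = json.loads(peers_json)
    return [peer for peer in peers.values() if peer.get("active")]


def wg_peer_args(iface: str, peer: dict) -> list[str]:
    """Build the argv for `wg set <iface> peer <pubkey> allowed-ips <ip>/32`."""
    return [
        "wg", "set", iface,
        "peer", peer["public_key"],
        "allowed-ips", f"{peer['assigned_ip']}/32",
    ]
```

Building the argument list separately from running it keeps the command easy to test without a WireGuard interface.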

The Vault Layer: Base64 All The Things

The naive approach to storing WireGuard keys in a secrets vault is:

await secrets_client.store("TUNNEL_WG_SERVER_PRIVATE_KEY", private_key, "private_key")

This doesn't work. WireGuard keys are base64-encoded 32-byte Curve25519 keys. They can contain + and / characters, and they always end in = padding. Our secrets API, like most secrets APIs, treats values as strings that pass through URL encoding, JSON serialization, and database storage. Somewhere in that pipeline, + becomes a space, / becomes a path separator, and = gets stripped as padding.

The result: store returns 200 OK. Get returns 404 Not Found. The key was corrupted on write and doesn't match on read.

The fix is double encoding:

import base64

def _vault_encode(raw: str) -> str:
    """Base64-wrap a value so +/= survive API round-trip."""
    return base64.b64encode(raw.encode()).decode()

def _vault_decode(stored: str) -> str:
    """Unwrap a base64-encoded vault value."""
    return base64.b64decode(stored.encode()).decode()

Every write goes through _vault_encode(). Every read goes through _vault_decode(). The stored value in the vault is a base64 string containing a base64 string. The outer layer cannot contain + or / (base64 of ASCII text never produces them), though it does still end in = padding, so it round-trips cleanly through any API that preserves alphanumerics and a trailing =.

Is this elegant? No. Does it work on every secrets backend we've tested? Yes. We'll take it.
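If a backend also strips trailing = characters, one possible variant is to use the URL-safe alphabet and drop the padding on write, then restore it on read. This is a sketch under that assumption, not the wrapper AitherTunnel ships. The names vault_encode and vault_decode here are illustrative.

```python
import base64


def vault_encode(raw: str) -> str:
    """URL-safe base64 with '=' padding stripped: output is [A-Za-z0-9_-] only."""
    return base64.urlsafe_b64encode(raw.encode()).decode().rstrip("=")


def vault_decode(stored: str) -> str:
    """Restore the padding that vault_encode removed, then decode."""
    padding = "=" * (-len(stored) % 4)
    return base64.urlsafe_b64decode(stored + padding).decode()
```

The trade-off is that values written by this variant are not readable by the plain b64decode wrapper, so switching would need a migration of existing vault entries.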

Hot Reload: syncconf, Not setconf

When a peer is created or revoked, we need to update the WireGuard interface. The obvious approach is wg setconf, which replaces the entire configuration. The problem is that setconf drops all existing peer state — active sessions, handshake timers, transfer counters. Every connected client gets a brief interruption while WireGuard re-establishes sessions.

wg syncconf is the answer. It diffs the current config against the new one and applies only the changes:

async def _wg_sync_config():
    """Zero-downtime peer sync via wg syncconf."""
    lines = [f"[Interface]\nListenPort = {port}\nPrivateKey = {_wg_server_private}\n"]
    for peer in _wg_peers.values():
        if peer.get("active") and peer.get("public_key"):
            lines.append(
                f"\n[Peer]\n"
                f"PublicKey = {peer['public_key']}\n"
                f"AllowedIPs = {peer['assigned_ip']}/32\n"
            )
    config = "\n".join(lines)

    # Write to a temp file (WireGuard requires a file path for syncconf)
    tmp = f"/tmp/.wg_sync_{iface}.conf"
    async with aiofiles.open(tmp, "w") as f:
        await f.write(config)
    try:
        await _run("wg", "syncconf", iface, tmp)
    finally:
        os.unlink(tmp)

A new peer is added without any existing peer noticing. A revoked peer is removed instantly — their next packet gets no response, and after the handshake timer expires (typically 2 minutes), the connection is dead.

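The temp file is a concession. wg syncconf takes its configuration as a file path, and the code above writes one. (wg set takes the private key as a path too, which is why Step 2 points it at /dev/stdin; the same approach would probably work for syncconf.) The file exists for milliseconds and is deleted in a finally block. The private key is in there briefly, but it's in /tmp, which is a tmpfs in our container: RAM, not disk.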
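Since the temp file briefly holds the private key, it is worth making sure no other user in the container can read it and that its name cannot be guessed in advance. A minimal sketch using the standard library's tempfile.mkstemp, which creates the file readable and writable only by the creating user. The helper name write_private_temp is hypothetical.

```python
import os
import tempfile


def write_private_temp(content: str, directory: str = "/tmp") -> str:
    """Write `content` to a new owner-only file with a random name; return its path."""
    fd, path = tempfile.mkstemp(prefix=".wg_sync_", suffix=".conf", dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
    except BaseException:
        # Do not leave a partial config behind if the write fails.
        os.unlink(path)
        raise
    return path
```

The caller would still pass the returned path to wg syncconf and unlink it in a finally block, as the function above does.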

Peer Creation: Show Once, Store Never

When a user creates a VPN peer through the portal or API, here's what happens:

1. Generate a keypair: wg genkey → private key. echo $privkey | wg pubkey → public key.
2. Allocate the next available IP from the 10.66.0.0/24 subnet.
3. Add the peer to the kernel interface: wg syncconf with the new peer's public key and allowed IP.
4. Build a .conf file for the client:

[Interface]
PrivateKey = <client_private_key>
Address = 10.66.0.7/32
DNS = 1.1.1.1, 8.8.8.8

[Peer]
PublicKey = <server_public_key>
Endpoint = tunnel.aitherium.com:51820
AllowedIPs = 10.66.0.0/24
PersistentKeepalive = 25

5. Return the config to the user. Display it once in the browser with a copy button and download link.
6. Discard the client private key. The server stores only: public key, assigned IP, creation time, and the user's email. The private key exists only in the HTTP response and the user's WireGuard client.

If the user loses their config, they revoke the peer and create a new one. There's no "retrieve my config" endpoint because there's nothing to retrieve. This isn't a limitation — it's the security model. A compromised server can't leak client private keys that it doesn't have.
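The allocation and config-building steps in the list above can be sketched as two small pure functions. This is an illustration under assumptions: the function names are hypothetical, the allocator simply picks the lowest unused host address after the server's, and the config template mirrors the example shown in this post.

```python
import ipaddress


def next_free_ip(subnet: str, used: set[str]) -> str:
    """Return the lowest unused host address, skipping the server's first address."""
    network = ipaddress.ip_network(subnet)
    hosts = iter(network.hosts())
    server = str(next(hosts))  # first host is reserved for the server
    for host in hosts:
        if str(host) not in used:
            return str(host)
    raise RuntimeError(f"no free addresses left in {subnet} (server is {server})")


def render_client_config(private_key: str, address: str, server_public_key: str,
                         endpoint: str, allowed_ips: str,
                         dns: str = "1.1.1.1, 8.8.8.8") -> str:
    """Build the client .conf text; the private key is only ever in this string."""
    return (
        "[Interface]\n"
        f"PrivateKey = {private_key}\n"
        f"Address = {address}/32\n"
        f"DNS = {dns}\n"
        "\n"
        "[Peer]\n"
        f"PublicKey = {server_public_key}\n"
        f"Endpoint = {endpoint}\n"
        f"AllowedIPs = {allowed_ips}\n"
        "PersistentKeepalive = 25\n"
    )
```

The set of used addresses would come from the peer records in the vault, so a revoked peer's address becomes available again once it is removed from that set.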

Background Stats: What You Can See

A background task runs every 30 seconds, reading WireGuard interface stats:

async def _wg_collect_stats():
    proc = await asyncio.create_subprocess_exec(
        "wg", "show", iface, "dump",
        stdout=asyncio.subprocess.PIPE
    )
    stdout, _ = await proc.communicate()
    for line in stdout.decode().strip().split("\n")[1:]:  # Skip interface line
        parts = line.split("\t")
        pubkey = parts[0]
        latest_handshake = int(parts[4]) if parts[4] != "0" else 0
        rx_bytes = int(parts[5])
        tx_bytes = int(parts[6])
        # Update peer stats in memory

The portal shows this as real-time peer status: last handshake time, data transferred, whether the peer is online (handshake within the last 3 minutes). It's the same data you'd see running wg show on the command line, pulled into the web UI.
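The parsing and the online check can be separated from the subprocess call so they are testable against captured output. A minimal sketch, assuming the tab-separated peer layout the collector above already relies on (public key first, latest handshake, rx and tx in columns 4 to 6). PeerStats and parse_wg_dump are hypothetical names, and the 180-second window matches the 3-minute rule described here.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerStats:
    public_key: str
    latest_handshake: int  # Unix seconds, 0 if the peer has never connected
    rx_bytes: int
    tx_bytes: int

    def is_online(self, now: Optional[float] = None, window: int = 180) -> bool:
        """A peer counts as online if it completed a handshake within `window` seconds."""
        if self.latest_handshake == 0:
            return False
        current = time.time() if now is None else now
        return current - self.latest_handshake <= window


def parse_wg_dump(dump: str) -> list[PeerStats]:
    """Parse `wg show <iface> dump` output, skipping the leading interface line."""
    stats = []
    for line in dump.strip().split("\n")[1:]:
        parts = line.split("\t")
        if len(parts) < 7:
            continue  # ignore blank or truncated lines
        stats.append(PeerStats(parts[0], int(parts[4]), int(parts[5]), int(parts[6])))
    return stats
```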

The Dual-Layer Architecture

AitherTunnel operates two completely independent security layers:

Layer 1: Cloudflare Tunnel (L7) — A cloudflared sidecar container maintains an outbound-only encrypted tunnel to Cloudflare's edge. Traffic to tunnel.aitherium.com hits Cloudflare first, passes through Cloudflare Access (SSO + JWT), and is proxied back through the tunnel to the container. No inbound ports are opened for HTTPS. This handles the web portal, SSH terminal, API endpoints — everything HTTP.

Layer 2: WireGuard (L3) — The WireGuard interface operates at the network layer, providing IP-level tunneling on UDP 51820. This is a direct connection — traffic flows from the client's WireGuard app to the server's UDP port. No Cloudflare in the middle. This is intentional: VPN traffic needs low latency and high throughput. Adding a proxy layer would defeat the purpose.

The two layers complement each other:

Need	Layer
Manage VPN peers	Cloudflare (HTTPS portal with SSO)
Persistent network access	WireGuard (kernel VPN)
Browser terminal	Cloudflare (WebSocket through tunnel)
Hit internal APIs	Either — VPN gives you the IP, tunnel gives you the URL
Quick access from phone	Cloudflare (just open the browser)
Route all traffic through home	WireGuard (full tunnel mode)

A developer might use Cloudflare for the terminal and WireGuard for persistent access to internal services. An operator might only use WireGuard to hit API endpoints from their laptop. The admin uses both. Each layer has its own auth: Cloudflare Access gates the portal, WireGuard cryptographic identity gates the VPN.

What We Got Wrong Along the Way

CAP_NET_ADMIN doesn't work the way you think. Docker's cap_add: [NET_ADMIN] adds the capability to the container's bounding set. But if the process runs as a non-root user, the effective capability set is empty. We had the capability declared in compose, confirmed via docker inspect, and the container still couldn't create interfaces. The fix wasn't adding more capabilities — it was running as root. Two hours of debugging for a one-line fix.
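One way to shorten that kind of debugging is to check the effective capability set at startup rather than inferring it from docker inspect. The sketch below reads the CapEff mask from /proc/self/status on Linux. The function names are hypothetical, and the bit index 12 for CAP_NET_ADMIN comes from the kernel's linux/capability.h.

```python
from pathlib import Path

CAP_NET_ADMIN = 12  # bit index from linux/capability.h


def has_capability(cap_eff_hex: str, cap_bit: int) -> bool:
    """Return True if bit `cap_bit` is set in a CapEff hex mask."""
    return bool(int(cap_eff_hex, 16) & (1 << cap_bit))


def read_cap_eff(status_path: str = "/proc/self/status") -> str:
    """Read the effective capability mask of the current process (Linux only)."""
    for line in Path(status_path).read_text().splitlines():
        if line.startswith("CapEff:"):
            return line.split(":", 1)[1].strip()
    raise RuntimeError("CapEff not found in " + status_path)


def assert_net_admin() -> None:
    """Fail fast at startup instead of failing later inside `ip link add`."""
    if not has_capability(read_cap_eff(), CAP_NET_ADMIN):
        raise PermissionError("CAP_NET_ADMIN is not in the effective set")
```

Calling assert_net_admin() before the bootstrap sequence would turn a confusing netlink error into a clear message about the missing capability.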

docker restart doesn't reapply capabilities. docker restart stops and starts the same container with the configuration it was created with, so capability or user changes made in the compose file afterwards are never picked up. You need docker compose up --force-recreate to get a new container with the updated settings. This is easy to miss and cost us another hour.

Secrets APIs corrupt binary-ish data silently. We stored a WireGuard key, got 200 back, and then got 404 on retrieval. The key contained + and = characters that were mangled during storage. The API didn't error — it successfully stored a corrupted value. The fix was base64-wrapping everything, which is ugly but universal. If your secrets API doesn't round-trip aB+c/D== correctly, wrap it.

The gosu entrypoint pattern needs escape hatches. Our entrypoint unconditionally drops to non-root via gosu. When we added user: root to compose, the Dockerfile's USER aither was overridden, but the entrypoint's privilege-drop logic still fired and dropped us right back. The solution is environment-variable gates that let specific services opt out of the privilege drop.

File-based fallback is a vulnerability, not resilience. Our first version wrote keys to disk as a fallback when the vault was unreachable. "Defense in depth," we told ourselves. Except now there's a plaintext private key in a Docker volume that persists across container recreations, is visible to anyone who can mount the volume, and never gets rotated. We ripped it all out. If the vault is down, the VPN doesn't start. That's the correct failure mode — fail closed, not fail to plaintext.

The Result

$ docker exec aitheros-tunnel-service wg show wg-aither0
interface: wg-aither0
  public key: snwC5sWi3eQNGbzXpS/K+07Tv58xYocw4+E0i96YPGY=
  private key: (hidden)
  listening port: 51820

$ docker exec aitheros-tunnel-service ls /data/tunnel/
# (empty)

$ curl -s http://localhost/health | python3 -m json.tool | grep status
"status": "healthy"

A running VPN with an empty data directory. Every secret in the vault. Every peer managed through an SSO-protected portal. The kernel does the crypto, the vault holds the state, and the disk holds nothing.

That's the whole thing. A WireGuard VPN that treats secrets the way secrets should be treated — as things that belong in a vault, not in files.
