AI Company Brains and Personal Agents

June 2, 2026 | 5 min read | Alex Garganté

AI Tinkerers OC — June 2, 2026 | 10-min demo

What This Is

An operating system for AI agents running entirely on local hardware — zero cloud LLM spend. 196 microservices, 43 agents, 147 MCP tools.

Live right now:

portal.aitherium.com — agent fleet hub
chelle.aitherium.com — photography business AI
garg.aitherium.com — government consulting AI
irc.aitherium.com — live chat (join #tinkerers!)
Discord: Vibeful — AI dev community

RTX 5090 + DGX Spark under my desk. Demoing through a Cloudflare tunnel back to my house.

The Core Insight: Two-Tier Routing

One small orchestrator model fields every request. Deep reasoning is a tool it calls, not a separate routing tier.

User Request
    |
    v
[Nemotron-Orchestrator-8B] ← always resident, 14GB with TQ4
    |
    |--- 90% of requests → direct response
    |
    |--- "I need to think harder" → calls deep_reasoning tool
    |                                    |
    |                                    v
    |                             [DeepSeek-R1:14b]
    |                                    |
    |                                    v
    |--- ← reasoning result ← ─────────┘
    |
    v
Response

Why it works:

8B model handles tool calling, routing, orchestration, and most responses
Reasoning model only fires when the orchestrator decides it needs deep thinking
Both run on one RTX 5090 via vLLM continuous batching
4-8 agents can work simultaneously on one GPU
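The flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the AitherOS implementation: both models are replaced by stub functions, and the escalation rule (the word "prove" in the prompt) is a made-up stand-in for the orchestrator's own judgment. Only the tool name deep_reasoning comes from the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ToolCall:
    """A request from the orchestrator to run a named tool."""
    name: str
    argument: str


@dataclass
class Turn:
    """One orchestrator step: either a final answer or a tool call."""
    answer: Optional[str] = None
    tool_call: Optional[ToolCall] = None


def orchestrator(prompt: str, tool_result: Optional[str] = None) -> Turn:
    """Stub for the small always-resident model (hypothetical logic).

    A real model decides for itself when to escalate; this stub
    escalates whenever the prompt contains the word 'prove'.
    """
    if tool_result is not None:
        return Turn(answer=f"Final answer based on: {tool_result}")
    if "prove" in prompt.lower():
        return Turn(tool_call=ToolCall("deep_reasoning", prompt))
    return Turn(answer=f"Direct answer to: {prompt}")


def deep_reasoning(question: str) -> str:
    """Stub for the larger reasoning model, exposed as an ordinary tool."""
    return f"step-by-step analysis of '{question}'"


# The reasoning model sits in the same registry as any other tool.
TOOLS: Dict[str, Callable[[str], str]] = {"deep_reasoning": deep_reasoning}


def handle(prompt: str) -> str:
    """Run one request through the orchestrator, escalating at most once."""
    turn = orchestrator(prompt)
    if turn.tool_call is not None:
        result = TOOLS[turn.tool_call.name](turn.tool_call.argument)
        turn = orchestrator(prompt, tool_result=result)
    assert turn.answer is not None
    return turn.answer
```

The point of the design is that the caller only ever talks to the orchestrator; the reasoning model is invisible unless the orchestrator asks for it.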

AitherKVCache

4-bit KV cache compression → 3.8x memory savings → 6x concurrent requests at 40K context.

pip install aither-kvcache
# Drop into any vLLM instance:
# --kv-cache-dtype tq-t4nc
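To get a feel for what a 3.8x saving means at 40K context, here is a back-of-the-envelope calculation. The model shape (32 layers, 8 KV heads, head dimension 128) is a hypothetical example, not the shape of any model named in the talk; only the 40K context and the 3.8x ratio are taken from above. The sketch covers the memory side only and does not attempt to derive the 6x concurrency figure.

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bits_per_value: float) -> float:
    """Bytes of KV cache one token occupies across all layers.

    The factor 2 covers keys and values.
    """
    return 2 * layers * kv_heads * head_dim * bits_per_value / 8


# Hypothetical model shape, NOT taken from the talk.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
CONTEXT_TOKENS = 40_000   # "40K context" from the talk
SAVINGS = 3.8             # compression ratio quoted in the talk

fp16_per_seq = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, 16) * CONTEXT_TOKENS
compressed_per_seq = fp16_per_seq / SAVINGS

# Effective bits per value implied by a 3.8x saving over 16-bit storage.
effective_bits = 16 / SAVINGS

GIB = 1024 ** 3
print(f"FP16 KV cache per sequence:       {fp16_per_seq / GIB:.2f} GiB")
print(f"Compressed KV cache per sequence: {compressed_per_seq / GIB:.2f} GiB")
print(f"Effective bits per value:         {effective_bits:.2f}")
```

With these assumed dimensions, a single 40K-token sequence needs roughly 4.9 GiB of KV cache at 16-bit precision. The implied rate of about 4.2 bits per value would be consistent with 4-bit values plus some per-block metadata, though the talk does not say how the format is laid out.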

Claude Code as Command & Control

Claude Code is my brain. AitherOS is my local AI infrastructure.

147 MCP tools — all routed to local compute:

Code search (CodeGraph + Repowise)
Agent dispatch (forge_subagent)
Memory graph (remember, recall, query_memory)
Git, filesystem, terminal
LLM inference (local vLLM)

Claude Code → MCP → AitherNode → local services → vLLM → response

The key: Cloud intelligence for planning, local hardware for execution.
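MCP is JSON-RPC 2.0 under the hood, and a tool invocation is a tools/call request carrying a tool name and its arguments. The sketch below shows roughly what the "MCP → local services" hop could look like using only the standard library. The handlers, the code_search tool name, and the argument names are hypothetical (recall is one of the tool names listed above), and a real server would use an MCP SDK rather than hand-rolled dispatch.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical local handlers standing in for AitherOS services.
def recall(query: str) -> str:
    return f"memories matching '{query}'"


def code_search(pattern: str) -> str:
    return f"files containing '{pattern}'"


LOCAL_TOOLS: Dict[str, Callable[..., str]] = {
    "recall": recall,
    "code_search": code_search,
}


def error(request_id: Any, code: int, message: str) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 error response."""
    return {"jsonrpc": "2.0", "id": request_id,
            "error": {"code": code, "message": message}}


def handle_request(raw: str) -> str:
    """Answer one MCP-style JSON-RPC 2.0 'tools/call' request."""
    request = json.loads(raw)
    request_id = request.get("id")
    params = request.get("params", {})

    if request.get("method") != "tools/call":
        return json.dumps(error(request_id, -32601, "Method not found"))

    tool = LOCAL_TOOLS.get(params.get("name"))
    if tool is None:
        return json.dumps(error(request_id, -32602, "Unknown tool"))

    # Everything below runs on local compute; only the JSON goes back.
    text = tool(**params.get("arguments", {}))
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {"content": [{"type": "text", "text": text}]},
    })
```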

Managing the Infrastructure

AitherZero: 170+ PowerShell Scripts

Numbered automation scripts organize the entire lifecycle:

00-prerequisites/    # Environment setup
10-initialization/   # First-time config
20-networking/       # Tunnel, DNS, certificates
30-services/         # Docker compose orchestration
40-deployment/       # Build, push, deploy
50-maintenance/      # Health checks, cleanup
60-monitoring/       # Grafana, alerts
70-security/         # Vault rotation, audit
80-testing/          # Pester, pytest, integration
90-cleanup/          # Teardown, reset
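One thing the numeric prefixes buy you is a machine-readable run order. The sketch below is an illustration of that idea in Python, not part of AitherZero (which is PowerShell); the function names are made up for the example.

```python
import re
from typing import List, Tuple

# Matches names like "30-services/" and captures (30, "services").
PREFIX = re.compile(r"^(\d+)-(.+?)/?$")


def lifecycle_order(dir_names: List[str]) -> List[Tuple[int, str]]:
    """Return (number, stage) pairs sorted by numeric prefix.

    Names without a numeric prefix are skipped.
    """
    stages = []
    for name in dir_names:
        match = PREFIX.match(name)
        if match:
            stages.append((int(match.group(1)), match.group(2)))
    return sorted(stages)


def stages_between(dir_names: List[str], start: int, end: int) -> List[str]:
    """Stage names whose prefix falls in [start, end], in run order."""
    return [stage for number, stage in lifecycle_order(dir_names)
            if start <= number <= end]
```

For example, stages_between(names, 10, 30) on the directories above would select initialization, networking, and services, in that order.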

Docker Compose Profiles

Graduated startup — don't boot 90 containers for chat:

# Raw LLM chat (~20 containers)
docker compose --profile chat-minimal up -d

# Full personality + memory (~29 containers)
docker compose --profile chat-full up -d

# + Agent tool use (~31 containers)
docker compose --profile chat-agents up -d
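Compose decides what to start with a simple rule: a service with no profiles always starts, and a service with profiles starts only when one of them is active. The model below shows that rule in Python. The service names and their profile assignments are hypothetical, and it assumes the three chat profiles are cumulative, which the container counts above suggest but do not confirm.

```python
from typing import Dict, List, Set

# Hypothetical service -> profiles mapping; real names live in the compose file.
SERVICES: Dict[str, List[str]] = {
    "gateway": [],  # no profiles: always starts
    "vllm": ["chat-minimal", "chat-full", "chat-agents"],
    "memory": ["chat-full", "chat-agents"],
    "tool-runner": ["chat-agents"],
    "grafana": ["monitoring"],
}


def services_to_start(services: Dict[str, List[str]],
                      active: Set[str]) -> List[str]:
    """Apply Compose's rule: unprofiled services always start; profiled
    services start only if one of their profiles is active."""
    return sorted(
        name for name, profiles in services.items()
        if not profiles or active.intersection(profiles)
    )
```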

API Contracts as Boundaries

Every service follows the same pattern:

import services._bootstrap  # imported for its side effects
from lib.core.AitherIntegration import AitherService, get_port
from lib.core.AitherEvents import setup_lifecycle

# Resolve the port once so both calls below use the same value
PORT = get_port("MyService")

aither = AitherService("MyService", port=PORT)
app = aither.app
setup_lifecycle(app, "MyService", port=PORT, register_identity=True)

config/services.yaml is the single source of truth for all 196 services.
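A single registry makes lookups and sanity checks trivial. The sketch below shows the idea with a plain dict standing in for the parsed YAML. The schema, service names, and port numbers are all hypothetical, and this get_port is an illustration of the pattern, not the function from lib.core.AitherIntegration.

```python
from typing import Dict, List, TypedDict


class ServiceEntry(TypedDict):
    port: int


# Stand-in for the parsed contents of config/services.yaml.
# Service names and port numbers here are hypothetical.
REGISTRY: Dict[str, ServiceEntry] = {
    "MyService": {"port": 8101},
    "OtherService": {"port": 8102},
}


def get_port(name: str, registry: Dict[str, ServiceEntry] = REGISTRY) -> int:
    """Look up a service's port, failing loudly for unregistered names."""
    try:
        return registry[name]["port"]
    except KeyError:
        raise KeyError(f"{name!r} is not declared in the service registry") from None


def find_port_conflicts(registry: Dict[str, ServiceEntry]) -> Dict[int, List[str]]:
    """Return ports claimed by more than one service."""
    by_port: Dict[int, List[str]] = {}
    for name, entry in registry.items():
        by_port.setdefault(entry["port"], []).append(name)
    return {port: names for port, names in by_port.items() if len(names) > 1}
```

Because no service hardcodes its own port, a check like find_port_conflicts can run once against the registry instead of against every service.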

What You Can Take Home

Install

pip install aithershell aither-adk aither-kvcache

aithershell — Standalone CLI

Works with Ollama, vLLM, or OpenAI directly (no AitherOS required):

aither --init                    # Configure your LLM backend
aither "build me an agent"       # Instant chat

# Or set backend explicitly:
export AITHER_LLM_BACKEND=ollama
export AITHER_LLM_URL=http://localhost:11434
aither "hello"

aither-adk — Agent Development Kit

aither init myagent              # Scaffold agent project
cd myagent
aither run                       # Start agent server

# Auto-detects: Ollama → vLLM → OpenAI
# Generated config.yaml defaults to llm_backend: auto
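The auto-detect order can be pictured as a chain of probes. This is a sketch under assumptions, not how aither-adk actually does it: the URLs are each server's usual local default (Ollama on 11434, vLLM's OpenAI-compatible server on 8000), and treating "an API key is present" as the OpenAI check is a guess.

```python
import urllib.request
from typing import Callable, List, Optional, Tuple

# Probe order from the talk. The URLs are each server's usual local default;
# the real tool may check different endpoints.
CANDIDATES: List[Tuple[str, str]] = [
    ("ollama", "http://localhost:11434/api/tags"),
    ("vllm", "http://localhost:8000/v1/models"),
]


def is_reachable(url: str, timeout: float = 0.5) -> bool:
    """Return True if an HTTP GET to url succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers connection refused, timeouts, and HTTP error statuses.
        return False


def detect_backend(probe: Callable[[str], bool] = is_reachable,
                   openai_key: Optional[str] = None) -> Optional[str]:
    """Pick the first available backend: local servers first, then OpenAI."""
    for name, url in CANDIDATES:
        if probe(url):
            return name
    return "openai" if openai_key else None
```

Passing the probe in as a parameter keeps the ordering logic testable without any servers running.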

aither-kvcache — KV Cache Compression

Drop into any vLLM instance for 3.8x VRAM savings:

pip install aither-kvcache
# Add to vLLM args: --kv-cache-dtype tq-t4nc

Hardware Scaling

Setup	What You Get
RTX 5090 alone	Nemotron-8B orchestrator + DeepSeek-R1-14B reasoning (4-bit)
5090 + Ollama on CPU	Above + CPU fallback for lighter queries
5090 + DGX Spark	Full AitherOS: 27B primary + 14B reasoning + embeddings
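A quick budget check for the first row. The sketch reads the "14GB with TQ4" figure from the routing diagram as the orchestrator's resident footprint, which is an interpretation, and estimates the reasoning model from parameter count alone, ignoring quantization metadata and runtime overhead. The 32 GB figure is the RTX 5090's VRAM.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (10^9 bytes), ignoring
    quantization metadata and runtime overhead."""
    return params_billion * bits_per_weight / 8


GPU_VRAM_GB = 32.0               # RTX 5090
ORCHESTRATOR_GB = 14.0           # resident footprint quoted in the talk
reasoning_gb = weight_gb(14, 4)  # 14B reasoning model at 4-bit

used = ORCHESTRATOR_GB + reasoning_gb
headroom = GPU_VRAM_GB - used
print(f"Reasoning weights: {reasoning_gb:.1f} GB")
print(f"Headroom for KV cache and activations: {headroom:.1f} GB")
```

Under these assumptions both models fit on one card with about 11 GB to spare, which is the space the KV cache for concurrent agents has to live in.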

Pitfalls

Don't rely on one giant model: a small orchestrator with reasoning as a tool scales better
Don't skip KV cache compression: it cuts KV cache memory by 3.8x
Don't build integrations from scratch: MCP tools are the composability layer, so build there
Don't boot everything for development: use Docker Compose profiles
Don't skip API contracts: services.yaml plus the AitherService pattern keeps 196 services manageable
