AI Company Brains and Personal Agents
AI Tinkerers OC — June 2, 2026 | 10-min demo
What This Is
An operating system for AI agents running entirely on local hardware — zero cloud LLM spend. 196 microservices, 43 agents, 147 MCP tools.
Live right now:
- portal.aitherium.com — agent fleet hub
- chelle.aitherium.com — photography business AI
- garg.aitherium.com — government consulting AI
- irc.aitherium.com — live chat (join #tinkerers!)
- Discord: Vibeful — AI dev community
RTX 5090 + DGX Spark under my desk. Demoing through a Cloudflare tunnel back to my house.
The Core Insight: Two-Tier Routing
One small model fronts every request. Reasoning is a tool it calls, not a separate tier.
User Request
|
v
[Nemotron-Orchestrator-8B] ← always resident, 14GB with TQ4
|
|--- 90% of requests → direct response
|
|--- "I need to think harder" → calls deep_reasoning tool
|                                       |
|                                       v
|                               [DeepSeek-R1:14b]
|                                       |
|<---------- reasoning result ----------┘
|
v
Response
Why it works:
- 8B model handles tool calling, routing, orchestration, and most responses
- Reasoning model only fires when the orchestrator decides it needs deep thinking
- Both run on one RTX 5090 via vLLM continuous batching
- 4-8 agents can work simultaneously on one GPU
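The reasoning-as-a-tool flow above can be sketched in a few lines of Python. This is a minimal illustration, not the AitherOS implementation: `fast_model`, `reasoning_model`, `finalize`, and the keyword trigger are hypothetical stand-ins for what would be calls to the two local models.

```python
from typing import Callable, Dict


def fast_model(prompt: str) -> dict:
    """Hypothetical orchestrator stub: answer directly or request a tool call."""
    needs_depth = any(k in prompt.lower() for k in ("prove", "step by step"))
    if needs_depth:
        return {"tool": "deep_reasoning", "arguments": {"question": prompt}}
    return {"content": f"[fast] {prompt}"}


def reasoning_model(question: str) -> str:
    """Hypothetical reasoning stub: only runs when the orchestrator asks."""
    return f"[deep] worked answer to: {question}"


def finalize(tool_result: str) -> str:
    """The orchestrator wraps the tool result into the final response."""
    return f"[fast] summary of {tool_result}"


# Tool registry: reasoning is just one more callable the orchestrator can pick.
TOOLS: Dict[str, Callable[..., str]] = {"deep_reasoning": reasoning_model}


def handle(prompt: str) -> str:
    reply = fast_model(prompt)
    if "tool" in reply:
        result = TOOLS[reply["tool"]](**reply["arguments"])
        return finalize(result)
    return reply["content"]


print(handle("hello"))
print(handle("Prove that 17 is prime, step by step"))
```

Because the reasoning model sits behind the same tool interface as everything else, adding or swapping it does not change the request path for the common case.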
AitherKVCache
4-bit KV cache compression → 3.8x memory savings → 6x concurrent requests at 40K context.
pip install aither-kvcache
# Drop into any vLLM instance:
# --kv-cache-dtype tq-t4nc
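To see what a 3.8x saving means at 40K context, here is a back-of-the-envelope calculation. The model shape (32 layers, 8 KV heads, head dimension 128) is a hypothetical example and not the actual Nemotron configuration; only the 3.8x ratio and the 40K context come from the figures above.

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Size of the KV cache for one request: keys and values (factor 2)
    for every layer, KV head, and head dimension, per token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens


# Hypothetical model shape, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128
CONTEXT_TOKENS = 40_000
COMPRESSION_RATIO = 3.8  # figure quoted above

fp16_bytes = kv_cache_bytes(CONTEXT_TOKENS, LAYERS, KV_HEADS, HEAD_DIM)
compressed_bytes = fp16_bytes / COMPRESSION_RATIO

GIB = 1024 ** 3
print(f"16-bit KV cache per request: {fp16_bytes / GIB:.2f} GiB")
print(f"Compressed KV cache per request: {compressed_bytes / GIB:.2f} GiB")
```

How many more concurrent requests fit depends on how much VRAM remains after the model weights, so the concurrency gain does not have to equal the compression ratio.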
Claude Code as Command & Control
Claude Code is my brain. AitherOS is my local AI infrastructure.
147 MCP tools — all routed to local compute:
- Code search (CodeGraph + Repowise)
- Agent dispatch (forge_subagent)
- Memory graph (remember, recall, query_memory)
- Git, filesystem, terminal
- LLM inference (local vLLM)
Claude Code → MCP → AitherNode → local services → vLLM → response
The key: Cloud intelligence for planning, local hardware for execution.
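The MCP hop in that chain is a JSON-RPC 2.0 exchange: the client sends a `tools/call` request naming a tool and its arguments, and the server replies with content blocks. The sketch below shows that message shape with an in-process dispatcher. The tool names `remember` and `recall` come from the list above, but their argument names (`key`, `value`) and the in-memory store are assumptions for illustration, not the AitherNode API.

```python
import json

# Hypothetical in-memory store standing in for the memory graph.
MEMORY: dict = {}


def remember(key: str, value: str) -> str:
    MEMORY[key] = value
    return f"stored {key}"


def recall(key: str) -> str:
    return MEMORY.get(key, "")


HANDLERS = {"remember": remember, "recall": recall}


def make_tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request (JSON-RPC 2.0)."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/call",
            "params": {"name": name, "arguments": arguments}}


def dispatch(request: dict) -> dict:
    """Route a tools/call request to a local handler and wrap the result."""
    params = request["params"]
    handler = HANDLERS.get(params["name"])
    if handler is None:
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32602,
                          "message": f"Unknown tool: {params['name']}"}}
    text = handler(**params["arguments"])
    return {"jsonrpc": "2.0", "id": request["id"],
            "result": {"content": [{"type": "text", "text": text}],
                       "isError": False}}


dispatch(make_tool_call(1, "remember", {"key": "gpu", "value": "RTX 5090"}))
response = dispatch(make_tool_call(2, "recall", {"key": "gpu"}))
print(json.dumps(response, indent=2))
```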
Managing the Infrastructure
AitherZero: 170+ PowerShell Scripts
Numbered automation scripts organize the entire lifecycle:
00-prerequisites/ # Environment setup
10-initialization/ # First-time config
20-networking/ # Tunnel, DNS, certificates
30-services/ # Docker compose orchestration
40-deployment/ # Build, push, deploy
50-maintenance/ # Health checks, cleanup
60-monitoring/ # Grafana, alerts
70-security/ # Vault rotation, audit
80-testing/ # Pester, pytest, integration
90-cleanup/ # Teardown, reset
Docker Compose Profiles
Graduated startup — don't boot 90 containers for chat:
# Raw LLM chat (~20 containers)
docker compose --profile chat-minimal up -d
# Full personality + memory (~29 containers)
docker compose --profile chat-full up -d
# + Agent tool use (~31 containers)
docker compose --profile chat-agents up -d
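Graduated startup relies on the Compose profile rule: a service with no `profiles` entry always starts, and a service with profiles starts only when one of them is active. The sketch below models that rule in Python. The service names and their profile assignments are hypothetical; only the three profile names come from the commands above.

```python
# Hypothetical service-to-profile mapping; an empty list means "always start".
SERVICES = {
    "vllm": [],
    "chat-api": ["chat-minimal", "chat-full", "chat-agents"],
    "memory": ["chat-full", "chat-agents"],
    "agent-tools": ["chat-agents"],
}


def enabled_services(services: dict, active_profiles: list) -> list:
    """Return the services Compose would start for the given active profiles."""
    active = set(active_profiles)
    return sorted(
        name for name, profiles in services.items()
        if not profiles or active & set(profiles)
    )


for profile in ("chat-minimal", "chat-full", "chat-agents"):
    print(profile, "->", enabled_services(SERVICES, [profile]))
```

Listing a service under every profile that needs it is what makes each tier a superset of the previous one.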
API Contracts as Boundaries
Every service follows the same pattern:
import services._bootstrap
from lib.core.AitherIntegration import AitherService, get_port
from lib.core.AitherEvents import setup_lifecycle
PORT = get_port("MyService")  # look the port up once so both calls agree
aither = AitherService("MyService", port=PORT)
app = aither.app
setup_lifecycle(app, "MyService", port=PORT, register_identity=True)
config/services.yaml is the single source of truth for all 196 services.
What You Can Take Home
Install
pip install aithershell aither-adk aither-kvcache
aithershell — Standalone CLI
Works with Ollama, vLLM, or OpenAI directly (no AitherOS required):
aither --init # Configure your LLM backend
aither "build me an agent" # Instant chat
# Or set backend explicitly:
export AITHER_LLM_BACKEND=ollama
export AITHER_LLM_URL=http://localhost:11434
aither "hello"
aither-adk — Agent Development Kit
aither init myagent # Scaffold agent project
cd myagent
aither run # Start agent server
# Auto-detects: Ollama → vLLM → OpenAI
# Generated config.yaml defaults to llm_backend: auto
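One way an `auto` backend could resolve is to probe local endpoints in the stated order and fall back to OpenAI. This is a sketch of the idea, not the aither-adk code: the function names are hypothetical, the Ollama URL uses the port from the example above, and the vLLM URL assumes its default OpenAI-compatible server on port 8000.

```python
import urllib.error
import urllib.request

# Probe order mirrors the comment above: Ollama, then vLLM, then OpenAI.
CANDIDATES = [
    ("ollama", "http://localhost:11434/api/tags"),
    ("vllm", "http://localhost:8000/v1/models"),  # assumed default port
]


def is_up(url: str, timeout: float = 0.5) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        return False


def detect_backend(probe=is_up, candidates=CANDIDATES, fallback="openai") -> str:
    """Pick the first reachable local backend, else the hosted fallback."""
    for name, url in candidates:
        if probe(url):
            return name
    return fallback


print(detect_backend())
```

Passing `probe` as a parameter keeps the detection order testable without any server running.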
aither-kvcache — KV Cache Compression
Drop into any vLLM instance for 3.8x VRAM savings:
pip install aither-kvcache
# Add to vLLM args: --kv-cache-dtype tq-t4nc
Hardware Scaling
| Setup | What You Get |
|---|---|
| RTX 5090 alone | Nemotron-8B orchestrator + DeepSeek-R1-14B reasoning (4-bit) |
| 5090 + Ollama on CPU | Above + CPU fallback for lighter queries |
| 5090 + DGX Spark | Full AitherOS: 27B primary + 14B reasoning + embeddings |
Pitfalls
- Don't use one giant model: a small orchestrator with reasoning-as-a-tool scales better
- Don't skip KV cache compression: it's a free 3.8x VRAM saving
- Don't build tooling from scratch: MCP tools are the composability layer, so build there
- Don't boot everything for development: use Docker Compose profiles
- Don't let API contracts drift: services.yaml + the AitherService pattern keep 196 services manageable
Links
- GitHub: aither-adk | aithershell | aither-kvcache | AitherZero
- Chat: irc.aitherium.com (join #tinkerers)
- Discord: discord.gg/PpEFUVHv (Vibeful — AI dev community)
- Portal: portal.aitherium.com
- PyPI: pip install aithershell aither-adk aither-kvcache