Caller Isolation: How We Closed Every Mutation Endpoint in a Large-Scale Agent OS
AitherOS runs dozens of services coordinated by a central orchestrator. When we opened the demo chat to the public, we knew the chat endpoint needed protection. What we underestimated was the blast radius: the orchestrator exposes 50+ routers, and many POST endpoints can trigger subagent spawning, shell access, model training, and GitHub Actions workflows. A demo user sending the right header to the right endpoint could theoretically forge agents with full tool access.
This post covers the two-wave approach we used to close every gap, the design decisions that made it scale, and the testing strategy that gives us confidence it actually works.
The Problem: Caller Context Was Set, But Not Everywhere
Wave 1 of Caller Isolation (shipped the same day we opened the demo) added the core infrastructure:
- Caller type classification — platform, demo, tenant, anonymous
- Caller context — 5 permission flags: agentic access, agent spawning, mutation, execution, generation
- Async context propagation — caller identity flowing through every async chain
- Pipeline gates — the chat engine (agentic/generation), the agent dispatch engine (spawning), the action executor (mutation)
The chat router built a caller context from request headers and set it in the async context. Downstream, the chat engine checked agentic permission before entering the ReAct loop, the dispatch engine checked spawning permission before creating subagents, and the action executor checked mutation permission before running shell commands.
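The shape of this infrastructure can be sketched in a few lines. This is a minimal illustration, not the production code: the names (`CallerType`, `CallerContext`, `get_caller`, and the flag names) are assumptions, but the mechanics, a frozen context object carried in an async-safe `ContextVar` that defaults to platform access when unset, follow the design described above.

```python
from contextvars import ContextVar
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class CallerType(Enum):
    PLATFORM = "platform"
    DEMO = "demo"
    TENANT = "tenant"
    ANONYMOUS = "anonymous"

@dataclass(frozen=True)
class CallerContext:
    caller_type: CallerType
    agentic: bool = False
    spawn: bool = False
    mutate: bool = False
    execute: bool = False
    generate: bool = False

# ContextVar is async-safe: each request/task sees its own value,
# so caller identity propagates through every await without threading args.
_caller: ContextVar[Optional[CallerContext]] = ContextVar("caller", default=None)

def set_caller(ctx: CallerContext) -> None:
    _caller.set(ctx)

def get_caller() -> CallerContext:
    # No context set means an internal service-to-service call:
    # default to full platform access (a deliberate design decision).
    return _caller.get() or CallerContext(
        CallerType.PLATFORM, agentic=True, spawn=True,
        mutate=True, execute=True, generate=True,
    )
```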
The gap: The permission check in the dispatch engine only works if someone sets the caller context upstream. Five other routers exposed mutation endpoints but never set the caller context. Requests to the forge dispatch, swarm coding, tool invocation, delegation, and scheduler trigger endpoints all hit the subsystem with the default platform-level caller -- full access.
This is the classic "defense in depth worked, but only on one path" problem.
Wave 2: One Dependency to Rule Them All
Rather than patching each router individually (copy-pasting the same IP detection, header parsing, and caller context setup into each one), we extracted the logic into a reusable three-layer chain.
Layer 1: Shared Utility
We extracted a single function that resolves the real client IP from proxy headers, determines whether the request is local or external, builds the caller context with appropriate permissions, and sets it in the async context. Any FastAPI service -- orchestrator routers, agent-to-agent services, standalone services -- can use it. The IP detection logic is consistent across all entry points.
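A simplified sketch of that utility, with hypothetical names (`resolve_client_ip`, `build_caller`) and an illustrative prefix list; the real permission sets are richer, but the flow is the same: resolve the client IP, classify local vs external, build the caller.

```python
from typing import Mapping

# Illustrative private-range prefixes; the real list matches the deployment.
LOCAL_PREFIXES = ("127.", "10.", "192.168.", "172.", "::1")

def resolve_client_ip(headers: Mapping[str, str], peer_ip: str) -> str:
    """The proxy's forwarded header wins over the raw socket peer."""
    forwarded = headers.get("x-forwarded-for", "")
    if forwarded:
        # First entry in the chain is the original client.
        return forwarded.split(",")[0].strip()
    return peer_ip

def build_caller(headers: Mapping[str, str], peer_ip: str) -> dict:
    ip = resolve_client_ip(headers, peer_ip)
    if ip.startswith(LOCAL_PREFIXES):
        return {"type": "platform", "spawn": True, "execute": True}
    if "x-demo-session" in headers:
        return {"type": "demo", "spawn": False, "execute": False}
    return {"type": "anonymous", "spawn": False, "execute": False}
```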
Layer 2: FastAPI Dependency Wrapper
A 4-line FastAPI dependency that wraps the shared utility. Every router that needs caller awareness adds the dependency to its endpoint signature. FastAPI handles the injection automatically.
Layer 3: Per-Router Gates
Each router adds the dependency and checks the appropriate permission flag. The forge dispatch endpoint checks spawning permission. The swarm coding endpoint checks execution permission. The swarm gate is the most nuanced: text-only brainstorming mode only requires execution permission, but forge mode (which gives agents shell access and file I/O) additionally requires spawning permission. A tenant tier with execution permission but without spawning permission can run swarm sessions in text mode but not forge mode.
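The dual gate is small enough to show in full. This is a sketch with invented names (`gate_swarm`, `PermissionDenied` standing in for a 403 `HTTPException`), but the two-check logic mirrors the description above:

```python
class PermissionDenied(Exception):
    """Stand-in for fastapi.HTTPException(status_code=403) in this sketch."""

def gate_swarm(caller: dict, forge_mode: bool) -> None:
    # Text-only brainstorming needs execution permission; forge mode
    # (shell access + file I/O) additionally needs spawning permission.
    if not caller.get("execute"):
        raise PermissionDenied("execution permission required")
    if forge_mode and not caller.get("spawn"):
        raise PermissionDenied("spawning permission required for forge mode")
```

An execute-only caller passes `gate_swarm(caller, forge_mode=False)` but is rejected the moment `forge_mode=True`.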
What We Gated
| Domain | Operations | Permission Check |
|---|---|---|
| Forge dispatch | Subagent dispatch (sync/async/parallel), worktree commits | Spawning / Execution |
| Swarm coding | Swarm sessions, run-and-deliver, ideate-and-deliver | Execution + Spawning (forge mode) |
| Agent delegation | Agent delegation, council, forge spawn | Agentic access |
| Tool invocation | Direct tool calls | Execution |
| Scheduler | Learning triggers, workflow runners, evolution finetune | Execution |
| Demand | Demand registration | Execution |
Read-only endpoints (agent listings, tool inventories, session statuses) remain ungated -- they cannot mutate state, and they power dashboards.
What We Didn't Gate (And Why)
- Action executor tools — already routed through the action executor which has the Wave 1 mutation guard
- Read-only MCP tools (graph queries, code search, etc.) — read-only by design
- MCP Gateway tenant context — separate concern (tenant billing vs caller identity)
- Unmounted routers — some router modules exist but are not mounted in the orchestrator, so they are unreachable
- Agent-to-agent service — internal-only, late boot phase, not exposed through the web dashboard. The shared caller utility is available for future hardening
The Permission Matrix
| CallerType | agentic | forge | mutate | execute | generate |
|---|---|---|---|---|---|
| PLATFORM | Y | Y | Y | Y | Y |
| DEMO | N | N | N | N | N |
| TENANT | by tier | by tier | N | by tier | by tier |
| ANONYMOUS | N | N | N | N | N |
Tenant permissions upgrade based on subscription tier. Growth/Professional/Enterprise get agentic+forge+execute. Builder+ gets generate. Explorer gets chat only. The caller context builder uses tier-based permission sets to make these decisions.
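A tier-based permission set can be as simple as a lookup table. The tier names come from the post; the exact flag sets and function names here are illustrative:

```python
TIER_PERMISSIONS = {
    # Hypothetical mapping; flag sets mirror the tiers described above.
    "explorer": set(),                                      # chat only
    "builder": {"generate"},
    "growth": {"agentic", "forge", "execute", "generate"},
    "professional": {"agentic", "forge", "execute", "generate"},
    "enterprise": {"agentic", "forge", "execute", "generate"},
}

def tenant_flags(tier: str) -> dict:
    perms = TIER_PERMISSIONS.get(tier, set())
    return {
        "agentic": "agentic" in perms,
        "forge": "forge" in perms,
        "mutate": False,  # tenants never get raw mutation, per the matrix
        "execute": "execute" in perms,
        "generate": "generate" in perms,
    }
```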
Testing Strategy: 91 Tests, Zero Mocking of Security Logic
The testing philosophy: never mock the security check itself. We test by calling the real endpoint function with a crafted caller context (demo user, no spawning permission) and verifying the 403 response. We load router modules dynamically and call the endpoint functions directly with the caller parameter. This is faster than spinning up a full ASGI test client and tests the exact code path that runs in production.
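In miniature, a direct-call test looks like this. The endpoint function and exception type are invented for the sketch, but the style matches: real gate logic, crafted caller, no mocks:

```python
import asyncio

class Forbidden(Exception):
    """Stand-in for a 403 HTTPException in this sketch."""

# Hypothetical endpoint function, shaped like the gated routers.
async def forge_dispatch(payload: dict, caller: dict) -> dict:
    if not caller.get("spawn"):
        raise Forbidden("spawning permission required")
    return {"status": "dispatched"}

def test_demo_caller_blocked():
    demo = {"type": "demo", "spawn": False}
    try:
        asyncio.run(forge_dispatch({"task": "train"}, caller=demo))
        raise AssertionError("expected a 403")
    except Forbidden:
        pass  # the real security check ran; nothing was mocked

test_demo_caller_blocked()
```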
The 91 tests break down as:
- 14 — Caller context construction for all types + tenant tiers
- 4 — Request injection prevention (stripping caller context from untrusted JSON)
- 4 — Orchestrator endpoint blocking
- 6 — Chat engine agentic + auto-agentic + generation gating
- 12 — Dispatch engine + action executor mutation guards
- 4 — Async context propagation + isolation
- 5 — Backward compatibility (no caller context = platform)
- 10 — Caller type enum + permission matrix completeness
- 8 — Request-to-caller builder (local, external, forwarded, no-client)
- 4 — Forge dispatch gating
- 6 — Swarm coding gating (including forge mode dual gate)
- 5 — Agent delegation gating
- 4 — Scheduler gating
- 2 — Demand gating
- 3 — Wave 2 backward compatibility
Design Decisions Worth Noting
1. Default caller is platform, not anonymous. This is deliberate. Internal service-to-service calls (kernel tick, routine execution, background jobs) never set a caller context. If the default were anonymous, every internal operation would break. The security boundary is at the edge (orchestrator routers), not deep inside the pipeline.
2. IP prefix matching, not CIDR parsing. The shared utility uses string prefix matching for private IP ranges instead of full CIDR network objects. The prefix approach is intentionally simpler -- it catches Docker bridge IPs, home networks, and localhost. The edge case where a 172.x external IP gets treated as local is acceptable because the real client IP header from the proxy takes precedence.
3. The dependency is a side effect. It both returns the caller context AND sets it in the async context. This means downstream code (the dispatch engine, the action executor) still sees the correct caller even if it reads from the async context directly. The router gate is the first line of defense; the subsystem gate is defense in depth.
4. We gate at the router, not in middleware. FastAPI middleware runs for ALL requests, including health checks and GET endpoints. A Depends is surgical — it only runs for the endpoints that declare it. This keeps the hot path (health checks, status endpoints) untouched.
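The tradeoff in decision 2 is easy to see concretely. This comparison uses the stdlib `ipaddress` module against a prefix check with illustrative ranges:

```python
import ipaddress

def is_private_cidr(ip: str) -> bool:
    """Strict stdlib check: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, etc."""
    try:
        return ipaddress.ip_address(ip).is_private
    except ValueError:
        return False

def is_private_prefix(ip: str) -> bool:
    """Simpler string check; deliberately over-matches all of 172.x."""
    return ip.startswith(("127.", "10.", "192.168.", "172."))
```

For an address like `172.200.0.1`, the CIDR check says external while the prefix check says local. That over-match is the accepted edge case: the proxy's real-client-IP header takes precedence before this check ever runs.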
What's Next
The remaining surface area is small:
- Agent-to-agent service: Internal-only today, but the shared caller utility is ready for when we expose it
- MCP Gateway: Has its own auth middleware and tool-level permission checks, but could benefit from caller context propagation for finer-grained gating
- Middleware-level audit logging: Every caller context decision should be logged for security auditing
The pattern scales. Adding caller awareness to a new router is 3 lines: one import, one dependency parameter, one permission check. Future routers get the protection for free if they follow the convention.
91 tests. 15 endpoints gated. 5 routers hardened. Zero regressions on the 1700+ existing test suite. Ship it.