Caller Isolation: How We Closed Every Mutation Endpoint in a Large-Scale Agent OS
AitherOS runs dozens of services coordinated by a central orchestrator. When we opened the demo chat to the public, we knew the chat endpoint needed protection. What we underestimated was the blast radius: the orchestrator exposes 50+ routers, and many POST endpoints can trigger subagent spawning, shell access, model training, and GitHub Actions workflows. A demo user sending the right header to the right endpoint could theoretically forge agents with full tool access.
This post covers the two-wave approach we used to close every gap, the design decisions that made it scale, and the testing strategy that gives us confidence it actually works.
The Problem: Caller Context Was Set, But Not Everywhere
Wave 1 of Caller Isolation (shipped the same day we opened the demo) added the core infrastructure:
- Caller type classification — platform, demo, tenant, anonymous
- Caller context — 5 permission flags: agentic access, agent spawning, mutation, execution, generation
- Async context propagation — caller identity flowing through every async chain
- Pipeline gates — the chat engine (agentic/generation), the agent dispatch engine (spawning), the action executor (mutation)
The chat router built a caller context from request headers and set it in the async context. Downstream, the chat engine checked agentic permission before entering the ReAct loop, the dispatch engine checked spawning permission before creating subagents, and the action executor checked mutation permission before running shell commands.
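The shape of this infrastructure can be sketched in a few lines. This is a minimal illustration, not the production code: the names (`CallerType`, `CallerContext`, `get_caller`, and the flag names) are assumptions, but the mechanics, a frozen context object carried in an async-safe `ContextVar` that defaults to platform access when unset, follow the design described above.

```python
from contextvars import ContextVar
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class CallerType(Enum):
    PLATFORM = "platform"
    DEMO = "demo"
    TENANT = "tenant"
    ANONYMOUS = "anonymous"

@dataclass(frozen=True)
class CallerContext:
    caller_type: CallerType
    agentic: bool = False
    spawn: bool = False
    mutate: bool = False
    execute: bool = False
    generate: bool = False

# ContextVar is async-safe: each request/task sees its own value,
# so caller identity propagates through every await without threading args.
_caller: ContextVar[Optional[CallerContext]] = ContextVar("caller", default=None)

def set_caller(ctx: CallerContext) -> None:
    _caller.set(ctx)

def get_caller() -> CallerContext:
    # No context set means an internal service-to-service call:
    # default to full platform access (a deliberate design decision).
    return _caller.get() or CallerContext(
        CallerType.PLATFORM, agentic=True, spawn=True,
        mutate=True, execute=True, generate=True,
    )
```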
The gap: The permission check in the dispatch engine only works if someone sets the caller context upstream. Five other routers exposed mutation endpoints but never set the caller context. Requests to the forge dispatch, swarm coding, tool invocation, delegation, and scheduler trigger endpoints all hit the subsystem with the default platform-level caller -- full access.
This is the classic "defense in depth worked, but only on one path" problem.
Wave 2: One Dependency to Rule Them All
Rather than patching each router individually (copy-pasting the same IP detection, header parsing, and caller context setup into each one), we extracted the logic into a reusable three-layer chain.
Layer 1: Shared Utility
We extracted a single function that resolves the real client IP from proxy headers, determines whether the request is local or external, builds the caller context with appropriate permissions, and sets it in the async context. Any FastAPI service -- orchestrator routers, agent-to-agent services, standalone services -- can use it. The IP detection logic is consistent across all entry points.
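A simplified sketch of that utility, with hypothetical names (`resolve_client_ip`, `build_caller`) and an illustrative prefix list; the real permission sets are richer, but the flow is the same: resolve the client IP, classify local vs external, build the caller.

```python
from typing import Mapping

# Illustrative private-range prefixes; the real list matches the deployment.
LOCAL_PREFIXES = ("127.", "10.", "192.168.", "172.", "::1")

def resolve_client_ip(headers: Mapping[str, str], peer_ip: str) -> str:
    """The proxy's forwarded header wins over the raw socket peer."""
    forwarded = headers.get("x-forwarded-for", "")
    if forwarded:
        # First entry in the chain is the original client.
        return forwarded.split(",")[0].strip()
    return peer_ip

def build_caller(headers: Mapping[str, str], peer_ip: str) -> dict:
    ip = resolve_client_ip(headers, peer_ip)
    if ip.startswith(LOCAL_PREFIXES):
        return {"type": "platform", "spawn": True, "execute": True}
    if "x-demo-session" in headers:
        return {"type": "demo", "spawn": False, "execute": False}
    return {"type": "anonymous", "spawn": False, "execute": False}
```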
Layer 2: FastAPI Dependency Wrapper
A 4-line FastAPI dependency that wraps the shared utility. Every router that needs caller awareness adds the dependency to its endpoint signature. FastAPI handles the injection automatically.
Layer 3: Per-Router Gates
Each router adds the dependency and checks the appropriate permission flag. The forge dispatch endpoint checks spawning permission. The swarm coding endpoint checks execution permission. The swarm gate is the most nuanced: text-only brainstorming mode only requires execution permission, but forge mode (which gives agents shell access and file I/O) additionally requires spawning permission. A tenant tier with execution permission but without spawning permission can run swarm sessions in text mode but not forge mode.
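The dual gate is small enough to show in full. This is a sketch with invented names (`gate_swarm`, `PermissionDenied` standing in for a 403 `HTTPException`), but the two-check logic mirrors the description above:

```python
class PermissionDenied(Exception):
    """Stand-in for fastapi.HTTPException(status_code=403) in this sketch."""

def gate_swarm(caller: dict, forge_mode: bool) -> None:
    # Text-only brainstorming needs execution permission; forge mode
    # (shell access + file I/O) additionally needs spawning permission.
    if not caller.get("execute"):
        raise PermissionDenied("execution permission required")
    if forge_mode and not caller.get("spawn"):
        raise PermissionDenied("spawning permission required for forge mode")
```

An execute-only caller passes `gate_swarm(caller, forge_mode=False)` but is rejected the moment `forge_mode=True`.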
What We Gated
| Domain | Operations | Permission Check |
|---|---|---|
| Forge dispatch | Subagent dispatch (sync/async/parallel), worktree commits | Spawning / Execution |
| Swarm coding | Swarm sessions, run-and-deliver, ideate-and-deliver | Execution + Spawning (forge mode) |
| Agent delegation | Agent delegation, council, forge spawn | Agentic access |
| Tool invocation | Direct tool calls | Execution |
| Scheduler | Learning triggers, workflow runners, evolution finetune | Execution |
| Demand | Demand registration | Execution |
Read-only endpoints (agent listings, tool inventories, session statuses) remain ungated -- they cannot mutate state, and they power dashboards.
What We Didn't Gate (And Why)
- Action executor tools — already routed through the action executor which has the Wave 1 mutation guard
- Read-only MCP tools (graph queries, code search, etc.) — read-only by design
- MCP Gateway tenant context — separate concern (tenant billing vs caller identity)
- Unmounted routers — some router modules exist but are not mounted in the orchestrator, so they are unreachable
- Agent-to-agent service — internal-only, late boot phase, not exposed through the web dashboard. The shared caller utility is available for future hardening
The Permission Matrix
| CallerType | agentic | forge | mutate | execute | generate |
|---|---|---|---|---|---|
| PLATFORM | Y | Y | Y | Y | Y |
| DEMO | N | N | N | N | N |
| TENANT | by tier | by tier | N | by tier | by tier |
| ANONYMOUS | N | N | N | N | N |
Tenant permissions upgrade based on subscription tier. Growth/Professional/Enterprise get agentic+forge+execute. Builder+ gets generate. Explorer gets chat only. The caller context builder uses tier-based permission sets to make these decisions.
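A tier-based permission set can be as simple as a lookup table. The tier names come from the post; the exact flag sets and function names here are illustrative:

```python
TIER_PERMISSIONS = {
    # Hypothetical mapping; flag sets mirror the tiers described above.
    "explorer": set(),                                      # chat only
    "builder": {"generate"},
    "growth": {"agentic", "forge", "execute", "generate"},
    "professional": {"agentic", "forge", "execute", "generate"},
    "enterprise": {"agentic", "forge", "execute", "generate"},
}

def tenant_flags(tier: str) -> dict:
    perms = TIER_PERMISSIONS.get(tier, set())
    return {
        "agentic": "agentic" in perms,
        "forge": "forge" in perms,
        "mutate": False,  # tenants never get raw mutation, per the matrix
        "execute": "execute" in perms,
        "generate": "generate" in perms,
    }
```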
Testing Strategy: 91 Tests, Zero Mocking of Security Logic
The testing philosophy: never mock the security check itself. We test by calling the real endpoint function with a crafted caller context (demo user, no spawning permission) and verifying the 403 response. We load router modules dynamically and call the endpoint functions directly with the caller parameter. This is faster than spinning up a full ASGI test client and tests the exact code path that runs in production.
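In miniature, a direct-call test looks like this. The endpoint function and exception type are invented for the sketch, but the style matches: real gate logic, crafted caller, no mocks:

```python
import asyncio

class Forbidden(Exception):
    """Stand-in for a 403 HTTPException in this sketch."""

# Hypothetical endpoint function, shaped like the gated routers.
async def forge_dispatch(payload: dict, caller: dict) -> dict:
    if not caller.get("spawn"):
        raise Forbidden("spawning permission required")
    return {"status": "dispatched"}

def test_demo_caller_blocked():
    demo = {"type": "demo", "spawn": False}
    try:
        asyncio.run(forge_dispatch({"task": "train"}, caller=demo))
        raise AssertionError("expected a 403")
    except Forbidden:
        pass  # the real security check ran; nothing was mocked

test_demo_caller_blocked()
```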
The 91 tests break down as:
- 14 — Caller context construction for all types + tenant tiers
- 4 — Request injection prevention (stripping caller context from untrusted JSON)
- 4 — Orchestrator endpoint blocking
- 6 — Chat engine agentic + auto-agentic + generation gating
- 12 — Dispatch engine + action executor mutation guards
- 4 — Async context propagation + isolation
- 5 — Backward compatibility (no caller context = platform)
- 10 — Caller type enum + permission matrix completeness
- 8 — Request-to-caller builder (local, external, forwarded, no-client)
- 4 — Forge dispatch gating
- 6 — Swarm coding gating (including forge mode dual gate)
- 5 — Agent delegation gating
- 4 — Scheduler gating
- 2 — Demand gating
- 3 — Wave 2 backward compatibility
Design Decisions Worth Noting
1. Default caller is platform, not anonymous. This is deliberate. Internal service-to-service calls (kernel tick, routine execution, background jobs) never set a caller context. If the default were anonymous, every internal operation would break. The security boundary is at the edge (orchestrator routers), not deep inside the pipeline.
2. IP prefix matching, not CIDR parsing. The shared utility uses string prefix matching for private IP ranges instead of full CIDR network objects. The prefix approach is intentionally simpler -- it catches Docker bridge IPs, home networks, and localhost. The edge case where a 172.x external IP gets treated as local is acceptable because the real client IP header from the proxy takes precedence.
3. The dependency is a side effect. It both returns the caller context AND sets it in the async context. This means downstream code (the dispatch engine, the action executor) still sees the correct caller even if it reads from the async context directly. The router gate is the first line of defense; the subsystem gate is defense in depth.
4. We gate at the router, not in middleware. FastAPI middleware runs for ALL requests, including health checks and GET endpoints. A Depends is surgical — it only runs for the endpoints that declare it. This keeps the hot path (health checks, status endpoints) untouched.
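The tradeoff in decision 2 is easy to see concretely. This comparison uses the stdlib `ipaddress` module against a prefix check with illustrative ranges:

```python
import ipaddress

def is_private_cidr(ip: str) -> bool:
    """Strict stdlib check: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, etc."""
    try:
        return ipaddress.ip_address(ip).is_private
    except ValueError:
        return False

def is_private_prefix(ip: str) -> bool:
    """Simpler string check; deliberately over-matches all of 172.x."""
    return ip.startswith(("127.", "10.", "192.168.", "172."))
```

For an address like `172.200.0.1`, the CIDR check says external while the prefix check says local. That over-match is the accepted edge case: the proxy's real-client-IP header takes precedence before this check ever runs.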
What's Next
The remaining surface area is small:
- Agent-to-agent service: Internal-only today, but the shared caller utility is ready for when we expose it
- MCP Gateway: Has its own auth middleware and tool-level permission checks, but could benefit from caller context propagation for finer-grained gating
- Middleware-level audit logging: Every caller context decision should be logged for security auditing
The pattern scales. Adding caller awareness to a new router is 3 lines: one import, one dependency parameter, one permission check. Future routers get the protection for free if they follow the convention.
91 tests. 15 endpoints gated. 5 routers hardened. Zero regressions on the 1700+ existing test suite. Ship it.