Tags: engineering, mcp, agents, cloud, infrastructure, architecture

Cloud Agents Unlocked: How We Gave GitHub Copilot Coding Agents Full Access to a Multi-Service Agent OS

March 8, 2026 · 14 min read · Aitherium

There is a moment in every infrastructure project where you go from "it works on localhost" to "it works from the other side of the internet." That moment happened today when a GitHub Copilot coding agent — running in a GitHub Actions container somewhere in Azure — connected to our MCP SaaS Gateway through Cloudflare, authenticated with a Bearer token, discovered 212 tools, called three of them, and reported the results back in a GitHub issue. The whole round trip took about four minutes. No human touched the keyboard after the dispatch command.

This is the story of how we built that pipeline and every wall we had to punch through to make it work.

The Vision: Massively Parallel Cloud Coding

AitherOS already had MCP working locally. Claude Code + AitherNode MCP gave us 300+ tools accessible from an IDE session — agent delegation, code search, graph queries, swarm coding, memory. But local MCP has a fundamental constraint: it runs on your machine. One developer, one session, one context window.

GitHub's Agentic Workflows change the calculus entirely. They are cloud-based AI coding agents that can be dispatched programmatically. Each one runs in its own container with its own context. You can fire 10 of them simultaneously. Or 50. They read the repo, write code, submit PRs, and — critically — they can connect to external MCP servers over HTTP.

The implication is staggering: if you can expose your intelligence layer over HTTPS, you can have a fleet of cloud agents doing parallel coding work against your codebase, each one backed by the full power of your agent OS. Code search across 3 million lines. Agent delegation to 29 specialized personas. Graph-powered context retrieval. GPU-scheduled inference. Memory that persists across sessions.

The missing piece was the bridge. A production MCP endpoint that could survive the real internet.

The Architecture

Here is what the full request path looks like when a cloud agent calls a tool:

GitHub Actions Container (Azure)
  └─ MCP Client (Streamable HTTP transport)
       └─ HTTPS → mcp.aitherium.com
            └─ Cloudflare Edge (TLS termination, DDoS protection)
                 └─ Cloudflare Access (Zero Trust policy evaluation)
                      └─ Cloudflare Tunnel (encrypted tunnel to origin)
                           └─ MCP Gateway (Docker container)
                                └─ ASGI Path Normalizer
                                     └─ Auth Middleware (ACTA API key validation)
                                          └─ Tenant Context (tier, rate limits, quotas)
                                               └─ StreamableHTTPSessionManager
                                                    └─ MCP Server (tool dispatch)
                                                         └─ AitherOS service mesh

Nine layers between the cloud agent and the actual tool execution. Each one exists because we hit a real problem without it.

Layer 1: The MCP SDK Transport

The MCP Python SDK (v1.26.0) provides StreamableHTTPSessionManager — the server-side handler for MCP's Streamable HTTP transport. Getting the parameters right was the first battle.

from mcp.server.streamable_http_manager import StreamableHTTPSessionManager

_session_manager = StreamableHTTPSessionManager(
    app=mcp_server,        # The MCP Server instance
    stateless=False,       # Maintain session state across requests
    json_response=True,    # JSON responses instead of SSE streams
)

Three flags. Two of them have defaults that break cloud clients. stateless=False is required because Agentic Workflows maintain session state — they initialize, then make multiple tool calls within the same session. json_response=True because streaming SSE through Cloudflare's proxy adds buffering latency and requires specific header handling that most MCP clients don't configure.

The session manager is mounted as a raw ASGI handler at /mcp using Starlette's Mount:

Mount("/mcp", app=_session_manager.handle_request)

The handle_request attribute is an ASGI callable. Not a Starlette endpoint. Not a FastAPI route. A raw ASGI app. This distinction matters for what comes next.
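To make the traffic concrete: the very first message a client POSTs to /mcp is a JSON-RPC initialize request. Here is a representative payload as a minimal sketch — the protocol version and client info are illustrative, since they depend on the connecting client:

```python
import json

# A representative MCP initialize request as a client POSTs it to /mcp.
# Field values here are illustrative; the protocol version and clientInfo
# depend on the client (these are not the gateway's actual logs).
initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "example-cloud-agent", "version": "0.1.0"},
    },
}

body = json.dumps(initialize_request)
print(body)
```

With json_response=True, the gateway answers this POST with a plain JSON body rather than opening an SSE stream — which is exactly what a proxy-bound client wants.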

Layer 2: The Trailing Slash That Breaks Everything

Starlette's Mount directive has a behavior that is technically correct and practically catastrophic: if you mount at /mcp and a request arrives for exactly /mcp (no trailing slash), Starlette returns a 307 redirect to /mcp/.

On localhost, this is invisible. The client follows the redirect, the request lands, everyone is happy.

Through Cloudflare, it is a disaster. The MCP client sends a POST request with a JSON body. The 307 redirect tells the client to re-send the request to the new URL. In theory a 307 preserves the method and body; in practice, many HTTP clients don't honor that. Some strip the body. Some proxies (including Cloudflare in certain configurations) buffer the response and re-issue the request as a GET. The MCP session initialization — the very first message — fails silently.

The fix is an ASGI path normalizer that intercepts the request before Starlette sees it:

async def _normalize_mcp_path(scope, receive, send):
    if scope["type"] == "http" and scope.get("path") == "/mcp":
        scope = dict(scope, path="/mcp/")
    await _wrapped(scope, receive, send)  # _wrapped is the mounted Starlette app

Four lines. Zero redirects. Both /mcp and /mcp/ resolve to the same handler on the first request.
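To see the normalizer's effect end to end, here is a self-contained sketch — the inner app and helper names are illustrative stand-ins for the gateway's real router, not its actual code:

```python
import asyncio

# Illustrative inner ASGI app standing in for the mounted Starlette router.
async def inner_app(scope, receive, send):
    body = f"path seen by app: {scope['path']}".encode()
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": body})

async def normalize_mcp_path(scope, receive, send):
    # Rewrite /mcp to /mcp/ before the router can emit a 307 redirect.
    if scope["type"] == "http" and scope.get("path") == "/mcp":
        scope = dict(scope, path="/mcp/")
    await inner_app(scope, receive, send)

async def call(path):
    sent = []
    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}
    async def send(message):
        sent.append(message)
    await normalize_mcp_path({"type": "http", "path": path}, receive, send)
    return sent[1]["body"].decode()

print(asyncio.run(call("/mcp")))   # same handler, no redirect
print(asyncio.run(call("/mcp/")))  # same handler either way
```

Both spellings reach the handler on the first request, so the POST body never has to survive a redirect.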

Layer 3: Authentication and Metering

Every request that reaches the MCP endpoint has already survived Cloudflare's edge. But Cloudflare doesn't know about our tenants, rate limits, or billing. That's the ACTA auth layer.

ACTA (Aitherium Compute Token Authority) is a lightweight billing service that manages API keys, usage quotas, and tier-based access control. The auth middleware extracts the Bearer token from the Authorization header, validates it against ACTA, and injects tenant context into the request:

class AuthMiddleware:
    async def __call__(self, scope, receive, send):
        # Extract Bearer token
        # Validate against ACTA → user_id, plan, remaining tokens
        # Map plan to tier: admin → enterprise, pro → pro, free → free
        # Check rate limits for tier
        # Inject TenantContext into ContextVar
        # Forward to inner app

The tier determines which tools are visible. Free tier gets basic analysis tools. Pro gets code search and agent delegation. Enterprise gets everything including swarm coding and GPU-scheduled inference. Each tool call deducts compute tokens from the balance.

One bug cost us an hour: the plan-to-tier mapping didn't include "admin": "enterprise". Admin users — including the API key we generated for GitHub — were falling through to the default "free" tier and seeing 0 tools. A one-line fix with outsized consequences.
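The corrected mapping can be sketched in a few lines — names here are illustrative, since the real table lives inside the auth middleware:

```python
# Illustrative plan-to-tier mapping; the real table lives in the auth middleware.
PLAN_TO_TIER = {
    "admin": "enterprise",  # the missing line: without it, admins fell to "free"
    "pro": "pro",
    "free": "free",
}

def tier_for_plan(plan: str) -> str:
    # Unknown plans fall through to the most restrictive tier.
    return PLAN_TO_TIER.get(plan, "free")

print(tier_for_plan("admin"))    # enterprise
print(tier_for_plan("mystery"))  # free
```

Defaulting unknown plans to "free" is the safe failure mode — which is exactly why the missing "admin" entry produced 0 visible tools instead of an error.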

Layer 4: Cloudflare Tunnel

Cloudflare Tunnel creates an encrypted outbound-only connection from our Docker host to Cloudflare's edge. No inbound ports. No firewall rules. No public IP exposure. The tunnel container connects to Cloudflare, and Cloudflare routes matching hostnames back through the tunnel to the specified origin.

The tunnel runs as a Docker container alongside our other services. The tunnel configuration is managed via Cloudflare's API, not a local config file. This means changes to ingress rules (which hostname maps to which backend) are applied by updating the tunnel's remote configuration.

We hit a surprising bug here: the tunnel was pointed at the wrong service. The initial configuration routed mcp.aitherium.com to the MCP Bridges service — which is a completely different thing (it's the internal MCP tool servers for Vision, Canvas, Mind, and Memory). The correct target is the MCP Gateway. With scores of services and multiple MCP-related containers, it's easy to wire the wrong one.

After correcting the ingress rules via the Cloudflare API, each hostname maps to the correct backend service, with a catch-all 404 at the bottom (required by Cloudflare — every tunnel config must end with a default rule).
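The shape of that ingress configuration can be modeled as ordered data. This sketch uses assumed hostnames and service URLs (the real topology is pushed through Cloudflare's API, not shown here):

```python
# Illustrative ingress rules; the service URLs are assumptions, not the
# real topology. Order matters, and the catch-all must come last.
ingress = [
    {"hostname": "mcp.aitherium.com", "service": "http://mcp-gateway:8080"},
    {"hostname": "demo.aitherium.com", "service": "http://aitherveil:3000"},
    {"service": "http_status:404"},  # required catch-all default rule
]

def validate_ingress(rules):
    # Cloudflare requires every tunnel config to end with a hostname-less
    # default rule; reject configs that don't.
    if not rules or "hostname" in rules[-1]:
        raise ValueError("ingress must end with a catch-all rule")
    return {r["hostname"]: r["service"] for r in rules[:-1]}

routes = validate_ingress(ingress)
print(routes["mcp.aitherium.com"])  # http://mcp-gateway:8080
```

Validating the hostname-to-service map before pushing it is cheap insurance against exactly the wrong-target bug described above.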

Layer 5: Cloudflare Access (Zero Trust)

With the tunnel working, mcp.aitherium.com/health returned... a Cloudflare Access login page. HTML. In response to a curl request expecting JSON.

Cloudflare Access is a Zero Trust gateway that sits in front of tunneled applications. We had it configured for the AitherVeil dashboard — requiring SSO login before anyone can reach demo.aitherium.com. The problem: the Access Application for mcp.aitherium.com had the same policy: "Allow Admin Only," which requires browser-based SSO authentication.

MCP clients are not browsers. They don't render login pages. They don't handle OAuth redirects. They send POST requests with JSON bodies and expect JSON responses.

The solution is a Bypass policy. Cloudflare Access policies are evaluated in order. We added an "API Bypass" policy at priority 1 (before the "Allow Admin Only" at priority 2):

Order  Name              Action  Include
1      API Bypass        BYPASS  Everyone
2      Allow Admin Only  ALLOW   Email matching admin

This tells Cloudflare: let all requests through to the origin without SSO. Security is enforced at the application layer — our ACTA auth middleware validates the Bearer token and rejects unauthorized requests long before any tool is executed.

This is a deliberate architectural decision. The MCP Gateway has its own auth stack. Cloudflare Access provides DDoS protection, TLS termination, and the encrypted tunnel. They complement each other rather than duplicating authentication concerns.
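First-match-wins ordering is what makes the priority-1 Bypass policy work. A toy model of the evaluation (the policy fields and matching logic are illustrative, not Cloudflare's implementation):

```python
# Toy model of ordered Access policy evaluation: first matching policy wins.
# Fields are illustrative; real Access policies have richer include rules.
policies = [
    {"name": "API Bypass", "action": "BYPASS",
     "matches": lambda req: True},  # "Everyone"
    {"name": "Allow Admin Only", "action": "ALLOW",
     "matches": lambda req: req.get("email", "").endswith("@aitherium.com")},
]

def evaluate(request):
    for policy in policies:
        if policy["matches"](request):
            return policy["action"]
    return "DENY"  # no policy matched

print(evaluate({}))  # BYPASS
```

With the Bypass policy at priority 1, every request short-circuits past the SSO check and reaches the origin, where ACTA does the real authentication.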

Layer 6: The Tool Registry

When a cloud agent connects and sends an MCP initialize request, the gateway responds with server capabilities. When the agent sends tools/list, it gets back 212 tools organized by module:

Module             Tools  Examples
Think/Analyze      6      think, analyze, explain, debate, plan, review
Code Intelligence  14     code_search, codegraph_search, explore_code, review
Agent Delegation   8      ask_agent, delegate_task, swarm_code, agent_status
Memory             6      tenant_remember, tenant_recall, context_search
System             12     service_status, health_check, docker_manage
Security           5      security_scan, threat_detect, audit_log
GPU/Inference      8      llm_query, model_status, vram_check
...                ...    ...

Each tool has a JSON Schema for its parameters, a description, and metadata about which tiers can access it. The tenant's tier filters which tools appear in the list — a free-tier agent sees ~30 tools; enterprise sees all 212.
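Tier-gated visibility reduces to a filter over the registry. A minimal sketch, with illustrative tool names and tier sets (the real metadata lives in the registry, not here):

```python
# Illustrative tier-gated tool filtering; real metadata lives in the registry.
TOOLS = [
    {"name": "think", "tiers": {"free", "pro", "enterprise"}},
    {"name": "code_search", "tiers": {"pro", "enterprise"}},
    {"name": "swarm_code", "tiers": {"enterprise"}},
]

def visible_tools(tier: str):
    # tools/list returns only the tools the tenant's tier is allowed to see
    return [t["name"] for t in TOOLS if tier in t["tiers"]]

print(visible_tools("free"))        # ['think']
print(visible_tools("enterprise"))  # ['think', 'code_search', 'swarm_code']
```

Filtering at list time, not just call time, means a free-tier agent never even learns that swarm_code exists.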

The End-to-End Test

With all six layers in place, we ran the proof:

gh aw run test-aitherium-mcp

This dispatches a GitHub Agentic Workflow — a cloud-based Copilot coding agent running in a GitHub Actions container. The workflow definition tells the agent to connect to https://mcp.aitherium.com/mcp with a Bearer token stored as a GitHub secret.

The MCP Gateway logs showed the agent arrive in real-time:

10:09:47 [INFO] New transport session: 95d7af...
10:09:47 [INFO] Processing ListToolsRequest
10:09:48 [INFO] Processing CallToolRequest: think
10:10:02 [INFO] Processing CallToolRequest: analyze
10:10:14 [INFO] Processing CallToolRequest: git_status

The agent:

  1. Initialized an MCP session through Cloudflare → tunnel → auth → gateway
  2. Listed tools — discovered all 212 available tools
  3. Called think — asked the AitherOS intelligence layer to confirm connectivity
  4. Called analyze — sent a code snippet for analysis
  5. Called git_status — retrieved real repository data (branch: develop, clean working tree, recent commit history)
  6. Created a GitHub issue summarizing the results

Workflow conclusion: success. Zero assignment errors. Zero push failures. Agent output: 8,587 characters of structured results.

The entire round trip — from dispatch to issue creation — took about four minutes. The agent was running on GitHub's infrastructure in Azure. Our MCP Gateway was running on a local machine behind a Cloudflare Tunnel. The two had never communicated before this request.

What This Enables

The test workflow proved connectivity. The real payoff is the Atlas Cloud Worker workflow — a full coding agent that can:

  1. Receive a task description via workflow dispatch input
  2. Gather intelligence using code_search, codegraph_search, and explore_code against the full 3M-line codebase
  3. Consult specialized agents via ask_agent — asking Atlas for project context, Athena for security review, Vera for test coverage analysis
  4. Plan implementation using think and plan backed by the orchestrator model
  5. Write and validate code with full access to the repo
  6. Submit a PR with threat detection that scans for infrastructure leaks, hardcoded secrets, or CI/CD tampering

And because each workflow runs in its own container, you can dispatch 10 of them simultaneously. Give one a frontend task. Give another a backend refactor. Give a third a test coverage gap. They all connect to the same MCP Gateway, each gets their own session, and they work in parallel without stepping on each other.

This is Atlas's delegation pattern — the project manager agent breaking work into packages and dispatching them to specialized workers — but with cloud-scale compute instead of local context windows.

The Numbers

Metric                               Value
MCP tools exposed                    212
Request layers (cloud → tool)        9
Auth validation time                 ~15ms
Initialize → first tool call         ~1.2s
Tool call round trip (through CF)    ~2-4s
Total E2E test time                  ~4 min
Concurrent agent sessions supported  Limited by ACTA tier
Lines of gateway code                ~580
Lines of auth middleware             ~594
Bugs that blocked production         5 (documented above)

The Five Bugs

For the record, here are the five issues that blocked the pipeline until each was debugged:

  1. MCP SDK defaults: stateless=True (default) breaks session continuity. json_response=False (default) requires SSE handling through proxies.
  2. Trailing slash redirect: Starlette's Mount returns 307 on /mcp, which strips POST bodies through proxy chains.
  3. Wrong tunnel target: Routed to the MCP Bridges service instead of the MCP Gateway — two MCP-related services with similar names.
  4. CF Access SSO blocking API: Zero Trust policy required browser SSO that programmatic clients can't perform.
  5. Admin tier mapping: "admin" plan not mapped to "enterprise" tier, causing 0 tools to be visible.

None of these show up in local testing. They only manifest when you go through the full internet path. This is why you test end-to-end.

What's Next

The immediate roadmap:

  • Dispatch real Atlas workers on coding tasks from the backlog
  • Demiurge-as-architect: Use AitherOS's Demiurge agent to decompose large features into work packages, then dispatch each as a cloud worker
  • Metering dashboard: Expose ACTA usage stats in AitherVeil so we can see which cloud agents are consuming what
  • Session persistence: Let cloud agents resume sessions across workflow runs using tenant memory
  • Multi-repo: Extend the Atlas worker workflow to operate across AitherOS, AitherZero, and AitherVeil simultaneously

The constraint was never compute. GitHub gives you the compute. The constraint was intelligence — having tools smart enough to be useful to a cloud agent that has never seen your codebase before. That's what an agent OS provides. Not just a list of API endpoints, but 212 tools backed by code graphs, agent personas, memory systems, and orchestration models that understand the domain.

The bridge is built. Now we send the army across.
