From PAT to GitHub App: Wiring AitherFlow as a Two-Way Citizen
For a long time, AitherFlow talked to GitHub the way most internal tooling does: a long-lived Personal Access Token in an env var, broad scopes, no signed webhooks, no real identity. It worked. It was also exactly the kind of thing that shows up in a post-incident report under "how did the credential leak give them everything?"
This is the story of replacing that with a proper GitHub App — and, more interestingly, the supporting infrastructure we built so the next service that needs the same treatment takes about ninety seconds instead of an afternoon.
What "two-way" actually means
A GitHub App earns its keep when both directions are real:
- Inbound: GitHub sends webhooks. We verify the HMAC-SHA256 signature with a shared secret and dispatch to the right agent.
- Outbound: We mint short-lived installation tokens (signed JWT → token, ~1 hour TTL) and call the REST API as the App, not as a user.
A PAT is one-way pretending to be two-way. It can call the API, sure, but webhooks land unsigned, the identity is "whoever owns the token," and revocation means someone has to remember which PAT.
The auth flow, in code
The new path lives in AitherFlow.py. The relevant bits:
# Loaded from AitherSecrets at startup
GITHUB_APP_ID = get_secret("GITHUB_APP_ID")
GITHUB_APP_PRIVATE_KEY = get_secret("GITHUB_APP_PRIVATE_KEY")
GITHUB_APP_INSTALLATION_ID = get_secret("GITHUB_APP_INSTALLATION_ID")
GITHUB_WEBHOOK_SECRET = get_secret("GITHUB_WEBHOOK_SECRET")

class GitHubAppAuth:
    async def get_installation_token(self) -> str:
        # Cache hit? Reuse the token until it is within 5 minutes of expiry.
        if self._token and self._expires_at > now() + timedelta(minutes=5):
            return self._token
        # Mint a 10-minute JWT signed with the App private key (RS256)
        jwt_token = self._mint_jwt()
        # Exchange for an installation token (~1h TTL)
        async with httpx.AsyncClient() as client:
            r = await client.post(
                f"https://api.github.com/app/installations/{self.install_id}/access_tokens",
                headers={"Authorization": f"Bearer {jwt_token}",
                         "Accept": "application/vnd.github+json"},
            )
        r.raise_for_status()  # fail loudly instead of hitting a KeyError on "token" below
        data = r.json()
        self._token = data["token"]
        self._expires_at = parse_iso(data["expires_at"])
        return self._token
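The body of `_mint_jwt` is not shown above. A minimal sketch of what it might look like, assuming the PyJWT library (installed with its `cryptography` extra) does the RS256 signing. The names `build_app_jwt_claims` and `mint_app_jwt` are hypothetical and not taken from AitherFlow.py.

```python
import time
from typing import Optional


def build_app_jwt_claims(app_id: str, now: Optional[int] = None,
                         ttl_seconds: int = 600) -> dict:
    """Build the claims GitHub expects in a GitHub App JWT."""
    if ttl_seconds > 600:
        raise ValueError("App JWTs may expire at most 10 minutes in the future")
    issued = int(time.time()) if now is None else now
    return {
        "iat": issued - 60,           # backdated 60s to tolerate clock drift
        "exp": issued + ttl_seconds,  # at most 10 minutes out
        "iss": app_id,                # the App ID (assumed: GITHUB_APP_ID)
    }


def mint_app_jwt(app_id: str, private_key_pem: str) -> str:
    """Sign the claims with RS256. Assumes PyJWT is installed."""
    import jwt  # PyJWT, imported lazily so the claims helper stays dependency-free

    return jwt.encode(build_app_jwt_claims(app_id), private_key_pem, algorithm="RS256")
```

Keeping the claims in a pure function makes the time arithmetic easy to unit-test without a private key on hand.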
The auth_mode property returns "github_app" when all three GitHub App auth secrets (App ID, private key, installation ID) are present, and falls back to "token" otherwise. Existing call-sites didn't have to change; they just got better tokens.
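The selection rule can be sketched as a pure function. This is an illustration under assumptions: `resolve_auth_mode` is a hypothetical name, and the real property presumably reads the module-level secrets rather than taking a mapping.

```python
from typing import Mapping, Optional

# The three secrets needed to mint installation tokens. The webhook secret
# is deliberately excluded: it only matters for inbound verification.
REQUIRED_APP_SECRETS = (
    "GITHUB_APP_ID",
    "GITHUB_APP_PRIVATE_KEY",
    "GITHUB_APP_INSTALLATION_ID",
)


def resolve_auth_mode(secrets: Mapping[str, Optional[str]]) -> str:
    """Return "github_app" only if every required secret is non-empty."""
    if all(secrets.get(name) for name in REQUIRED_APP_SECRETS):
        return "github_app"
    return "token"
```

Treating an empty string the same as a missing secret means a half-filled vault entry falls back to the PAT path instead of failing at JWT signing time.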
Webhooks: verify, then fan out
The webhook handler does three things in this order, and the order matters:
@app.post("/webhooks/github")
async def github_webhook(request: Request, background_tasks: BackgroundTasks):
    body = await request.body()
    sig = request.headers.get("X-Hub-Signature-256", "")
    if not _verify_github_signature(body, sig, GITHUB_WEBHOOK_SECRET):
        raise HTTPException(401, "bad signature")
    # Fan out to AitherRelay so workspace channels light up
    background_tasks.add_task(_forward_to_relay, body, dict(request.headers))
    event = request.headers.get("X-GitHub-Event", "")
    return await _dispatch(event, json.loads(body))
Constant-time comparison on the HMAC. Raw body forwarded to Relay (not the parsed dict: Relay re-verifies the signature itself, so it needs the bytes that were actually signed). Dispatch is awaited inside the request handler; fan-out runs as a background task after the response has been sent, so a slow Relay can't slow the GitHub ack.
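The handler calls `_verify_github_signature` without showing it. A minimal sketch of such a check, using only the standard library, could look like this. It assumes the secret is stored as text and relies on the `sha256=<hex digest>` format GitHub uses for `X-Hub-Signature-256`. The function name here is illustrative.

```python
import hashlib
import hmac


def verify_github_signature(body: bytes, signature_header: str, secret: str) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    # Reject early if the secret is unset or the header is missing/malformed.
    if not secret or not signature_header.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # compare_digest runs in constant time; comparing bytes avoids a
    # TypeError if the header contains non-ASCII characters.
    return hmac.compare_digest(expected.encode("utf-8"),
                               signature_header.encode("utf-8"))
```

Returning `False` for an empty secret is a deliberate choice in this sketch: an unconfigured service should reject everything, not accept everything.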
One webhook URL, two consumers
GitHub Apps allow exactly one webhook URL. That used to mean "pick which service you care about." Now it means "pick the service that knows how to fan out."
AitherFlow.py is the front door. After verification it schedules _forward_to_relay, which POSTs the original body and X-Hub-Signature-256 header to {AITHERRELAY_URL}/v1/webhooks/github. AitherRelay.py re-verifies the signature with its own copy of the secret and posts the event to the right workspace channel — every workspace has a github_repo field, plus there's a global #aitherium-builds channel for org-wide noise.
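The result: one URL on github.com, two independent consumers. If AitherFlow is down for a maintenance restart, deliveries fail at the front door, and Relay gets those events once they are redelivered (GitHub does not retry failed deliveries on its own; redelivery is triggered from the App's delivery log or through the REST API). Relay can be down and AitherFlow keeps dispatching to agents.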
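On the Relay side, routing by the `github_repo` field might be sketched as below. The workspace shape (a dict with `slug` and `github_repo`) and the choice to always include the global channel are assumptions for illustration. Only the `repository.full_name` payload field comes from GitHub's webhook format.

```python
from typing import Dict, List

GLOBAL_CHANNEL = "#aitherium-builds"


def route_event(payload: dict, workspaces: List[Dict[str, str]]) -> List[str]:
    """Return the destinations for one verified webhook payload."""
    repo = (payload.get("repository") or {}).get("full_name", "")
    targets = [
        ws["slug"]
        for ws in workspaces
        # GitHub treats owner/repo names case-insensitively, so compare lowercased.
        if repo and ws.get("github_repo", "").lower() == repo.lower()
    ]
    targets.append(GLOBAL_CHANNEL)  # assumed: org-wide channel sees every event
    return targets
```

Events without a `repository` key (some App-level events) end up only in the global channel under this rule.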
The credential problem nobody wants to solve
Setting up a GitHub App requires a webhook secret. The old way to generate one looks like this:
openssl rand -base64 32 | tr -d '/+=' | head -c 48
# copy output to clipboard
# open AitherSecrets dashboard
# paste, name it GITHUB_WEBHOOK_SECRET, save
# go back to github.com, paste again
Five steps, two contexts, one chance to fat-finger the value into the wrong field. We added a single endpoint to AitherSecrets.py:
@app.post("/secrets/generate")
async def generate_secret(req: GenerateSecretRequest):
    """Atomic generate-and-store. Idempotent unless overwrite=true."""
    if not req.overwrite and _exists(req.name):
        return {"created": False, "value": _read(req.name)}
    value = _generate(req.format, req.length)  # urlsafe | hex | alphanumeric
    _write(req.name, value)
    return {"created": True, "value": value}
Idempotent by default. Ask for the same secret twice, get the same value back — no surprise rotations. Pass overwrite=true and it rotates atomically.
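The `_generate` helper is not shown. One way it could work, using Python's `secrets` module, is sketched here together with an in-memory stand-in for the vault so the idempotency rule can be exercised. `generate_value`, `generate_and_store`, and the dict-backed store are hypothetical names for illustration.

```python
import secrets
import string

_ALPHANUMERIC = string.ascii_letters + string.digits


def generate_value(fmt: str = "urlsafe", length: int = 48) -> str:
    """Return a random string of exactly `length` characters."""
    if length <= 0:
        raise ValueError("length must be positive")
    if fmt == "urlsafe":
        # token_urlsafe(n) yields about 1.3*n characters, so trimming is safe.
        return secrets.token_urlsafe(length)[:length]
    if fmt == "hex":
        return secrets.token_hex((length + 1) // 2)[:length]
    if fmt == "alphanumeric":
        return "".join(secrets.choice(_ALPHANUMERIC) for _ in range(length))
    raise ValueError(f"unknown format: {fmt!r}")


def generate_and_store(store: dict, name: str, fmt: str = "urlsafe",
                       length: int = 48, overwrite: bool = False) -> dict:
    """Mirror the endpoint's contract against a plain dict."""
    if not overwrite and name in store:
        return {"created": False, "value": store[name]}
    store[name] = generate_value(fmt, length)
    return {"created": True, "value": store[name]}
```

A real vault would need the existence check and the write to happen under one lock or transaction for the "atomic" claim to hold. The dict version sidesteps that.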
Make it a slash command
Endpoints are nice. Slash commands are nicer. aithershell has a Python plugin system — drop a file in aithershell/plugins/builtins/, inherit from SlashCommand, you're done.
secret.py gives you /secret gen|set|get|ls. github_app.py gives you the App-specific flow:
/github-app status — what's set, what's missing
/github-app permissions — exact GitHub form values to tick
/github-app gen-webhook-secret — generate + store + clipboard
/github-app set-id <APP_ID>
/github-app load-key <pem-path>
/github-app set-install <INSTALL_ID>
/github-app link <slug> <repo> — wire workspace to repo via Relay
/github-app verify — round-trip a JWT against api.github.com
/github-app bootstrap — all of the above, in order
/github-app bootstrap is the headline. It generates the webhook secret (idempotent), copies it to your clipboard, prints the exact permissions and events to tick, prints your webhook URL, and walks you through a 12-step checklist that ends in "re-deliver the ping from github.com → 200 OK."
The total time from "I should make this a real GitHub App" to "200 OK on the ping delivery" is about as long as it takes to click through the github.com form.
The papercuts we hit
A few things bit us on the way:
HTTPS with self-signed certs. AitherSecrets runs HTTPS in dev with a self-signed cert. The first version of the slash command used http://, got "Empty reply from server," and we spent five minutes blaming the container. Fix: default to https:// and pass verify=False to every httpx client in the plugin. (In prod, the cert is real and verification is on.)
Containers running old code. The /secrets/generate endpoint went into the codebase before it went into the running container. The plugin handles this gracefully: try the new endpoint, on 404 fall back to client-side generation plus a plain POST /secrets. Same observable behavior, no upgrade required.
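A sketch of that fallback, with the HTTP call abstracted behind a `post` callable so the control flow is visible on its own. The request body shapes and the `ensure_secret` name are assumptions. In the plugin, `post` would presumably wrap an httpx call.

```python
import secrets
from typing import Callable, Tuple

# post(path, json_body) -> (status_code, response_json); stands in for an HTTP client.
PostFn = Callable[[str, dict], Tuple[int, dict]]


def ensure_secret(post: PostFn, name: str, length: int = 48) -> str:
    """Prefer server-side generation; fall back if the endpoint is missing."""
    status, data = post("/secrets/generate",
                        {"name": name, "format": "urlsafe", "length": length})
    if status == 200:
        return data["value"]
    if status != 404:
        raise RuntimeError(f"/secrets/generate failed with HTTP {status}")
    # Older container: generate locally, then store with the plain endpoint.
    value = secrets.token_urlsafe(length)[:length]
    status, _ = post("/secrets", {"name": name, "value": value})
    if status >= 400:
        raise RuntimeError(f"/secrets failed with HTTP {status}")
    return value
```

Only a 404 triggers the fallback. Any other error is surfaced, so an auth failure on the new endpoint does not silently turn into a client-generated secret.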
Signature forwarding. First cut of _forward_to_relay parsed the body to JSON before forwarding. Relay then computed HMAC over the re-serialized JSON, which doesn't match the bytes GitHub signed. Verification failed every time. Fix: forward the raw bytes and the original X-Hub-Signature-256 header, untouched.
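The mismatch is easy to reproduce with the standard library. In this sketch the secret and payload are made-up values. The point is that `json.dumps` emits different whitespace than the compact body that was signed, so the digest changes even though the parsed data is identical.

```python
import hashlib
import hmac
import json


def sign(body: bytes, secret: bytes) -> str:
    """Compute a signature in the X-Hub-Signature-256 format."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


secret = b"example-secret"                       # made-up value
raw_body = b'{"zen":"Keep it simple.","hook_id":1}'
signature = sign(raw_body, secret)               # what the sender attached

# The buggy path: parse, then re-serialize before forwarding.
reserialized = json.dumps(json.loads(raw_body)).encode("utf-8")

print(json.loads(reserialized) == json.loads(raw_body))  # same data
print(reserialized == raw_body)                          # different bytes
print(sign(reserialized, secret) == signature)           # so verification fails
print(sign(raw_body, secret) == signature)               # raw bytes still verify
```

Key order, Unicode escaping, and number formatting can differ after a round trip as well, so tuning the serializer's separators is not a reliable fix. Forwarding the untouched bytes is.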
What this unlocks
With the App installed and the webhook fanning out, the agent layer can finally do its job without anyone watching:
- Demiurge auto-reviews PRs the moment they open. Not a "lgtm 🚀" bot — actual diffs against the architecture rules in .github/instructions/.
- Atlas triages security alerts (Dependabot, code scanning, secret scanning) into the roadmap with priority labels.
- Forge picks up issues tagged agent-ready and produces a PR.
- Lyra posts release notes to #aitherium-builds whenever a tag goes up.
- CodeGraph re-indexes on push to default branches so RAG queries stay fresh.
All of those existed before. They were just gated behind "remember to manually trigger the workflow" or "remember to paste the diff into chat." Now they fire on the event that actually represents the thing happening.
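How the events above might map to agents can be sketched as a small routing function. The GitHub event names and payload fields are real webhook vocabulary. The mapping itself is a guess at what `_dispatch` does, and `pick_agent` is a hypothetical name.

```python
from typing import Optional

_ALERT_EVENTS = {"dependabot_alert", "code_scanning_alert", "secret_scanning_alert"}


def pick_agent(event: str, payload: dict) -> Optional[str]:
    """Map an X-GitHub-Event value plus payload to an agent name, or None."""
    action = payload.get("action")
    if event == "pull_request" and action == "opened":
        return "Demiurge"
    if event in _ALERT_EVENTS:
        return "Atlas"
    if event == "issues" and action == "labeled":
        if (payload.get("label") or {}).get("name") == "agent-ready":
            return "Forge"
        return None
    if event == "create" and payload.get("ref_type") == "tag":
        return "Lyra"
    if event == "push":
        default = (payload.get("repository") or {}).get("default_branch")
        if default and payload.get("ref") == f"refs/heads/{default}":
            return "CodeGraph"
    return None  # unhandled events (including ping) are acknowledged and ignored
```

Returning `None` for anything unrecognized keeps the handler safe to subscribe to more events than it currently acts on.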
What we'd do differently
If we were doing this from scratch:
- Build the slash command first. We built the AitherFlow auth code, then the AitherSecrets endpoint, then the slash command. The slash command is what we actually use. We should have started there and let it pull the rest into existence.
- Document the permissions inline. The _PERMISSIONS constant in the plugin is the source of truth for "what to tick on github.com." It should have been there from the first commit, not added when someone (us) forgot which boxes to check on the second install.
- Test the fan-out with a real ping earlier. The signature-forwarding bug would have surfaced immediately if we'd round-tripped a real GitHub ping through both services on day one.
Try it
/github-app bootstrap
Then click through the github.com form. The slash command will tell you what to do next at every step. If you get stuck, /github-app status shows you exactly which secrets are present and which are missing.
If you're adding the next service that needs GitHub credentials, the pattern is now reusable: vault for the secret, slash command for the flow, single webhook URL with fan-out for the events. The interesting code is in your service. The boring code is already written.