It Wrote Its Own PR and Then Reviewed It: How We Closed the Autonomous CI/CD Loop

Name: AitherOS
Author: Aitherium

March 10, 2026 · 9 min read · Aitherium

Published by Aitherium

The Screenshot That Made Me Stop Working

Monday morning. I'm debugging an event loop stall in Genesis --- the kind of deep concurrency problem that requires full attention. I glance at GitHub notifications and see four new issues I didn't file, a pull request I didn't open, and a code review I didn't write.

The issues were real bugs. The PR was a legitimate fix. And the review --- the review was better than what most junior developers would write. It identified that the fix was incomplete, pointed to the correct existing utility function the agent should have used instead, and requested three specific changes before merge.

No human prompted any of it. I didn't write a ticket. I didn't assign anyone. I didn't even know these bugs existed.

I just defined some YAML workflows and set up a self-hosted runner. The system took it from there.

What Actually Happened

Here's the timeline, reconstructed from GitHub Actions logs:

12:43 PM --- A scheduled GitHub Actions workflow called bug-hunter runs against the develop branch. It scans the codebase looking for anti-patterns: hardcoded URLs that should use service discovery, shell=True with string concatenation, documentation that's drifted from config. The workflow uses AitherOS MCP tools to query the codebase via CodeGraph.

12:44 PM --- Four issues are filed automatically:

[quality] Documentation Drift: AitherNode port documented as 8080, services.yaml says 8090 (#817)
[bug-hunter] Security: shell=True with restart_command string injection in AitherWatch (7 locations) (#816)
[bug-hunter] Security: shell=True with list in AitherAutonomic cleanup (#815)
[quality] Documentation Drift: AitherNode port documented as 8080 but configured as 8090 (#814)

Each issue has the right labels (auto-discovered, bug, priority:high, layer:1), correct layer classification, and enough context for a developer --- human or otherwise --- to act on it.

12:44 PM --- A second workflow triggers. This one watches for issues labeled agent:local and dispatches them to the Demiurge agent --- our code generation specialist --- running on a self-hosted GitHub Actions runner on my local machine. The runner connects to the full AitherOS stack: vLLM inference, CodeGraph indexing, all 200+ MCP tools.

The Demiurge agent receives a task: replace a hardcoded localhost:11434 Ollama URL in AitherReasoning.py with host.docker.internal:11434 so the service works inside Docker containers. It also needs to write a regression test.

12:44 PM --- The agent uses replace_in_file to patch _check_orchestrator_loaded() at line 2516, replacing http://localhost:11434 with http://host.docker.internal:11434. Then it uses write_file to create a 3-line test:

def test_no_hardcoded_localhost_ollama():
    with open('services/cognition/AitherReasoning.py') as f:
        assert 'localhost:11434' not in f.read()

The commit is authored by AitherOS Agent (demiurge). A PR is created from branch auto/demiurge/20260310-124406 targeting develop. Labels applied: needs-review, auto-fix, agent:demiurge. The PR body includes the full task description, mode (forge), effort level (5), and the GitHub Actions run ID for traceability.
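The branch name above looks like auto/<agent>/<timestamp>, and the PR body carries four provenance fields. As a rough sketch of how such values could be assembled (the function names, the "Key: value" body layout, and the run ID are illustrative assumptions, not the dispatcher's actual code):

```python
from datetime import datetime


def branch_name(agent: str, started_at: datetime) -> str:
    """Build a branch name shaped like auto/<agent>/<YYYYMMDD-HHMMSS>."""
    return f"auto/{agent}/{started_at:%Y%m%d-%H%M%S}"


def pr_body(task: str, mode: str, effort: int, run_id: str) -> str:
    """Render the provenance fields as plain 'Key: value' lines (assumed layout)."""
    fields = {
        "Task": task,
        "Mode": mode,
        "Effort": str(effort),
        "Actions run ID": run_id,
    }
    return "\n".join(f"{key}: {value}" for key, value in fields.items())


name = branch_name("demiurge", datetime(2026, 3, 10, 12, 44, 6))
body = pr_body(
    task="Replace hardcoded localhost:11434 in AitherReasoning.py",
    mode="forge",
    effort=5,
    run_id="0000000000",  # placeholder, not a real run ID
)
print(name)  # auto/demiurge/20260310-124406
```

Deriving the branch from the agent name and start time keeps every run on its own branch and makes it easy to match a PR back to a log entry.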

12:45 PM --- A third workflow fires: the Atlas PR Guardian. Atlas is our maintenance review agent --- it reads the diff, checks it against architectural rules, and posts a structured review comment.

This is where it gets interesting.

The Review That Caught the Bug in the Fix

Atlas posted a review with five checks:

Check	Status	Notes
Architecture	PASS	Layer 3 (Cognition) internal change only, no boundary violations
Security	PASS	No secrets, no injection vectors introduced
Code Quality	FAIL	Swaps one hardcoded URL for another; `ollama_url()` already exists in AitherPorts
Test Coverage	PARTIAL	Test added but fragile relative path; misses the unfixed companion function
Blast Radius	1 service	AitherReasoning (Layer 3 Cognition) only

Then it enumerated four specific issues:

Issue 1 --- Incomplete fix. The function _trigger_orchestrator_preload() at line 2535 --- called back-to-back with the patched function in evaluate_gate_with_llm() --- still hardcodes localhost:11434. Fixing one without the other means the preload will still fail in Docker.

Issue 2 --- Wrong approach. AitherPorts.py already exports ollama_url(), which auto-detects the correct host across environments (local dev, Docker via host.docker.internal, container-name-based networking). Hardcoding host.docker.internal breaks bare-metal dev environments. It's the same anti-pattern as localhost, just pointed at a different wrong address.

Atlas even showed the correct fix:

from lib.core.AitherPorts import ollama_url

# In _check_orchestrator_loaded():
resp = await client.get(f"{ollama_url()}/api/ps")

# In _trigger_orchestrator_preload():
await client.post(f"{ollama_url()}/api/chat", ...)
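The post does not show the body of ollama_url(), so the following is only a sketch of what an environment-aware resolver of this kind might look like. The function name resolve_ollama_url, the OLLAMA_URL and OLLAMA_CONTAINER variables, and the /.dockerenv check are all assumptions made for illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

OLLAMA_PORT = 11434


def resolve_ollama_url(
    env: Optional[Mapping[str, str]] = None,
    in_docker: Optional[bool] = None,
) -> str:
    """Pick an Ollama base URL for the current environment (illustrative only)."""
    env = os.environ if env is None else env

    # 1. An explicit override always wins (hypothetical variable name).
    override = env.get("OLLAMA_URL")
    if override:
        return override.rstrip("/")

    # 2. Container-name networking: talk to a named sibling container
    #    (hypothetical variable name).
    container = env.get("OLLAMA_CONTAINER")
    if container:
        return f"http://{container}:{OLLAMA_PORT}"

    # 3. Otherwise guess: /.dockerenv is a common heuristic, not a guarantee.
    if in_docker is None:
        in_docker = Path("/.dockerenv").exists()
    host = "host.docker.internal" if in_docker else "localhost"
    return f"http://{host}:{OLLAMA_PORT}"


print(resolve_ollama_url({}, in_docker=False))  # http://localhost:11434
print(resolve_ollama_url({}, in_docker=True))   # http://host.docker.internal:11434
```

Whatever the real detection logic is, the point of Atlas's objection holds: callers ask one function for the URL, and no call site needs to know which environment it is running in.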

Issue 3 --- Fragile test path. The test uses open('services/cognition/AitherReasoning.py') --- a relative path that only works if pytest runs from the AitherOS/ directory. Tests run from the repo root via python -m pytest dev/tests/, so this will raise FileNotFoundError. The fix: use Path(__file__).parent.parent.parent / "services/cognition/AitherReasoning.py".

Issue 4 --- Test doesn't catch the remaining bug. Because line 2535 still has localhost:11434, the test the agent wrote would actually fail if it could find the file. The PR is internally inconsistent.
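Issues 3 and 4 together suggest what a sturdier regression test could look like: anchor the path to the test file, and flag any literal Ollama host rather than only localhost. A minimal sketch follows; the helper names and the exact set of hosts in the pattern are assumptions.

```python
import re
from pathlib import Path

# Any literal host for the default Ollama port counts as hardcoded.
HARDCODED_OLLAMA = re.compile(r"(?:localhost|127\.0\.0\.1|host\.docker\.internal):11434")


def hardcoded_ollama_lines(source: str) -> list[int]:
    """Return 1-based line numbers that contain a hardcoded Ollama host."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if HARDCODED_OLLAMA.search(line)
    ]


def assert_no_hardcoded_ollama(path: Path) -> None:
    """Fail with the offending line numbers if the file hardcodes an Ollama host."""
    offending = hardcoded_ollama_lines(path.read_text(encoding="utf-8"))
    assert not offending, f"{path}: hardcoded Ollama host on lines {offending}"


# In a pytest file the path would be anchored to the test file itself, e.g.
#   assert_no_hardcoded_ollama(
#       Path(__file__).parent.parent.parent / "services/cognition/AitherReasoning.py")

sample = "\n".join([
    'ps = f"{ollama_url()}/api/ps"',
    'chat = "http://localhost:11434/api/chat"',
    'alt = "http://host.docker.internal:11434"',
])
print(hardcoded_ollama_lines(sample))  # [2, 3]
```

Reporting line numbers in the assertion message means a failure points directly at the companion function that was missed, instead of only saying that something in the file is wrong.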

Verdict: REQUEST CHANGES.

Three items before merge:

1. Apply the same URL fix to _trigger_orchestrator_preload() (line 2535)
2. Use from lib.core.AitherPorts import ollama_url in both functions instead of any hardcoded host
3. Fix the test's relative file path to use Path(__file__).parent

This is a legitimate, useful code review. Atlas correctly identified that the Demiurge agent's task scope was too narrow --- it fixed one function but missed the companion function called in the same code path. It knew about the existing utility function in a completely different module. It understood that the test was structurally broken for our test runner configuration.

The review even added a constructive note:

This PR was generated by the Demiurge agent. The agent correctly identified the localhost issue but the task scope was narrow enough that it missed the companion function. Worth adjusting the Demiurge task prompt to include "check all functions in the same file that reference the same external service."

That's not just a review. That's a process improvement suggestion for the agent that wrote the code.

The Architecture Behind It

Three GitHub Actions workflows, two AitherOS agents, one self-hosted runner. No orchestration service. No ticket system. No Slack channel. Just YAML and cron.

Workflow 1: Bug Hunter (Scheduled)

Runs on a schedule against the develop branch. Uses GitHub Copilot's coding agent capabilities combined with AitherOS MCP tools to scan for anti-patterns. When it finds something, it opens a GitHub issue with structured labels.

The key insight: the bug hunter doesn't try to fix anything. It just identifies and files. Separation of concerns.
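To give a feel for the "identify, don't fix" step, here is a small standalone sketch that finds shell=True calls with Python's ast module. It is not the bug hunter's implementation, which runs through MCP tools and CodeGraph; the function name and the sample source are made up for illustration.

```python
import ast


def find_shell_true_calls(source: str, filename: str = "<string>") -> list[tuple[int, str]]:
    """Return (line number, callee) for every call that passes shell=True."""
    findings = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        for keyword in node.keywords:
            is_literal_true = (
                isinstance(keyword.value, ast.Constant) and keyword.value.value is True
            )
            if keyword.arg == "shell" and is_literal_true:
                findings.append((node.lineno, ast.unparse(node.func)))
    return findings


SAMPLE = '''
import subprocess

def restart(cmd):
    subprocess.run("systemctl restart " + cmd, shell=True)

def safe(cmd):
    subprocess.run(["systemctl", "restart", cmd])
'''

findings = find_shell_true_calls(SAMPLE)
print(findings)  # [(5, 'subprocess.run')]
```

The scanner only returns findings. Turning a finding into an issue, and later into a fix, is left to separate stages, which is the separation of concerns described above.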

Workflow 2: Agent Dispatcher (Issue-Triggered)

Watches for issues with the agent:local label. When one appears, it dispatches the appropriate AitherOS agent via the self-hosted runner. The runner has access to the full local stack --- vLLM for inference, CodeGraph for codebase search, all MCP tools for file manipulation.

The agent creates a branch, makes changes, writes tests, commits (as AitherOS Agent (demiurge)), and opens a PR. The PR body includes full provenance: task description, mode, effort level, run ID.
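The trigger condition for this workflow is simple enough to express in a few lines. The sketch below checks a GitHub issues event payload for the agent:local label. The function name and the trimmed-down payloads are hypothetical, and in practice the same condition could live directly in the workflow definition.

```python
DISPATCH_LABEL = "agent:local"


def should_dispatch(event: dict) -> bool:
    """Return True when an issue event carries the dispatch label."""
    # Only react when an issue is created or a label is added to it.
    if event.get("action") not in {"opened", "labeled"}:
        return False
    labels = {label["name"] for label in event.get("issue", {}).get("labels", [])}
    return DISPATCH_LABEL in labels


# Hypothetical, trimmed-down payloads for illustration.
labeled = {
    "action": "labeled",
    "issue": {"number": 1, "labels": [{"name": "bug"}, {"name": "agent:local"}]},
}
unlabeled = {
    "action": "opened",
    "issue": {"number": 2, "labels": [{"name": "bug"}]},
}
closed = {**labeled, "action": "closed"}

print(should_dispatch(labeled))    # True
print(should_dispatch(unlabeled))  # False
print(should_dispatch(closed))     # False
```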

Workflow 3: Atlas PR Guardian (PR-Triggered)

Fires on every new PR. Reads the diff, classifies the change by architectural layer, checks against coding standards, and posts a structured review. For agent-generated PRs, it adds context about the generating agent and suggestions for prompt improvement.

Atlas can approve simple changes automatically. For anything that fails a check, it requests changes with specific, actionable items.
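One way to picture the approve-or-request-changes rule is as a function over the check results. This is a sketch under assumptions: the post does not say how Atlas treats a PARTIAL result, so mapping it to a plain COMMENT is a guess, and Blast Radius is left out because it reads as informational rather than pass/fail.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PASS = "PASS"
    PARTIAL = "PARTIAL"
    FAIL = "FAIL"


@dataclass(frozen=True)
class Check:
    name: str
    status: Status
    notes: str = ""


def verdict(checks: list[Check]) -> str:
    """Collapse individual check results into a single review verdict."""
    statuses = {check.status for check in checks}
    if Status.FAIL in statuses:
        return "REQUEST CHANGES"
    if statuses == {Status.PASS}:
        return "APPROVE"
    # Assumption: partial results (or no checks at all) get a comment only.
    return "COMMENT"


review = [
    Check("Architecture", Status.PASS),
    Check("Security", Status.PASS),
    Check("Code Quality", Status.FAIL, "Swaps one hardcoded URL for another"),
    Check("Test Coverage", Status.PARTIAL, "Fragile relative path"),
]
print(verdict(review))  # REQUEST CHANGES
```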

The Self-Hosted Runner

This is the piece that makes local agent dispatch possible. GitHub's hosted runners can't reach my vLLM instance or MCP tools. The self-hosted runner runs on the same machine as AitherOS, so agents get full tool access --- file system, code search, LLM inference, the works.

The runner authenticates via the SAML SSO we set up with AitherIdentity acting as our IdP. Same identity system, same security boundary.

What This Means

The traditional software maintenance loop looks like this:

Human notices bug → Human files ticket → Human assigns developer → Developer investigates → Developer writes fix → Developer opens PR → Reviewer reads PR → Reviewer approves or requests changes → Developer addresses feedback → Merge

That's ten steps, three humans, and typically 2-5 business days.

The autonomous loop:

Workflow discovers bug → Workflow files issue → Workflow dispatches agent → Agent writes fix → Agent opens PR → Workflow dispatches reviewer → Reviewer approves or requests changes

Seven steps, zero humans, under two minutes.

And critically: the review agent caught a real problem. This isn't rubber-stamp automation. Atlas identified that the fix was architecturally wrong (hardcoding a different host instead of using the existing utility function), structurally incomplete (missed the companion function), and had a broken test. Those are exactly the things a good human reviewer catches.

The fix still needs human judgment to merge. We're not auto-merging agent PRs --- that's a trust boundary we haven't crossed yet, and probably shouldn't cross until the review agent's track record is longer. But the discovery, implementation, and quality gate are all autonomous.

What Broke (And What We Learned)

The Demiurge agent's fix was wrong. Not catastrophically wrong --- it correctly identified the problem (hardcoded localhost breaks in Docker) and applied a change that would work in container environments. But it replaced one hardcoded URL with another hardcoded URL, when the correct solution was to use the existing ollama_url() function that handles all environments.

This tells us something important about agent task prompts. The task said: "replace http://localhost:11434 with http://host.docker.internal:11434". The agent did exactly what it was told. It didn't search for related functions in the same file. It didn't check whether a utility function already existed for this purpose.

The fix isn't to make the agent smarter. The fix is to make the task prompt broader: "Eliminate all hardcoded Ollama URLs in this file. Use the existing ollama_url() from AitherPorts for environment-aware URL resolution. Write regression tests that verify no hardcoded Ollama hosts remain."

This is the same lesson human engineering managers learn: the quality of the output depends on the quality of the task definition. The agent is a capable executor. The prompt is the specification.

The Stack

For anyone building something similar, here's what's running:

GitHub Actions --- workflows for discovery, dispatch, and review
Self-hosted runner --- on the same machine as the AI stack, for local tool access
AitherOS MCP tools --- 200+ tools exposed via the MCP gateway for code search, file manipulation, git operations
Demiurge agent --- code generation specialist, dispatched via AgentForge with ReAct loop and tool calling
Atlas agent --- maintenance review specialist, reads diffs and checks against architectural rules
vLLM --- local LLM inference (Nemotron-Orchestrator-8B, fine-tuned on our codebase patterns)
CodeGraph --- AST-based code indexing for cross-file analysis
AitherIdentity --- SAML IdP for GitHub SSO, same auth chain for human and agent access

Total new code written to enable this: about 400 lines of YAML workflow definitions. Everything else was already in the system --- we just pointed GitHub Actions at it.

What's Next

The obvious next step is closing the feedback loop. When Atlas requests changes on an agent PR, dispatch a second agent run with the review feedback injected into the task prompt. The agent fixes its own fix, Atlas re-reviews, and if it passes, the PR is ready for human approval.
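The loop described here can be sketched with two callables standing in for the agents. Everything below is hypothetical: the function name, the round cap, the prompt format, and the stub agent and reviewer are placeholders, not Demiurge or Atlas.

```python
from typing import Callable, Optional, Tuple

Review = Tuple[str, str]  # (verdict, feedback)


def fix_until_approved(
    task: str,
    run_agent: Callable[[str], str],
    review: Callable[[str], Review],
    max_rounds: int = 3,
) -> Optional[Tuple[int, str]]:
    """Re-run the agent with reviewer feedback until approval or the round cap."""
    prompt = task
    for round_no in range(1, max_rounds + 1):
        change = run_agent(prompt)
        verdict, feedback = review(change)
        if verdict == "APPROVE":
            return round_no, change
        # Inject the review into the next prompt so the agent can correct itself.
        prompt = f"{task}\n\nReviewer feedback (round {round_no}):\n{feedback}"
    return None  # give up and leave the PR for a human


# Stub agent and reviewer, standing in for the real ones.
def stub_agent(prompt: str) -> str:
    return "uses ollama_url()" if "ollama_url" in prompt else "hardcodes host.docker.internal"


def stub_review(change: str) -> Review:
    if "ollama_url()" in change:
        return "APPROVE", ""
    return "REQUEST CHANGES", "Use ollama_url() from AitherPorts instead of a hardcoded host."


result = fix_until_approved("Replace hardcoded localhost:11434", stub_agent, stub_review)
print(result)  # (2, 'uses ollama_url()')
```

The round cap matters: without it, an agent and a reviewer that never converge would keep spending inference time on the same PR. Returning None hands the decision back to a person.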

We're also expanding the bug hunter's vocabulary. Right now it catches hardcoded URLs, shell injection patterns, and documentation drift. We're adding: dead code detection, unused import cleanup, test coverage gaps, dependency version staleness, and configuration drift between services.yaml and the actual Docker Compose files.
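Drift checks of this kind mostly reduce to comparing two mappings. The sketch below assumes both sources have already been parsed into service-to-port dictionaries. The function name and the example-a and example-b entries are made up, while the AitherNode values mirror the 8090 versus 8080 mismatch from the issues above.

```python
def find_port_drift(expected: dict[str, int], observed: dict[str, int]) -> list[str]:
    """Describe every service whose port differs between two sources."""
    findings = []
    for service in sorted(expected.keys() | observed.keys()):
        want, have = expected.get(service), observed.get(service)
        if want is None:
            findings.append(f"{service}: observed {have}, not in expected config")
        elif have is None:
            findings.append(f"{service}: expected {want}, not observed")
        elif want != have:
            findings.append(f"{service}: expected {want}, observed {have}")
    return findings


expected = {"AitherNode": 8090, "example-a": 9000}  # e.g. parsed from services.yaml
observed = {"AitherNode": 8080, "example-b": 9100}  # e.g. parsed from docs or Compose

for finding in find_port_drift(expected, observed):
    print(finding)
# AitherNode: expected 8090, observed 8080
# example-a: expected 9000, not observed
# example-b: observed 9100, not in expected config
```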

The long-term vision is a codebase that maintains itself. Not writes itself --- the creative work, the architecture decisions, the product direction, that's human. But the maintenance grind? The port number that drifted. The URL that should use service discovery. The test that uses a relative path. The shell=True that's one $(curl) away from RCE.

That work should happen automatically, continuously, with quality review at every step.

We're not there yet. But as of Monday morning, we have a system that discovers its own bugs, writes its own fixes, and reviews its own code --- and the reviewer is honest enough to reject the fix when it's wrong.

That's a pretty good start.

The GitHub Actions workflows, agent dispatch system, and Atlas PR Guardian shown in this post are running in production on the AitherOS repository.
