
13 Parallel Agent Swarms on GitHub Actions — Our $0 Dark Factory Stack

March 19, 2026 · 12 min read · Aitherium


We wanted to see how far you could push autonomous multi-agent software development using only free-tier infrastructure and open-weight models. The answer: surprisingly far.

This post walks through the full stack — 13 specialized agents running on GitHub Actions for $0 in compute, bursting to Vast.ai GPUs when local hardware is busy, shipping tested PRs with self-reviewing feedback loops. Every piece is open source. Here's exactly how it works.


The Core Observation: Agent Infrastructure Is Simpler Than It Looks

An "AI coding agent" is, at its core, three things:

  1. An LLM (which you can run locally or rent by the second)
  2. A system prompt with tool definitions (which is just a YAML file)
  3. An execution environment with file I/O (which is a Docker container)
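Those three pieces fit in a few dozen lines. Here's a deliberately minimal sketch — the model call is stubbed, the names are illustrative, and an in-memory dict stands in for the container filesystem:

```python
import json

# 1. The LLM: any callable mapping a prompt to text. Stubbed here;
#    in practice this calls a local vLLM server or a hosted API.
def llm(prompt: str) -> str:
    return json.dumps({"tool": "write_file",
                       "args": {"path": "hello.py", "content": "print('hi')"}})

# 2. The system prompt with tool definitions -- just data.
SYSTEM_PROMPT = """You are a coding agent. Respond with JSON:
{"tool": <name>, "args": {...}}. Available tools: write_file(path, content)."""

# 3. The execution environment: file I/O behind a tool table.
FILES: dict[str, str] = {}

def write_file(path: str, content: str) -> str:
    FILES[path] = content
    return f"wrote {path}"

TOOLS = {"write_file": write_file}

def run_agent(task: str) -> str:
    """One agent step: prompt the model, dispatch the tool it picks."""
    call = json.loads(llm(f"{SYSTEM_PROMPT}\n\nTask: {task}"))
    return TOOLS[call["tool"]](**call["args"])
```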

The interesting engineering challenge isn't building one agent — it's orchestrating many of them in parallel, with different specializations, coordinated through artifact-passing rather than shared context windows.

That's what we built. The architecture is fully transparent, open source, and designed to run on infrastructure you already have access to.


Layer 1: GitHub Actions Is a Free Supercomputer

Here's something most people haven't internalized: GitHub gives you 2,000 free CI minutes per month on private repos, unlimited on public repos. Each runner is a 4-core, 16GB RAM machine. You can run 20 jobs in parallel on the free tier.

We use this as a distributed compute fabric for agent swarms.

The Dark Factory Workflow

Our dark-factory-swarm.yml workflow runs in four phases:

Phase 1 — Plan (1 job, ~30 seconds) Atlas — our architecture agent — reads the task specification and produces a structured work breakdown. Components, data models, API surface, file layout, edge cases. This isn't a vague outline. It's a machine-readable manifest that gets split into discrete, self-contained work units.

Phase 2 — Factory (up to 8 parallel jobs, ~3 minutes each) Each work unit gets its own GitHub Actions runner. Each runner calls the GitHub Models API — which is free with any GitHub account that has Copilot access — and generates code for its assigned component. A frontend job doesn't know what the backend job is doing. They work from the architecture spec, independently, in parallel.

The worker script is 400 lines of Python with zero AitherOS dependencies. It handles rate limits with exponential backoff, writes structured output per unit, and uploads artifacts. That's it. No framework. No agent runtime. Just a Python script calling an LLM.
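The rate-limit handling reduces to a retry wrapper. A minimal sketch with full-jitter exponential backoff — illustrative, not the actual worker script:

```python
import random
import time

def call_with_backoff(request_fn, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a rate-limited call, doubling the backoff window each attempt."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RuntimeError:  # stand-in for an HTTP 429 response
            if attempt == max_retries - 1:
                raise
            # full jitter: sleep a random amount within the current window
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
```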

Phase 3 — Collect (1 job, ~20 seconds) A collector job downloads all artifacts, merges them into a feature branch, and produces a delivery archive.

Phase 4 — Notify (1 job, ~10 seconds) Creates a tracking issue, opens a draft PR linking issue to code, and fires a repository_dispatch event that triggers local refinement.
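Firing a repository_dispatch event is a single POST to the GitHub REST API. A minimal standard-library sketch — owner, repo, event name, and token are placeholders:

```python
import json
import urllib.request

def build_dispatch(owner: str, repo: str, event_type: str, payload: dict):
    """Build the POST request that triggers a repository_dispatch event."""
    url = f"https://api.github.com/repos/{owner}/{repo}/dispatches"
    body = json.dumps({"event_type": event_type,
                       "client_payload": payload}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": "Bearer <GITHUB_TOKEN>",  # injected from secrets
        },
    )

# req = build_dispatch("you", "your-repo", "dark-factory-refine",
#                      {"branch": "factory/run-42"})
# urllib.request.urlopen(req)  # fires the event; the worker workflow picks it up
```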

Total wall-clock time: Under 5 minutes for a full-stack application. Total cost: $0.00. GitHub Actions minutes + GitHub Models API. Both free.

What Just Happened

Thirteen specialized agents — 3 coders, 3 testers, 2 security auditors, 1 architect, 1 scribe, 1 reviewer, 1 judge, 1 collector — collaborated on a codebase without sharing a single context window. They communicated through artifacts, not conversation. The architecture spec is the contract.

This is not a demo. We shipped Wildroot Alchemy — a real inventory management system for a real business — using this exact pipeline. The first successful run took 11 minutes and produced 284 lines of working code across FastAPI and Kotlin.


Layer 2: Local Refinement — Where the Real Intelligence Lives

GitHub Actions gives you scale. But it gives you commodity LLMs (GPT-4o via GitHub Models) with no tool access, no codebase awareness, and no memory. The factory output is rough. It's syntactically correct but architecturally naive.

This is where most "AI coding" products stop. We're just getting started.

The second stage of the Dark Factory — after the cloud factory phases above — is local refinement, and this is where having your own agent operating system matters:

The Refinement Pipeline

When the repository_dispatch: dark-factory-refine event fires, our local-agent-worker.yml kicks in. This workflow runs on a self-hosted runner — your own machine, with your own GPU, your own codebase context:

  1. Hydra (code review agent) does a multi-angle review: correctness, style, architecture consistency, test coverage gaps
  2. Athena (security agent) runs threat modeling with real tool access — not just static analysis, actual STRIDE assessment with injection vector mapping
  3. Demiurge (code agent) applies targeted fixes based on review findings, with full access to CodeGraph (semantic code search) and MemoryGraph (historical decisions)
  4. Genesis (judge agent) renders a final ACCEPT / REVISE / REJECT verdict

Each of these agents runs through AitherOS's AgentForge — a ReAct loop with tool calling, VRAM-coordinated GPU access, capability-token authorization, and effort-scaled model selection. This is not a wrapper around an API. This is a full agent runtime with 100+ tools.
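Stripped of VRAM coordination, authorization, and the tool catalog, a ReAct loop is small. A minimal hypothetical sketch — the real AgentForge is far richer:

```python
def react_loop(model, tools: dict, task: str, max_steps: int = 5):
    """Minimal ReAct loop: each step, the model either calls a tool or finishes.
    `model` returns dicts like {"action": name, "input": ...} or {"final": ...}."""
    transcript = [f"Task: {task}"]
    for _ in range(max_steps):
        step = model("\n".join(transcript))
        if "final" in step:
            return step["final"]
        # act, then feed the observation back into the context
        observation = tools[step["action"]](step["input"])
        transcript.append(f"Action: {step['action']}({step['input']})")
        transcript.append(f"Observation: {observation}")
    raise TimeoutError("agent exceeded max_steps")
```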

Cost of local refinement: electricity. The LLMs run on your GPU via vLLM. The agents use local models. No API calls, no token billing, no usage caps.


Layer 3: Vast.ai — When You Need More GPU Than You Own

"But I only have one GPU." Sure. That's why Vast.ai exists.

Vast.ai is a marketplace for renting GPUs from individual owners. An RTX 4090 costs **$0.15–0.25/hour**. An H100 costs **$2.50/hour**. Compare that to AWS ($32/hr for a p5.48xlarge) or running Devin ($500/month flat, regardless of usage).

We built first-class vast.ai integration into AitherOS:

Scale-to-Zero Serverless

```yaml
# From our gpu-scaling.yaml
vast_serverless:
  enabled: true
  priority: 1
  endpoints:
    saga-orchestrator:
      min_workers: 0        # Scale to zero when idle
      max_workers: 3        # Burst to 3 GPUs under load
      target_util: 0.65
      gpu.min_ram_gb: 16
```

min_workers: 0 is the key line. When nobody is using the system, we're paying for storage only — fractions of a cent per hour. When a task comes in, a GPU spins up in ~60 seconds, serves the request, and scales back down after 15 minutes of idle.

This is the same pay-per-second model that cloud providers charge 10-50x more for, running on commodity hardware from people who have idle GPUs.
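The scale-to-zero policy reduces to a pure decision function. A sketch using the config's burst limit of 3 workers and the 15-minute idle window — the exact scaling rule is an illustrative assumption:

```python
def desired_workers(queue_depth: int, current: int, idle_minutes: float,
                    max_workers: int = 3, idle_timeout_min: float = 15.0) -> int:
    """Scale-to-zero policy: burst up with load, drop to 0 after sustained idle."""
    if queue_depth > 0:
        # one worker per queued request, capped at the burst limit
        return min(max(queue_depth, 1), max_workers)
    if idle_minutes >= idle_timeout_min:
        return 0       # nothing waiting, idle long enough: pay only for storage
    return current     # idle but inside the grace window: keep warm
```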

The Full Provider Chain

Our MicroScheduler — the VRAM coordinator at the center of all LLM routing — tries providers in priority order:

  1. Local GPU — Zero cost, zero latency. If your RTX 5090 has VRAM available, use it.
  2. Mesh nodes — Other AitherOS instances on your network. Free.
  3. Vast.ai Serverless — Pay-per-second burst. The "cloud" for $0.20/hr.
  4. Vast.ai Instance — Dedicated GPU for sustained workloads.
  5. RunPod — Secondary provider. Slightly more expensive, sometimes better availability.

The failover is transparent. Your agent doesn't know or care where the GPU lives. It asks MicroScheduler for inference, MicroScheduler finds the cheapest available GPU, routes the request, returns the result. If your local GPU is full, it bursts to vast.ai. If vast.ai is down, it tries RunPod. If everything is down, it queues.
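Assuming each provider exposes a simple inference callable, the priority-ordered failover can be sketched as:

```python
class ProviderUnavailable(Exception):
    """Raised when a provider can't serve the request (down, full, or slow)."""

def route_inference(prompt: str, providers: list) -> str:
    """Try (name, infer_fn) pairs in priority order; raise if all are down,
    at which point the real scheduler would queue the request."""
    for name, infer in providers:
        try:
            return infer(prompt)
        except ProviderUnavailable:
            continue  # transparent failover: the caller never sees which GPU served it
    raise ProviderUnavailable("all providers down -- request queued")
```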

14 MCP Tools for GPU Lifecycle

We didn't just integrate vast.ai — we made it a first-class citizen:

  • vastai_search_gpus — Find the cheapest GPU matching your VRAM/compute requirements
  • vastai_create_instance / vastai_destroy_instance — Full lifecycle management
  • vastai_ssh_exec — Run commands on remote GPU instances
  • vastai_get_balance — Monitor spending
  • vastai_destroy_all — Emergency kill switch

Our agents can autonomously provision GPU resources when they need more compute. An agent working on a complex task can decide "I need a reasoning model for this" → search vast.ai for a 24GB GPU → provision it → run inference → tear it down. Total cost: maybe $0.05 for the 3 minutes it was active.
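That provision → infer → tear-down cycle can be sketched like this; the `vastai_search_gpus` stub and the offer data are illustrative stand-ins for the real MCP tools:

```python
# Hypothetical offer data, standing in for live marketplace listings.
OFFERS = [{"id": 101, "gpu_ram_gb": 24, "price_hr": 0.22},
          {"id": 102, "gpu_ram_gb": 48, "price_hr": 0.55}]

def vastai_search_gpus(min_ram_gb: int) -> dict:
    """Cheapest offer that satisfies the VRAM requirement (stubbed)."""
    return min((o for o in OFFERS if o["gpu_ram_gb"] >= min_ram_gb),
               key=lambda o: o["price_hr"])

def burst_inference(prompt: str, min_ram_gb: int = 24):
    """Provision -> infer -> tear down, returning the answer and the cost."""
    offer = vastai_search_gpus(min_ram_gb)
    minutes = 3                               # short-lived burst
    cost = offer["price_hr"] * minutes / 60   # pay-per-second billing, roughly
    answer = f"answer({prompt})"              # stand-in for remote inference
    return answer, round(cost, 4)             # instance destroyed afterwards
```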


Layer 4: The Dark Factory — Self-Improving Autonomous Loops

This is the part that doesn't have a commercial equivalent at any price.

The Dark Factory isn't just a code generator. It's a self-improving system with feedback loops that make it better over time without human intervention.

SessionLearner — Mid-Conversation Adaptation

When an agent fails the same way three times in a session — same intent type, same effort level, same error pattern — SessionLearner captures the pattern and injects a behavioral correction into the system prompt. Not a retry. A learning.

The correction lives between the [MEMORIES] and [AFFECT] layers of the context pipeline. Max 3 active learnings, ~200 tokens total. Tight enough to not pollute the context window, specific enough to prevent the same failure mode.

After 5+ successful applications, the learning gets promoted to MemoryGraph (persistent) and exported to DaydreamCorpus (training data for the next finetune cycle).

The agent literally learns from its mistakes and writes the lesson into its own training data.
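The three-strikes detection can be sketched as follows — the class name matches, but the internals are an illustrative guess:

```python
from collections import Counter

class SessionLearner:
    """Detect a repeated failure signature and emit a prompt-level correction."""
    THRESHOLD = 3      # same (intent, effort, error) three times -> learn
    MAX_ACTIVE = 3     # cap active learnings so the context stays tight

    def __init__(self):
        self.failures = Counter()
        self.learnings = []

    def record_failure(self, intent: str, effort: int, error: str):
        """Return a correction once the same pattern hits the threshold."""
        key = (intent, effort, error)
        self.failures[key] += 1
        if self.failures[key] == self.THRESHOLD and len(self.learnings) < self.MAX_ACTIVE:
            lesson = f"When handling '{intent}' (effort {effort}), avoid: {error}"
            self.learnings.append(lesson)  # injected between [MEMORIES] and [AFFECT]
            return lesson
        return None
```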

SelfModificationPipeline — Autonomous Code Fixes

When the pain system detects a recurring failure — a service that keeps crashing, an endpoint that keeps timing out — SelfModificationPipeline kicks in:

  1. CodeForge analyzes the failure
  2. Generates a fix (max 500 lines, max 10 files — safety limits)
  3. Creates a branch and PR
  4. CI runs automatically
  5. If tests pass, merges
  6. Deploys the fix
  7. Checks Pulse to verify the original pain signal resolved
  8. If pain persists after deployment → automatic rollback

The safety gates are paranoid by design: blocked paths (secrets, auth, docker-compose), dedup gates (1-hour cooldown after failed attempts), intent-based gating (high-effort modifications require review). But within those guardrails, the system fixes itself.
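Those guardrails amount to a gate function over the proposed diff. A minimal sketch using the limits above — the path matching is an illustrative assumption:

```python
BLOCKED_PATHS = ("secrets/", "auth/", "docker-compose")
MAX_LINES, MAX_FILES = 500, 10

def passes_safety_gates(changed_files: dict) -> bool:
    """changed_files maps path -> lines changed. Every gate must pass."""
    if len(changed_files) > MAX_FILES:
        return False                      # touches too many files
    if sum(changed_files.values()) > MAX_LINES:
        return False                      # diff too large
    # never let the pipeline touch secrets, auth, or deployment config
    return not any(blocked in path
                   for path in changed_files for blocked in BLOCKED_PATHS)
```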

PlaybookEngine — Learning from Repetition

After observing the same operational pattern three times — restart this service, clear that cache, rotate those credentials — PlaybookEngine auto-generates a YAML runbook. Next time the same situation arises, it executes the playbook instead of reasoning from scratch.

The playbooks have conditional steps, approval gates, and automatic rollback on failure. They're not scripts — they're operational knowledge that the system extracted from its own behavior.

StrataFeedback — Closing the Training Loop

Every interaction, every agent dispatch, every success and failure gets ingested into Strata (our telemetry system). StrataFeedback mines this data to:

  • Calibrate confidence: If an intent type succeeds 95% of the time, boost confidence. If it fails 40%, reduce it.
  • Adjust effort routing: If effort-level-3 tasks on the reasoning model succeed more than on the orchestrator model, recommend the routing change.
  • Trigger retraining: When performance degrades on a specific task type, flag it for the next finetune cycle.

This is a closed loop. The system uses itself, measures the results, adjusts its own parameters, and feeds successful patterns back into model training. No human in the loop.
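Confidence calibration of this kind can be sketched as a pseudo-count blend of the prior and the observed success rate — the weighting scheme here is an assumption, not StrataFeedback's actual formula:

```python
def calibrate_confidence(prior: float, successes: int, failures: int,
                         weight: int = 10) -> float:
    """Blend prior confidence with the observed success rate.
    `weight` pseudo-counts keep small samples from swinging the estimate."""
    total = successes + failures
    if total == 0:
        return prior          # no evidence yet: keep the prior
    observed = successes / total
    return (prior * weight + observed * total) / (weight + total)
```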


The Math

Let's be honest about costs. "Free" doesn't mean zero — it means the marginal cost approaches zero because you're using infrastructure you already have or infrastructure that's genuinely free.

Monthly cost to run AitherOS Dark Factory

| Component | Cost | Notes |
| --- | --- | --- |
| GitHub Actions | $0 | Free tier: 2,000 min/month. We use ~400. |
| GitHub Models API | $0 | Free with Copilot access. We use GPT-4o for the factory phase. |
| Local GPU (RTX 5090) | $0 | Already owned. ~150W during inference. ~$5/month electricity. |
| Vast.ai burst | ~$5-15/month | Scale-to-zero. Only active during peak loads. |
| Self-hosted runner | $0 | Runs on the same machine as AitherOS. |
| **Total** | **~$5-20/month** | |

For reference: hosted alternatives

| Approach | Typical cost | Trade-off |
| --- | --- | --- |
| Hosted single-agent (e.g. Devin) | $500/month | Managed, but single-agent, cloud-only, token limits. |
| IDE copilot (e.g. Cursor, Copilot) | $40/month/seat | Great for augmentation, not autonomous multi-agent workflows. |
| Claude Code + API | ~$100-300/month | Powerful single-agent, but no swarm or self-improvement loops. |
| Custom cloud API stack | $200-500/month in tokens | Full flexibility, but you build the orchestration and pay per token. |

The self-hosted approach trades operational overhead for dramatically lower marginal cost and higher autonomy ceiling.


Why This Works and Why Nobody Else Is Doing It

Three converging forces make this possible right now:

1. Open-weight models got good enough. Nemotron-Orchestrator-8B running on a single RTX 5090 handles 80% of coding tasks. DeepSeek-R1-14B handles the hard reasoning. You don't need GPT-4 for everything. You need a scheduler that picks the right model for the right task. That's MicroScheduler.

2. GitHub Actions is absurdly underpriced. 2,000 free minutes of 4-core/16GB compute per month. The GitHub Models API gives you GPT-4o calls for free. Microsoft is subsidizing this to keep developers on GitHub. We're using that subsidy for exactly what it's designed for — running development workflows.

3. Vast.ai created a GPU commodity market. Gamers and crypto miners have idle GPUs. Vast.ai connects them to people who need burst compute. The result is GPU pricing that's 10-50x cheaper than AWS/Azure/GCP. An H100 for $2.50/hr instead of $32/hr. RTX 4090 for $0.20/hr instead of "contact sales."

The orchestration layer — coordinating multiple agents across distributed compute with tool access, memory, and feedback loops — is the genuinely hard part. That's what we focused on. It's open source. It has 2,690+ passing tests.


The 61 Workflows

This isn't a weekend project with a single CI pipeline. AitherOS has 61 GitHub Actions workflows covering:

  • CI gate: 4 parallel jobs (Python, PowerShell, Next.js, Docker lint)
  • Docker: Layered image builds across 9 service layers + 5 specialized images
  • Dark Factory: Full swarm pipeline (plan → factory → collect → notify)
  • Local Agent Worker: Self-hosted forge dispatch with full tool access
  • Security: pip-audit, npm audit, gitleaks, Bandit SAST, CodeQL
  • Training: Weekly automated fine-tuning (LoRA/QLoRA/DPO) on vast.ai GPUs
  • Deployment: Genesis orchestration, Veil static export, Docker lifecycle
  • 44 autonomous workflows: Atlas bug hunter, continuous docs, issue triage, PR review, code simplification — all running on schedule with zero human triggering

The autonomous workflows run daily. Atlas scans for bugs at 5 AM. Continuous docs updates ROADMAP.md when issues close. The PR guardian reviews every pull request. None of this requires a human to remember to run it.


What This Enables

To put the capability in context: most single-agent coding tools give you one agent with a cloud sandbox. This stack gives you 13 specialized agents with forge-backed tool calling, multi-phase orchestration, semantic code search, persistent memory, autonomous training loops, self-modification pipelines, and GPU burst scaling — running on infrastructure that costs between nothing and a fancy coffee per month.

The technology is open. The compute is commodity. The orchestration is a solvable engineering problem (we solved it — 203 microservices, 2,690+ tests). If you're willing to run your own infrastructure, the capability ceiling is dramatically higher than what hosted single-agent tools offer.


Try It

AitherOS is open source. The Dark Factory workflow is in .github/workflows/dark-factory-swarm.yml. The swarm engine is in lib/orchestration/SwarmCodingEngine.py. The vast.ai integration is in services/mesh/providers/vast_ai.py.

Fork the repo. Run setup.ps1. Point the dark factory at a problem. Watch 13 agents build software while you do something else.


This is the sixth post in the Dark Factory series. Previously: Surgical Context Management, Autonomous CI/CD, The Dark Factory Pattern, First Run, The Complete Story.
