VideoDirector: How We Built a Multi-Agent Video Production Pipeline
We just shipped one of the most ambitious orchestration features in AitherOS — VideoDirector, a 7-phase video production pipeline that coordinates five AI services, six specialized agents, and puts humans in the loop at every critical juncture.
The Problem
We already had the pieces: Iris for visual art direction, AitherCanvas wrapping ComfyUI for image generation, Remotion with 19 compositions for motion graphics, TTS via PerceptionMedia, and ffmpeg for post-production. But they were islands. Making a polished video meant manually chaining API calls, copying files between services, and praying the styles were consistent.
We wanted to say "make a narrated video about quantum computing" and have the system figure out the rest.
The Architecture
VideoDirector orchestrates production in seven phases:
1. Art Direction — Iris generates a `VisualStyleGuide`: color palette, mood, typography, background style, and ComfyUI style prompts. Every asset in the video follows this guide.
2. Asset Generation — Canvas/ComfyUI produces per-scene images and animations using the style guide. Each scene can specify custom prompts or let the director auto-generate them.
3. Quality Gate — Iris evaluates generated assets against a quality threshold. Below the bar? It feeds evaluation feedback back into Canvas for img2img refinement, up to two refinement rounds per asset.
4. Scene Rendering — Each scene maps to a Remotion composition (MotionGraphics, PresentationDeck, DataVisualization, TutorialWalkthrough, etc.) and renders as an individual MP4 clip.
5. Narration — TTS synthesizes per-scene audio from the narration script.
6. Assembly — ffmpeg concatenates scene clips, overlays narration audio, and mixes in background music at a configurable volume.
7. Publish — The final video is registered as a Strata artifact with full metadata.
The Unified Work Schema
To make this work, we needed every subsystem to speak the same language. Enter WorkUnit — the ONE canonical work package schema for AitherOS.
Previously we had 13+ incompatible task formats: TaskHub WorkPackages, AgentForge specs, A2A tasks, MCTS PlanNodes, Notebook cells, Expedition tasks, and Playbook steps. WorkUnit unifies them all with bidirectional adapters at every boundary.
Key design decisions:
- Comms built in, not bolted on — every WorkUnit carries its own notification config (who to email, which channels, audit level)
- Governance by default — `approval_required`, `rollback_action`, and `idempotent` flags on every unit
- DAG-native — `depends_on` for composition, `sub_units` for nesting
- Status machine — PENDING → IN_PROGRESS → COMPLETED | FAILED | AWAITING_APPROVAL
Dynamic Workflow Planning
The DynamicWorkflowPlanner takes any natural language goal and plans a complete workflow:
- DISCOVER — loads all available capabilities from WorkUnitRegistry (agents, tools, skills, playbooks, notebook templates)
- DECOMPOSE — breaks the goal into sub-tasks using LLM (with heuristic fallback)
- ASSIGN — matches sub-tasks to the best agent/tool using relevance scoring
- WIRE — builds the dependency DAG with auto-inserted human gates before any dangerous operation
- EMIT — outputs as a ComposableWorkflow or interactive Agent Notebook
The planner auto-inserts human approval gates before any action tagged as publish, broadcast, deploy, delete, or permanent. No AI action goes public without human sign-off.
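The gate auto-insertion in the WIRE step could be sketched like this. The dangerous-tag list comes from the post; the step shape and function name are hypothetical:

```python
# Tags that trigger a mandatory human approval gate (from the post).
DANGEROUS_TAGS = {"publish", "broadcast", "deploy", "delete", "permanent"}

def wire_with_gates(steps: list[dict]) -> list[dict]:
    """Insert a human-approval gate before any step carrying a dangerous tag.

    `steps` is a list of dicts like {"name": ..., "tags": {...}} — an
    illustrative shape, not the real planner's types.
    """
    wired = []
    for step in steps:
        if DANGEROUS_TAGS & set(step.get("tags", ())):
            wired.append({"name": f"approve:{step['name']}",
                          "tags": {"human_gate"}})
        wired.append(step)
    return wired
```

Putting the check in the wiring pass (rather than in each step's implementation) means no capability can opt out of the gate.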
The Agent Ensemble
The auto-video-marketing playbook demonstrates the full agent ensemble:
| Agent | Role |
|---|---|
| Lyra | Deep topic research — trends, papers, narratives |
| Atlas | Content strategy — narrative arc, editorial direction |
| Vera | Narration scripting — TTS-optimized prose |
| Demiurge | Visual design — Remotion slide JSON, storyboarding |
| Hera | Social distribution — LinkedIn copy, publishing |
Four human approval gates ensure the operator maintains full steering control:
- After strategy (before scripting)
- After script + visuals (before GPU rendering)
- After video render (before social posting)
- After social copy (before public publishing)
MCP Tools
Two new MCP tools expose VideoDirector to any agent or external system:
- `produce_video()` — full production from a brief with structured scenes
- `plan_video_production()` — goal → brief planning for review before execution
Plus the workflow tools:
- `plan_dynamic_workflow()` — plan any workflow from natural language
- `discover_capabilities()` — list all available agents, tools, skills, and playbooks
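The plan-then-produce flow might look like this. The two tool names come from the post, but the brief fields and both function bodies are stand-in stubs, not the real MCP tool contracts:

```python
def plan_video_production(goal: str) -> dict:
    """Stub standing in for the MCP tool: goal -> reviewable brief."""
    return {
        "title": goal,
        "scenes": [
            {"composition": "MotionGraphics", "narration": "Intro...", "prompt": None},
            {"composition": "DataVisualization", "narration": "The numbers...", "prompt": None},
        ],
        "music_volume": 0.2,
    }

def produce_video(brief: dict) -> dict:
    """Stub: full production from an approved brief."""
    assert brief["scenes"], "a brief needs at least one scene"
    return {"artifact": "strata://video/demo", "scenes_rendered": len(brief["scenes"])}

# Plan first, let a human review and edit the brief, then execute.
brief = plan_video_production("make a narrated video about quantum computing")
result = produce_video(brief)
```

Splitting planning from production is what makes the human review step possible: the brief is an inspectable artifact, not an opaque in-flight state.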
What's Next
- AnimateDiff integration — animated scene backgrounds instead of static images
- WorkUnitRegistry auto-registration — MCTS discovers video production capabilities at runtime
- Multi-agent collaboration — Iris + Saga (narrative) + Lyra (research) feeding VideoDirector simultaneously
- New Remotion compositions — cinematic transitions, chapter cards, lower-thirds
The commit is 6ca5afd4. The system can now go from "make a video about X" to a fully produced, narrated, quality-checked MP4 — with humans approving every permanent action along the way.