VideoDirector: How We Built a Multi-Agent Video Production Pipeline
We just shipped one of the most ambitious orchestration features in AitherOS — VideoDirector, a 7-phase video production pipeline that coordinates five AI services, six specialized agents, and puts humans in the loop at every critical juncture.
The Problem
We already had the pieces: Iris for visual art direction, AitherCanvas wrapping ComfyUI for image generation, Remotion with 19 compositions for motion graphics, TTS via PerceptionMedia, and ffmpeg for post-production. But they were islands. Making a polished video meant manually chaining API calls, copying files between services, and praying the styles were consistent.
We wanted to say "make a narrated video about quantum computing" and have the system figure out the rest.
The Architecture
VideoDirector orchestrates production in seven phases:
1. Art Direction — Iris generates a `VisualStyleGuide`: color palette, mood, typography, background style, and ComfyUI style prompts. Every asset in the video follows this guide.
2. Asset Generation — Canvas/ComfyUI produces per-scene images and animations using the style guide. Each scene can specify custom prompts or let the director auto-generate them.
3. Quality Gate — Iris evaluates generated assets against a quality threshold. Below the bar? It feeds evaluation feedback back into Canvas for img2img refinement, up to two refinement rounds per asset.
4. Scene Rendering — Each scene maps to a Remotion composition (MotionGraphics, PresentationDeck, DataVisualization, TutorialWalkthrough, etc.) and renders as an individual MP4 clip.
5. Narration — TTS synthesizes per-scene audio from the narration script.
6. Assembly — ffmpeg concatenates scene clips, overlays narration audio, and mixes in background music at a configurable volume.
7. Publish — The final video is registered as a Strata artifact with full metadata.
The Unified Work Schema
To make this work, we needed every subsystem to speak the same language. Enter WorkUnit — the ONE canonical work package schema for AitherOS.
Previously we had 13+ incompatible task formats: TaskHub WorkPackages, AgentForge specs, A2A tasks, MCTS PlanNodes, Notebook cells, Expedition tasks, and Playbook steps. WorkUnit unifies them all with bidirectional adapters at every boundary.
Key design decisions:
- Comms built in, not bolted on — every WorkUnit carries its own notification config (who to email, which channels, audit level)
- Governance by default — `approval_required`, `rollback_action`, and `idempotent` flags on every unit
- DAG-native — `depends_on` for composition, `sub_units` for nesting
- Status machine — PENDING → IN_PROGRESS → COMPLETED | FAILED | AWAITING_APPROVAL
Dynamic Workflow Planning
The DynamicWorkflowPlanner takes any natural language goal and plans a complete workflow:
- DISCOVER — loads all available capabilities from WorkUnitRegistry (agents, tools, skills, playbooks, notebook templates)
- DECOMPOSE — breaks the goal into sub-tasks using LLM (with heuristic fallback)
- ASSIGN — matches sub-tasks to the best agent/tool using relevance scoring
- WIRE — builds the dependency DAG with auto-inserted human gates before any dangerous operation
- EMIT — outputs as a ComposableWorkflow or interactive Agent Notebook
The planner auto-inserts human approval gates before any action tagged as publish, broadcast, deploy, delete, or permanent. No AI action goes public without human sign-off.
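The gate auto-insertion in the WIRE step could be sketched like this. The dangerous-tag list comes from the post; the step shape and function name are hypothetical:

```python
# Tags that trigger a mandatory human approval gate (from the post).
DANGEROUS_TAGS = {"publish", "broadcast", "deploy", "delete", "permanent"}

def wire_with_gates(steps: list[dict]) -> list[dict]:
    """Insert a human-approval gate before any step carrying a dangerous tag.

    `steps` is a list of dicts like {"name": ..., "tags": {...}} — an
    illustrative shape, not the real planner's types.
    """
    wired = []
    for step in steps:
        if DANGEROUS_TAGS & set(step.get("tags", ())):
            wired.append({"name": f"approve:{step['name']}",
                          "tags": {"human_gate"}})
        wired.append(step)
    return wired
```

Putting the check in the wiring pass (rather than in each step's implementation) means no capability can opt out of the gate.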
The Agent Ensemble
The auto-video-marketing playbook demonstrates the full agent ensemble:
| Agent | Role |
|---|---|
| Lyra | Deep topic research — trends, papers, narratives |
| Atlas | Content strategy — narrative arc, editorial direction |
| Vera | Narration scripting — TTS-optimized prose |
| Demiurge | Visual design — Remotion slide JSON, storyboarding |
| Hera | Social distribution — LinkedIn copy, publishing |
Four human approval gates ensure the operator maintains full steering control:
- After strategy (before scripting)
- After script + visuals (before GPU rendering)
- After video render (before social posting)
- After social copy (before public publishing)
MCP Tools
Two new MCP tools expose VideoDirector to any agent or external system:
- `produce_video()` — full production from a brief with structured scenes
- `plan_video_production()` — goal → brief planning for review before execution
Plus the workflow tools:
- `plan_dynamic_workflow()` — plan any workflow from natural language
- `discover_capabilities()` — list all available agents, tools, skills, and playbooks
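The plan-then-produce flow might look like this. The two tool names come from the post, but the brief fields and both function bodies are stand-in stubs, not the real MCP tool contracts:

```python
def plan_video_production(goal: str) -> dict:
    """Stub standing in for the MCP tool: goal -> reviewable brief."""
    return {
        "title": goal,
        "scenes": [
            {"composition": "MotionGraphics", "narration": "Intro...", "prompt": None},
            {"composition": "DataVisualization", "narration": "The numbers...", "prompt": None},
        ],
        "music_volume": 0.2,
    }

def produce_video(brief: dict) -> dict:
    """Stub: full production from an approved brief."""
    assert brief["scenes"], "a brief needs at least one scene"
    return {"artifact": "strata://video/demo", "scenes_rendered": len(brief["scenes"])}

# Plan first, let a human review and edit the brief, then execute.
brief = plan_video_production("make a narrated video about quantum computing")
result = produce_video(brief)
```

Splitting planning from production is what makes the human review step possible: the brief is an inspectable artifact, not an opaque in-flight state.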
What's Next
- AnimateDiff integration — animated scene backgrounds instead of static images
- WorkUnitRegistry auto-registration — MCTS discovers video production capabilities at runtime
- Multi-agent collaboration — Iris + Saga (narrative) + Lyra (research) feeding VideoDirector simultaneously
- New Remotion compositions — cinematic transitions, chapter cards, lower-thirds
The commit is 6ca5afd4. The system can now go from "make a video about X" to a fully produced, narrated, quality-checked MP4 — with humans approving every permanent action along the way.