engineering · gpu · 3d · creative

Text to NPC in 4 Minutes: Building a Production 3D Character Pipeline

March 4, 2026 · 11 min read · Aitherium

Imagine typing "a gothic sorceress with flowing dark robes and mystical glowing eyes" and four minutes later having a fully textured, PBR-lit 3D model inside your Godot game — complete with a state machine, navigation, and dialogue signals. No Blender. No manual retopology. No UV unwrapping. Just text in, NPC out.

That's what we built this week. Here's the full story of how we replaced a broken ComfyUI workflow with a production-grade 5-stage pipeline that chains four AitherOS services together.

The Problem: A 9-Node Workflow That Never Worked

Our original AitherMeshGen service tried to do everything in a single ComfyUI workflow: load the image, generate a 3D shape, apply textures, decimate the mesh, clean it, and export — all in one POST /prompt call with 9 interconnected nodes.

It never worked reliably. Two critical failures:

  • VRAM overflow: Hunyuan3D's ShapeGen model (~7GB) stays loaded in Python class-level caches even after POST /free. When TexGen tries to load its own models, there's not enough VRAM left, and the whole workflow crashes with a CUDA OOM.
  • Decimate corruption: The Decimate Mesh and Fast Clean Mesh nodes corrupt the vertex/UV correspondence. The textured mesh comes out with scrambled UVs — faces mapped to wrong texture regions, seams everywhere.

The E2E Test That Proved the Path

Before touching production code, we wrote a standalone end-to-end test (dev/scripts/test_e2e_3d_pipeline.py) that proved a 2-phase approach works:

  1. Phase 1 — ShapeGen (4 nodes, ~71s): LoadImage → LoadShapeGenPipeline → ShapeGen → SaveMesh
  2. Container restart to clear model caches from VRAM
  3. Re-upload the image (restart clears ComfyUI's input directory)
  4. Phase 2 — TexGen (4 nodes, ~158s): LoadImage → LoadTexGenPipeline → TexGen → SaveMesh

Total: ~4.3 minutes for a 4.1MB PBR-textured GLB with correct UV mapping. No Decimate, no FastClean. The mesh comes out clean because we never corrupt it in the first place.

The critical insight: TexGen's mesh_path must be output/{shape_filename}, not a node reference. TexGen resolves paths relative to /app (the container's cwd), so the shape mesh from Phase 1 lives at output/shape_xyz.glb.

Phase 1: Rewriting MeshGen

With the E2E proof in hand, we rewrote AitherMeshGen.py to use the proven 2-phase approach. The key architectural decision: extract workflows as pure functions for testability.

def _build_shapegen_workflow(uploaded_name, shape_filename):
    """4-node ShapeGen workflow — zero side effects."""
    return {
        "1": {"class_type": "LoadImage",
              "inputs": {"image": uploaded_name}},
        "2": {"class_type": "[Comfy3D] Load Hunyuan3D 21 ShapeGen Pipeline",
              "inputs": {"subfolder": "hunyuan3d-dit-v2-1"}},
        "3": {"class_type": "[Comfy3D] Hunyuan3D 21 ShapeGen",
              "inputs": {"image": ["1", 0],
                         "pipeline": ["2", 0], ...}},
        "4": {"class_type": "[Comfy3D] Save 3D Mesh",
              "inputs": {"mesh": ["3", 0],
                         "save_path": shape_filename}},
    }

The TexGen builder follows the same pattern. Both are pure dict-returning functions — no I/O, no state, trivially testable. The actual _generate_hunyuan3d() function orchestrates the lifecycle:

  1. Acquire a VRAM slot from MicroScheduler
  2. Stop competing GPU containers (ComfyUI image gen, vLLM workers)
  3. Upload the portrait image to ComfyUI-3D
  4. Submit ShapeGen workflow, poll until complete
  5. Restart the aitheros-comfyui-3d container
  6. Re-upload the image (restart clears /input)
  7. Submit TexGen workflow, poll until complete
  8. Download the final GLB, restart GPU containers, release VRAM slot
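
The core of that lifecycle (steps 3 through 8) can be sketched with an injected client adapter. The adapter interface is hypothetical; a real implementation would wrap ComfyUI-3D's HTTP API and `docker restart`, plus the MicroScheduler slot and GPU-container handling omitted here:

```python
class TwoPhasePipeline:
    """Sketch of the 2-phase Hunyuan3D run with an injected client."""

    def __init__(self, client):
        self.client = client

    def run(self, image_path, shapegen_workflow, texgen_workflow):
        c = self.client
        c.upload(image_path)                  # portrait into ComfyUI /input
        c.submit_and_poll(shapegen_workflow)  # Phase 1: ShapeGen (~71s)
        c.restart_container()                 # evict class-level model caches
        c.upload(image_path)                  # restart wiped /input, re-upload
        c.submit_and_poll(texgen_workflow)    # Phase 2: TexGen (~158s)
        return c.download()                   # final PBR-textured GLB
```

Keeping the transport behind an adapter is also what makes the orchestration testable without a GPU in the loop.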

We also added a POST /generate/stream SSE endpoint that emits real-time events as each phase progresses — phase_start, phase_complete, container_restart, error, done.
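
The SSE framing itself is simple. A minimal formatter for those event frames (our own helper, not the production code) could be:

```python
import json

def sse_event(event_type, **data):
    """Format one Server-Sent Events frame, e.g. phase_start or
    container_restart, with a JSON payload on the data line."""
    return f"event: {event_type}\ndata: {json.dumps(data)}\n\n"
```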

Phase 2: The Character Pipeline Orchestrator

MeshGen handles the GPU-heavy 3D generation. But a full NPC needs more than a mesh. CharacterPipeline orchestrates five stages, each calling a different AitherOS service:

| Stage | Service | What It Does | Fallback |
| --- | --- | --- | --- |
| 1. Description | Saga | Generates rich narrative description | Raw character data |
| 2. Portrait | Iris / Canvas | Generates 2D portrait from description | User-provided image |
| 3. 3D Mesh | MeshGen | Hunyuan3D 2-phase generation | Portrait-only result |
| 4. Godot Export | GodotExporter | Generates .tscn scene file | Skip |
| 5. GDScript | GDScript Generator | NPC controller with state machine | Skip |

Every stage degrades gracefully. If Saga is down, we use the raw description. If Iris fails, we fall back to Canvas. If MeshGen fails, you still get the portrait and description. The pipeline never crashes — it just produces a less complete result.
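
That degrade-don't-crash policy reduces to one small helper. This is a sketch under our own names, not the production API:

```python
import logging

logger = logging.getLogger("character_pipeline")

def run_stage(name, primary, fallbacks=(), default=None):
    """Try the stage's primary callable, then each fallback in order,
    then return a default instead of raising. Every stage failure is
    logged but never propagates to the caller."""
    for attempt in (primary, *fallbacks):
        try:
            return attempt()
        except Exception as exc:
            logger.warning("stage %s attempt failed: %s", name, exc)
    return default

# e.g. portrait stage: Iris first, Canvas second, user image as default
```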

The pipeline emits FluxEmitter events at each stage (CHAR3D_START, CHAR3D_SAGA, CHAR3D_PORTRAIT, etc.), which feed into the system-wide event bus for monitoring, alerting, and UI updates.

Genesis exposes the pipeline as REST endpoints:

POST /character-pipeline/generate        # Async — returns session_id
POST /character-pipeline/generate/sync   # Blocking — returns full result
GET  /character-pipeline/status/{id}     # Poll session
GET  /character-pipeline/stream/{id}     # SSE timeline
GET  /character-pipeline/sessions        # List recent
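
A typical client of the async flow POSTs once, then polls the status endpoint. The sketch below injects the HTTP transport so it stays self-contained; the paths come from the list above, everything else (payload shape, `state` values) is assumed:

```python
import time

def generate_character(http, payload, poll_interval=2.0, timeout=600):
    """POST /character-pipeline/generate, then poll /status/{id} until
    the session reaches a terminal state. `http(method, path, json=None)`
    is an injected callable, e.g. a thin wrapper over requests."""
    session = http("POST", "/character-pipeline/generate", json=payload)
    sid = session["session_id"]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = http("GET", f"/character-pipeline/status/{sid}")
        if status["state"] in ("complete", "failed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"pipeline session {sid} did not finish")
```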

Phase 3: Godot Scene Export + GDScript Generation

A GLB file isn't a game character. It needs to be placed in a Godot scene with collision, navigation, and behavior. We enhanced GodotSceneExporter to accept a mesh_path parameter:

[gd_scene load_steps=3 format=3]

[ext_resource type="PackedScene" path="res://assets/models/witch_abc.glb" id="1"]

[sub_resource type="CapsuleShape3D" id="2"]
radius = 0.4
height = 1.8

[node name="Witch" type="CharacterBody3D"]
metadata/character_id = "witch_001"
metadata/species = "human"

[node name="MeshRoot" type="Node3D" parent="."]
[node name="Model" type="MeshInstance3D" parent="MeshRoot"]
metadata/source_mesh = "witch_abc.glb"

[node name="CollisionShape3D" type="CollisionShape3D" parent="."]
transform = Transform3D(1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0.9, 0)
shape = SubResource("2")

For GDScript, we built a template-based generator that produces complete NPC controllers. Each NPC gets:

  • A state machine (IDLE, WALKING, RUNNING, INTERACTING)
  • NavigationAgent3D setup for pathfinding
  • Species-aware speed (elves are faster, dwarves are slower)
  • Behavior modules — patrol points, merchant shop, hostile aggro
  • Signal emissions for GodotBridge state synchronization
  • @export vars so designers can tweak values in the Godot editor

# Auto-generated NPC controller — do not edit manually
extends CharacterBody3D

signal interaction_started(npc_id: String)
signal state_changed(new_state: int)
signal dialogue_requested(npc_id: String, topic: String)

enum State { IDLE, WALKING, RUNNING, INTERACTING }

@export var npc_id: String = "witch_001"
@export var display_name: String = "Dark Witch"
@export var walk_speed: float = 2.0
@export var run_speed: float = 5.0
@export var mesh_path: String = "witch.glb"

var current_state: int = State.IDLE

Phase 4: Prometheus Integration

Prometheus is AitherOS's world simulation engine. It manages game worlds, NPCs, and real-time state. We added a new endpoint that connects the character pipeline to the world:

POST /generate/character-3d/{world_id}/{character_id}

When called, Prometheus collects the NPC's data from the world coordinator, sends it to the character pipeline, and when the mesh is ready, broadcasts a mesh_available event to all connected Godot clients via WebSocket. The game instantly picks up the new model without a restart.
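
The fan-out step might look like the following, assuming a set of client objects with an async `send()`; the actual Prometheus WebSocket hub is more involved:

```python
import asyncio
import json

async def broadcast_mesh_available(clients, world_id, character_id, mesh_url):
    """Push a mesh_available event to every connected Godot client."""
    message = json.dumps({
        "event": "mesh_available",
        "world_id": world_id,
        "character_id": character_id,
        "mesh_url": mesh_url,
    })
    # Fan out concurrently; one dead client must not block the rest.
    await asyncio.gather(
        *(c.send(message) for c in clients), return_exceptions=True
    )
```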

There's an opt-in auto-3D mode that automatically triggers mesh generation for key NPC roles (heroes, villains, merchants, guards) when they enter the world for the first time.
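
The gating logic for that mode reduces to a small predicate. Field names here are illustrative, not the actual world-coordinator schema:

```python
AUTO_3D_ROLES = {"hero", "villain", "merchant", "guard"}

def should_auto_generate(npc):
    """True when auto-3D mode should trigger mesh generation: a key
    role, entering the world without an existing mesh."""
    return (npc.get("role") in AUTO_3D_ROLES
            and not npc.get("mesh_generated", False))
```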

Phase 5: Agent Tools + Frontend

The Saga agent — AitherOS's narrative AI — can now generate 3D characters directly from conversation. Ask it to "create a guard captain for the eastern gate" and it will compose the description, generate the portrait, build the mesh, and produce the Godot scene file. All five pipeline stages, triggered by natural language.

The AitherVeil frontend's Forge3D Studio now has a "Local Pipeline" option alongside the cloud-hosted Forge3D service. The local pipeline uses your own GPU — no data leaves your machine.

The Numbers

| Metric | Value |
| --- | --- |
| Total pipeline time | ~4.3 minutes (ShapeGen 71s + restart 30s + TexGen 158s) |
| Output quality | 4.1MB PBR-textured GLB with correct UVs |
| New source files | 7 (orchestrator, router, GDScript generator, 4 test files) |
| Modified files | 14 |
| New tests | 86 (all passing) |
| GPU required | RTX 3090+ (24GB+ VRAM recommended) |
| Services orchestrated | 4 (Saga, Iris/Canvas, MeshGen, GodotExporter) |

What We Learned

Container restarts are not a hack — they're architecture. Python class-level caches in ML frameworks (transformers, diffusers) don't respect torch.cuda.empty_cache() or ComfyUI's /free endpoint. The model weights live in class attributes that survive everything except process termination. The container restart between ShapeGen and TexGen isn't a workaround; it's the correct way to reclaim VRAM when the framework won't cooperate.

Never post-process 3D output from image-to-3D models. Decimate, remesh, and clean operations assume the input mesh has consistent topology. Image-to-3D models like Hunyuan3D produce meshes with unusual topology that breaks these assumptions. The UV mapping that looks perfect before decimation becomes garbage after. Skip the post-processing; the raw output is good enough for real-time rendering.

Graceful degradation turns a fragile pipeline into a reliable one. Each of our five stages can fail independently. When Saga is down, we use raw text. When MeshGen fails, we still deliver the portrait. The pipeline's value degrades linearly with failures instead of going to zero at the first exception.

What's Next: Automated Rigging

A static mesh is one thing. An animated character is another. The next frontier is automated rigging — taking the raw GLB from Hunyuan3D and adding a skeleton, weight painting, and basic animation clips (idle, walk, run, interact) that work with Godot's animation system. We're evaluating RigNet for skeleton prediction and Mixamo-compatible bone naming for animation retargeting. Stay tuned.
