OODA Reflection: Teaching Agents When to Stop and Think
Every agent framework has a loop: call the LLM, execute a tool, feed the result back, repeat. The problem isn't the loop — it's that nobody teaches the agent when to stop. Without reflection, agents spiral: 25 tool calls, redundant searches, empty synthesis. Today we shipped a fix that cuts that to 4 turns with real, synthesized content.
The Problem: Runaway ReAct Loops
Our ReAct loop in AgentRuntime._react_loop() had a reflection mechanism, but it was search-centric and lacked sophistication for complex tasks. Agents would:
- Call `web_search` 10+ times with slight query variations
- Never synthesize the results into a coherent answer
- Exhaust their turn budget and return empty content
- Show confidence 0.0, meaning the system itself knew it had failed
Before the fix: 25 turns, 86 seconds, confidence 0.0, empty response.
The Solution: OODA Reflection Engine
We extracted the inline reflection logic into a standalone module — lib/core/OODAReflection.py — implementing the military decision-making framework: Observe → Orient → Decide → Act. The engine watches every tool call and makes intelligent decisions about when to stop and synthesize.
Key Features
1. Duplicate Call Detection
The engine tracks tool names + arguments using a signature hash. If an agent calls web_search("news today") twice, it injects a system message: "You've already called this tool with these arguments. Try a different approach or synthesize what you have." This alone eliminated 60% of wasted turns.
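A minimal sketch of signature-based duplicate detection. The source describes hashing tool names plus arguments; the `DuplicateDetector` class, `call_signature` helper, and SHA-256 choice here are illustrative assumptions, not the actual implementation.

```python
import hashlib
import json

def call_signature(tool_name, arguments):
    """Hash the tool name plus canonicalized arguments into a stable signature.

    sort_keys=True makes the hash independent of dict insertion order.
    """
    payload = json.dumps({"tool": tool_name, "args": arguments}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

class DuplicateDetector:
    """Remembers every (tool, args) pair seen so far. Hypothetical helper."""

    def __init__(self):
        self.seen = set()

    def check(self, tool_name, arguments):
        """Return True if this exact call was already made; record it either way."""
        sig = call_signature(tool_name, arguments)
        if sig in self.seen:
            return True
        self.seen.add(sig)
        return False

detector = DuplicateDetector()
detector.check("web_search", {"query": "news today"})  # False: first call
detector.check("web_search", {"query": "news today"})  # True: duplicate detected
```

When `check` returns True, the runtime would inject the "you've already called this tool" system message instead of executing the call.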
2. Error-Accelerated Thresholds
Each error reduces the reflection threshold by 1 turn. If the default threshold is 3 turns and 2 errors occur, reflection triggers after just 1 successful tool call. The system adapts faster when things go wrong.
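The threshold rule above reduces to one line. This sketch assumes a floor of one turn so the threshold never reaches zero; the function name and floor are assumptions for illustration.

```python
def effective_threshold(base_threshold, error_count, floor=1):
    """Each observed error lowers the reflection threshold by one turn,
    but it never drops below `floor`."""
    return max(floor, base_threshold - error_count)

effective_threshold(3, 0)  # 3: default behavior, no errors
effective_threshold(3, 2)  # 1: two errors, reflect after one successful call
```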
3. Progressive Micro-Reflections
At 50% of the reflection budget, the engine injects a checkpoint message: "You've used half your tool calls. Assess: do you have enough data to answer, or do you need one more targeted search?" This prevents the all-or-nothing pattern where agents either stop too early or run to exhaustion.
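The 50% checkpoint can be modeled as a one-shot trigger, so the message fires exactly once as the budget crosses the halfway mark. The `MicroReflector` class is a hypothetical sketch of that behavior, not the engine's actual API.

```python
class MicroReflector:
    """Fires a single mid-budget checkpoint message. Illustrative sketch."""

    def __init__(self, turn_budget):
        self.turn_budget = turn_budget
        self.fired = False

    def checkpoint(self, turns_used):
        """Return the checkpoint message the first time usage reaches 50% of
        the budget; return None on every other turn."""
        if not self.fired and turns_used >= self.turn_budget / 2:
            self.fired = True
            return ("You've used half your tool calls. Assess: do you have "
                    "enough data to answer, or do you need one more targeted search?")
        return None

reflector = MicroReflector(turn_budget=6)
reflector.checkpoint(2)  # None: under half the budget
reflector.checkpoint(3)  # returns the checkpoint message (fires once)
reflector.checkpoint(4)  # None: already fired
```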
4. Task-Type-Aware Synthesis
The engine categorizes tool usage (search, execution, delegation) and generates different synthesis instructions per category. Search-heavy tasks get citation-focused prompts. Coding tasks get implementation-focused prompts. Mixed tasks get comparative analysis prompts.
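One way to express the categorization is a lookup from tool names to categories, falling back to a "mixed" prompt when more than one category appears. The tool-name tables and prompt strings below are assumptions; only the three categories (search, execution, mixed) come from the description above.

```python
# Hypothetical categorization tables; real tool names may differ.
SEARCH_TOOLS = {"web_search"}
EXECUTION_TOOLS = {"run_code", "shell"}

SYNTHESIS_PROMPTS = {
    "search": "Synthesize the findings into a cited answer; link every claim to a source.",
    "execution": "Summarize what was implemented, how it was verified, and any gaps.",
    "mixed": "Compare the search evidence against the execution results and reconcile them.",
}

def categorize(tool_names):
    """Map the tools used this task to a single synthesis category."""
    kinds = set()
    for name in tool_names:
        if name in SEARCH_TOOLS:
            kinds.add("search")
        elif name in EXECUTION_TOOLS:
            kinds.add("execution")
    if len(kinds) > 1:
        return "mixed"
    return kinds.pop() if kinds else "search"  # default to search-style synthesis

def synthesis_prompt(tool_names):
    return SYNTHESIS_PROMPTS[categorize(tool_names)]
```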
5. Data-Driven Synthesis
When reflection triggers, the engine collects all gathered data from tool results and injects it into the synthesis prompt. The LLM doesn't have to remember what it found — the OODA engine feeds it the complete evidence package, with tools stripped so the LLM must synthesize rather than make another call.
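Assembling the evidence package might look like the sketch below: concatenate the non-empty tool outputs and prepend a synthesis instruction. The function name and result-dict shape (`tool`/`output` keys) are assumptions; tool stripping itself happens in the caller.

```python
def build_synthesis_prompt(tool_results):
    """Collect tool outputs into an evidence package wrapped in a synthesis
    instruction. The runtime also strips tools from the request, so the
    model must answer from this evidence rather than call again."""
    evidence = "\n\n".join(
        f"[{r['tool']}] {r['output']}" for r in tool_results if r.get("output")
    )
    return (
        "Synthesize a final answer from the evidence below. "
        "Do not request more tool calls.\n\n" + evidence
    )
```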
Results
After the fix: 4 turns, 19 seconds, confidence 1.0, real synthesized content with source links.
| Metric | Before | After |
|---|---|---|
| Turns | 25 | 4 |
| Duration | 86s | 19s |
| Confidence | 0.0 | 1.0 |
| Content | Empty / error | Real synthesis with links |
Architecture Integration
The OODAReflection class exposes a single API: observe(tool_calls, tool_results) which returns a ReflectionAction dataclass with flags for message injection, tool stripping, and loop termination. This replaced ~120 lines of inline logic in AgentRuntime._react_loop() with a clean 20-line integration.
OODA stats now flow through runtime.get_status() into ChatResponse.metadata, so the Orchestrator has full visibility into the reflection state of every agent task. Every component that uses AgentRuntime — Genesis, Demiurge, all specialist agents — automatically gets OODA reflection. No per-agent configuration needed.
What's Next
The OODA engine is the foundation for more sophisticated agent behaviors: adaptive turn budgets based on task complexity, cross-agent reflection sharing via FluxEmitter, and RLM-driven threshold tuning where the system learns optimal reflection points from production data. The loop that learns when to stop thinking is the loop that starts thinking better.