engineering · agents · architecture

OODA Reflection: Teaching Agents When to Stop and Think

February 28, 2026 · 8 min read · Aitherium

Every agent framework has a loop: call the LLM, execute a tool, feed the result back, repeat. The problem isn't the loop — it's that nobody teaches the agent when to stop. Without reflection, agents spiral: 25 tool calls, redundant searches, empty synthesis. Today we shipped a fix that cuts that to 4 turns with real, synthesized content.

The Problem: Runaway ReAct Loops

Our ReAct loop in AgentRuntime._react_loop() had a reflection mechanism, but it was search-centric and lacked sophistication for complex tasks. Agents would:

  • Call web_search 10+ times with slight query variations
  • Never synthesize the results into a coherent answer
  • Exhaust their turn budget and return empty content
  • Show confidence 0.0 — meaning the system itself knew it had failed

Before the fix: 25 turns, 86 seconds, confidence 0.0, empty response.

The Solution: OODA Reflection Engine

We extracted the inline reflection logic into a standalone module — lib/core/OODAReflection.py — implementing the military decision-making framework: Observe → Orient → Decide → Act. The engine watches every tool call and makes intelligent decisions about when to stop and synthesize.
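At a high level, the four phases can be sketched as plain functions. This is an illustrative sketch, not the actual `lib/core/OODAReflection.py` API; the function names, the result-dict shapes, and the default threshold of 3 are assumptions for the example.

```python
def observe(history, tool_call, tool_result):
    """Observe: record the latest tool call and its outcome."""
    history.append((tool_call, tool_result))
    return history

def orient(history):
    """Orient: summarize the situation — turns used and errors seen."""
    errors = sum(1 for _, result in history if result.get("error"))
    return {"turns": len(history), "errors": errors}

def decide(situation, threshold=3):
    """Decide: keep acting, or stop and synthesize. Errors lower the bar."""
    effective = max(1, threshold - situation["errors"])
    return "synthesize" if situation["turns"] >= effective else "continue"

def act(decision):
    """Act: either allow another tool call or force synthesis."""
    return {"strip_tools": decision == "synthesize"}
```

The real engine carries more state (duplicate signatures, budgets, task categories), but the control flow follows this shape.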

Key Features

1. Duplicate Call Detection

The engine tracks tool names + arguments using a signature hash. If an agent calls web_search("news today") twice, it injects a system message: "You've already called this tool with these arguments. Try a different approach or synthesize what you have." This alone eliminated 60% of wasted turns.
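A minimal sketch of the signature approach, assuming JSON-serializable tool arguments (the hashing scheme and class name are illustrative, not the module's actual implementation):

```python
import hashlib
import json

def call_signature(tool_name, args):
    """Stable hash of tool name + canonicalized arguments."""
    payload = json.dumps({"tool": tool_name, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

class DuplicateDetector:
    def __init__(self):
        self.seen = set()

    def check(self, tool_name, args):
        """Return a system message to inject on a repeat call, else None."""
        sig = call_signature(tool_name, args)
        if sig in self.seen:
            return ("You've already called this tool with these arguments. "
                    "Try a different approach or synthesize what you have.")
        self.seen.add(sig)
        return None
```

Sorting the keys before hashing means `{"query": "x", "limit": 5}` and `{"limit": 5, "query": "x"}` count as the same call.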

2. Error-Accelerated Thresholds

Each error reduces the reflection threshold by 1 turn. If the default threshold is 3 turns and 2 errors occur, reflection triggers after just 1 successful tool call. The system adapts faster when things go wrong.
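The threshold arithmetic is simple enough to show directly (the function name and the floor of 1 are assumptions for illustration):

```python
def effective_threshold(base_threshold, error_count, floor=1):
    """Each error lowers the reflection threshold by one turn,
    never dropping below the floor."""
    return max(floor, base_threshold - error_count)
```

With the defaults from the text: a base threshold of 3 and 2 errors yields an effective threshold of 1 turn.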

3. Progressive Micro-Reflections

At 50% of the reflection budget, the engine injects a checkpoint message: "You've used half your tool calls. Assess: do you have enough data to answer, or do you need one more targeted search?" This prevents the all-or-nothing pattern where agents either stop too early or run to exhaustion.
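A one-time checkpoint at the halfway mark might look like this (a sketch; the class name and integer-halfway rule are assumptions):

```python
class MicroReflector:
    """Fires a single checkpoint message at 50% of the reflection budget."""

    def __init__(self, budget):
        self.budget = budget
        self.fired = False

    def on_turn(self, turns_used):
        # turns_used * 2 >= budget is an integer-safe "past halfway" check.
        if not self.fired and turns_used * 2 >= self.budget:
            self.fired = True
            return ("You've used half your tool calls. Assess: do you have "
                    "enough data to answer, or do you need one more "
                    "targeted search?")
        return None
```

The `fired` flag ensures the checkpoint is injected exactly once per task rather than on every subsequent turn.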

4. Task-Type-Aware Synthesis

The engine categorizes tool usage (search, execution, delegation) and generates different synthesis instructions per category. Search-heavy tasks get citation-focused prompts. Coding tasks get implementation-focused prompts. Mixed tasks get comparative analysis prompts.
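One way to sketch the category mapping — the tool-name sets, category labels, and prompt wording here are illustrative assumptions, not the module's actual tables:

```python
SEARCH_TOOLS = {"web_search", "fetch_url"}
EXEC_TOOLS = {"run_code", "write_file"}

def categorize(tool_names):
    """Bucket a task by the kinds of tools it used."""
    used = set(tool_names)
    searchy = bool(used & SEARCH_TOOLS)
    execy = bool(used & EXEC_TOOLS)
    if searchy and execy:
        return "mixed"
    if execy:
        return "execution"
    return "search"

def synthesis_prompt(tool_names):
    """Pick a synthesis instruction matched to the task category."""
    return {
        "search": "Write a citation-focused answer: link every claim "
                  "to a source.",
        "execution": "Write an implementation-focused summary of what "
                     "was built and how to run it.",
        "mixed": "Write a comparative analysis across the searches and "
                 "executions before concluding.",
    }[categorize(tool_names)]
```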

5. Data-Driven Synthesis

When reflection triggers, the engine collects all gathered data from tool results and injects it into the synthesis prompt. The LLM doesn't have to remember what it found — the OODA engine feeds it the complete evidence package, with tools stripped so the LLM must synthesize rather than make another call.
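Assembling the evidence package might look like the sketch below, assuming tool results carry `tool` and `content` fields (both names, and the message/turn shape, are assumptions for illustration):

```python
def build_synthesis_turn(tool_results):
    """Bundle all gathered data into one synthesis prompt, with tools
    stripped so the model can only answer, not call again."""
    evidence = "\n\n".join(
        f"[{r['tool']}] {r['content']}"
        for r in tool_results
        if r.get("content")  # skip empty or failed results
    )
    return {
        "messages": [{
            "role": "system",
            "content": ("Synthesize a final answer from the evidence below. "
                        "Do not request more tool calls.\n\n" + evidence),
        }],
        "tools": [],  # stripped: synthesis is the only available move
    }
```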

Results

After the fix: 4 turns, 19 seconds, confidence 1.0, real synthesized content with source links.

| Metric | Before | After |
| --- | --- | --- |
| Turns | 25 | 4 |
| Duration | 86s | 19s |
| Confidence | 0.0 | 1.0 |
| Content | Empty / error | Real synthesis with links |

Architecture Integration

The OODAReflection class exposes a single API: observe(tool_calls, tool_results) which returns a ReflectionAction dataclass with flags for message injection, tool stripping, and loop termination. This replaced ~120 lines of inline logic in AgentRuntime._react_loop() with a clean 20-line integration.
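A plausible shape for that integration point — the `ReflectionAction` field names and the consuming helper are assumptions inferred from the description above, not the actual dataclass:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReflectionAction:
    """What the ReAct loop should do after one observe() call (sketch)."""
    inject_message: Optional[str] = None  # system message to add, if any
    strip_tools: bool = False             # force synthesis by removing tools
    terminate: bool = False               # end the loop outright

def apply_action(action, messages, tools):
    """How a ReAct loop might consume the action (illustrative)."""
    if action.inject_message:
        messages.append({"role": "system", "content": action.inject_message})
    if action.strip_tools:
        tools = []
    return messages, tools, action.terminate
```

Because the loop only ever branches on one small dataclass, the reflection policy can evolve inside the engine without touching `AgentRuntime` again.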

OODA stats now flow through runtime.get_status() into ChatResponse.metadata, so the Orchestrator has full visibility into the reflection state of every agent task. Every component that uses AgentRuntime — Genesis, Demiurge, all specialist agents — automatically gets OODA reflection. No per-agent configuration needed.

What's Next

The OODA engine is the foundation for more sophisticated agent behaviors: adaptive turn budgets based on task complexity, cross-agent reflection sharing via FluxEmitter, and RLM-driven threshold tuning where the system learns optimal reflection points from production data. The loop that learns when to stop thinking is the loop that starts thinking better.
