Why Your AI Forgets What You Just Said
How follow-up questions expose the stateless illusion in AI systems — and the five-layer fix that made conversations actually work
I asked Aither a simple question: "When are the Olympics and where?"
It nailed it. Web search fired. MCTS planned three steps. The answer came back with dates, locations, and sources. Milano Cortina 2026 for winter, LA 2028 for summer. Effort level 5, web_research intent, full pipeline engaged.
Then I typed the obvious follow-up: "How do I get tickets?"
Effort level 2. Intent: question. No web search. No tools. The system generated a generic response from its parametric memory — hallucinated URLs, made-up dates, the full complement of confident fiction that LLMs produce when they don't know something and aren't given the tools to find out.
The system had just searched the web for Olympics information. The conversation was clearly about the Olympics. And yet "how do I get tickets?" was treated as if I'd walked up to a stranger with no context and asked about tickets to... what, exactly?
This is the story of five bugs hiding behind one symptom, and why stateless intent classification is the silent killer of conversational AI.
The Symptom
AitherOS processes every message through a classification pipeline before it ever reaches an LLM. The IntentEngine scores your message against 20+ intent types using three layers: regex pattern matching, semantic embeddings, and a NanoGPT perplexity scorer. The winner determines everything downstream — effort level, model selection, tool access, reasoning depth.
For "when are the Olympics and where?", this pipeline works beautifully:
- WEB_RESEARCH patterns fire: \bwhen\s+(are|is)\s+(the|a)\b and \b(olympics?|olympic\s+games?)\b
- Confidence: 0.85
- Effort: 5 (standard: needs search + synthesis)
- Tools: ['web_search', 'reason']
But for "how do I get tickets?":
- QUESTION patterns fire: ^how\b and \?$ (two matches, confidence 0.80)
- WEB_RESEARCH pattern fires: \b(tickets?)\b (one match, confidence 0.65)
- QUESTION wins on confidence. Intent: question. Effort: 2.
And at effort 2, the system takes the fast path. No web search. No tools. No MCTS planning. Just a direct LLM call with whatever's already in context.
Bug 1: The Specificity Inversion
The scoring math is straightforward: confidence = 0.5 + 0.15 × match_count. QUESTION has two matches (0.80), WEB_RESEARCH has one (0.65). QUESTION wins.
But this is wrong. QUESTION patterns are catch-alls — ^how\b matches "how are you", "how do I cook pasta", and "how does quantum entanglement work". They're intentionally broad. WEB_RESEARCH patterns are specific — they fire on tickets, Olympics, price of, weather. When a specific pattern matches alongside a broad catch-all, the specific one should win.
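To see the inversion in numbers, here is a minimal sketch of the Layer 1 scoring formula. The pattern table and the `score_patterns` helper are illustrative assumptions, not the actual IntentEngine code; only the regexes and the formula come from the description above.

```python
import re

# Hypothetical pattern table: a small subset of the regexes quoted in this post.
PATTERNS = {
    "question": [r"^how\b", r"\?$"],
    "web_research": [r"\b(tickets?)\b", r"\b(olympics?|olympic\s+games?)\b"],
}


def score_patterns(message: str) -> dict[str, float]:
    """Score each intent as 0.5 + 0.15 * match_count; skip intents with no match."""
    text = message.strip().lower()
    scores = {}
    for intent, patterns in PATTERNS.items():
        matches = sum(1 for pattern in patterns if re.search(pattern, text))
        if matches:
            scores[intent] = round(0.5 + 0.15 * matches, 2)
    return scores


scores = score_patterns("How do I get tickets?")
winner = max(scores, key=scores.get)
```

For "How do I get tickets?" this gives 0.80 for question and 0.65 for web_research, so the broad catch-all wins purely because it has more patterns that can match.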
The fix: After Layer 1 pattern scoring, if both WEB_RESEARCH and QUESTION matched, boost WEB_RESEARCH's confidence above QUESTION's:
if IntentType.WEB_RESEARCH in candidates and IntentType.QUESTION in candidates:
    _wr = candidates[IntentType.WEB_RESEARCH]
    _q = candidates[IntentType.QUESTION]
    if _wr.confidence < _q.confidence:
        _wr.confidence = round(_q.confidence + 0.05, 4)
This isn't a hack — it's encoding the semantic relationship between pattern specificity and confidence. A match on tickets is more informative than a match on ^how\b.
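The boost can be tried in isolation. This is a sketch under stated assumptions: the two-member `IntentType` enum and the `Candidate` dataclass are stand-ins I made up so the snippet runs on its own, and the real candidate objects likely carry more fields.

```python
from dataclasses import dataclass
from enum import Enum


class IntentType(Enum):
    # Stand-in enum with only the two intents involved in this bug.
    QUESTION = "question"
    WEB_RESEARCH = "web_research"


@dataclass
class Candidate:
    # Hypothetical container for a scored intent.
    confidence: float


def apply_specificity_boost(candidates: dict) -> None:
    """Lift WEB_RESEARCH just above QUESTION when both matched."""
    if IntentType.WEB_RESEARCH in candidates and IntentType.QUESTION in candidates:
        wr = candidates[IntentType.WEB_RESEARCH]
        q = candidates[IntentType.QUESTION]
        if wr.confidence < q.confidence:
            wr.confidence = round(q.confidence + 0.05, 4)


candidates = {
    IntentType.QUESTION: Candidate(0.80),
    IntentType.WEB_RESEARCH: Candidate(0.65),
}
apply_specificity_boost(candidates)
winner = max(candidates, key=lambda intent: candidates[intent].confidence)
```

With the scores from the tickets example, web_research moves from 0.65 to 0.85 and becomes the winner. When only one of the two intents matched, the function leaves the scores untouched.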
Bug 2: The Neuron Can't Say "web_research"
When regex patterns produce ambiguous results, AitherOS has a fallback: re-classify using an LLM neuron with conversation history. The contextual neuron prompt includes the recent conversation and asks the model to output a JSON classification.
The prompt listed valid intents: question|creation|analysis|command|modification|conversation|email|personal|calendar.
Notice what's missing? web_research. The LLM literally couldn't output the correct intent because we never told it that option existed. Even with full conversation context showing a web search about the Olympics, the best it could do was question.
The fix: Add web_research to both the standard and contextual neuron prompts.
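One way to keep this kind of omission from recurring is to build the prompt's option list from a single source instead of typing it by hand in each prompt. The sketch below is an assumption about how that could look, not what AitherOS does: `VALID_INTENTS`, `build_intent_prompt`, and `parse_intent` are hypothetical names, and the prompt wording is invented for illustration.

```python
import json

# Intents from the original prompt, plus the missing web_research.
VALID_INTENTS = [
    "question", "creation", "analysis", "command", "modification",
    "conversation", "email", "personal", "calendar", "web_research",
]


def build_intent_prompt(message: str, history: list[str]) -> str:
    """Render a classification prompt whose options come from VALID_INTENTS."""
    options = "|".join(VALID_INTENTS)
    turns = "\n".join(history)
    return (
        f"Recent conversation:\n{turns}\n\n"
        f"Classify the new message: {message}\n"
        f'Respond with JSON: {{"intent": "{options}"}}'
    )


def parse_intent(raw: str, default: str = "question") -> str:
    """Accept the model's JSON only if it names a known intent."""
    try:
        intent = json.loads(raw).get("intent")
    except (json.JSONDecodeError, AttributeError):
        return default
    return intent if intent in VALID_INTENTS else default


prompt = build_intent_prompt(
    "How do I get tickets?",
    ["user: When are the Olympics and where?"],
)
```

Both the standard and the contextual prompt can then share the same list, so adding an intent in one place makes it available to the model everywhere.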
Bug 3: The Carry-Forward Blind Spot
AitherOS has a carry-forward mechanism designed exactly for this scenario. When a turn uses a "tool intent" (like email or calendar), the session stores that intent and carries it forward for up to 3 follow-up turns. If the next message classifies as generic (question, conversation), the stored intent overrides it.
The set of tool intents that enable carry-forward:
_TOOL_INTENTS = {"email", "personal", "calendar", "simulation",
                 "search", "creation", "code", "analysis"}
web_research wasn't in the set. Neither was research. The Olympics turn correctly classified as web_research, but the system never stored it as the active intent. The carry-forward mechanism was dead on arrival.
The fix: Add "web_research" and "research" to _TOOL_INTENTS.
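Here is a rough sketch of how carry-forward behaves once the two intents are in the set. The `Session` dataclass, the `resolve_intent` function, and the choice to spend one turn per inherited follow-up are my assumptions for illustration; the real session logic may count turns differently.

```python
from dataclasses import dataclass
from typing import Optional

_TOOL_INTENTS = {"email", "personal", "calendar", "simulation", "search",
                 "creation", "code", "analysis", "web_research", "research"}
_GENERIC_INTENTS = {"question", "conversation"}
_CARRY_TURNS = 3  # carry forward for up to 3 follow-up turns


@dataclass
class Session:
    # Hypothetical session state; field names follow the post.
    active_intent: Optional[str] = None
    intent_turns_remaining: int = 0


def resolve_intent(session: Session, classified: str) -> str:
    """Return the intent to act on, applying carry-forward."""
    if classified in _TOOL_INTENTS:
        # A tool intent becomes the active intent and resets the counter.
        session.active_intent = classified
        session.intent_turns_remaining = _CARRY_TURNS
        return classified
    if (classified in _GENERIC_INTENTS and session.active_intent
            and session.intent_turns_remaining > 0):
        # A generic follow-up inherits the stored intent and spends one turn.
        session.intent_turns_remaining -= 1
        return session.active_intent
    return classified


session = Session()
first = resolve_intent(session, "web_research")   # Olympics turn
second = resolve_intent(session, "question")      # tickets follow-up
```

In this sketch the tickets follow-up resolves to web_research even though it was classified as question. Before the fix, the first branch never ran for web_research, so there was nothing stored to inherit.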
Bug 4: The Context Wall
Even if carry-forward worked, the downstream pipeline couldn't use it. ChatEngine stored active_intent on the session object, but never forwarded it to UnifiedChatBackend.think() via the context dictionary. UCB — the actual reasoning engine — was blind to what happened in prior turns.
The fix: Inject prior_intent, prior_tools, and intent_turns_remaining into the context dict that flows from ChatEngine to UCB:
ctx_data["prior_intent"] = ctx.active_intent
ctx_data["prior_tools"] = getattr(ctx, "active_tools", [])
ctx_data["intent_turns_remaining"] = ctx.intent_turns_remaining
Three lines. Three keys. The difference between a system that remembers and one that doesn't.
Bug 5: The Last Safety Net Had Holes
UCB's Step 1.5 is supposed to be the final safety net — even if upstream classification fails, it checks the message for search keywords and fires web search if any match. The keyword list included "tickets", so "how do I get tickets?" should have triggered search regardless.
But there was no conversation awareness. The step only checked the current message. It didn't know that the session had just performed a web search about the Olympics. It didn't know that prior_intent was web_research with 3 turns remaining.
The fix: Check session state in the search gate:
if not _has_search_signal and isinstance(context, dict):
    _prior_intent = context.get("prior_intent") or ""
    _turns_left = context.get("intent_turns_remaining", 0)
    if _prior_intent in ("web_research", "research") and _turns_left > 0:
        _has_search_signal = True
        if effort_level < 4:
            effort_level = 5
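Wrapped in a function, the gate is easy to exercise on its own. This is a sketch, not UCB's actual Step 1.5: the `search_gate` name, its return shape, and the one-item `SEARCH_KEYWORDS` tuple are assumptions (the post only confirms that "tickets" is on the real keyword list).

```python
# Hypothetical subset of the keyword list; only "tickets" is confirmed above.
SEARCH_KEYWORDS = ("tickets",)


def search_gate(message, effort_level, context=None):
    """Return (has_search_signal, effort_level) for the current turn."""
    has_search_signal = any(k in message.lower() for k in SEARCH_KEYWORDS)
    if not has_search_signal and isinstance(context, dict):
        prior_intent = context.get("prior_intent") or ""
        turns_left = context.get("intent_turns_remaining", 0)
        # Inherit the search signal from a recent web research turn.
        if prior_intent in ("web_research", "research") and turns_left > 0:
            has_search_signal = True
            if effort_level < 4:
                effort_level = 5
    return has_search_signal, effort_level


ctx = {"prior_intent": "web_research", "intent_turns_remaining": 2}
with_context = search_gate("What about parking?", 2, ctx)
without_context = search_gate("What about parking?", 2)
```

A follow-up with no search keyword at all, like "What about parking?", now opens the gate and is lifted to effort 5 when the session context says a web research turn is still active. Without that context it stays closed at effort 2.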
The Pattern: Stateless Components in a Stateful World
All five bugs share a common thread: each component was individually stateless, treating every message as if it arrived in isolation. The IntentEngine classified without context. The carry-forward mechanism didn't know about web research. The context dictionary didn't carry session state. The search gate didn't check conversation history.
This is the default in most AI systems. Every request is independent. Every classification is from scratch. The conversation history gets dumped into the LLM's context window, and we hope the model figures out the continuity.
But classification happens before the LLM sees anything. If the classifier says "trivial question, effort 2", the LLM never gets the chance to disagree. The fast path activates. No tools. No search. No reasoning. The decision was made in 10 milliseconds of regex matching, and everything downstream has to live with it.
The fix isn't to make every component stateful — that would be a maintenance nightmare. It's to let state flow through the pipeline via the context dictionary. Each component stays simple and testable. But when it matters — when a follow-up question needs the context of what came before — the information is there.
The MCTS Dimension
There was a sixth issue, less a bug and more a missed opportunity. AitherOS uses Monte Carlo Tree Search to plan execution strategies before generating a response. But the MCTS was also stateless — it didn't know about prior turns, had no conversation context, and only explored 2-3 strategies (direct LLM or tool dispatch).
Now it fans out wider. At effort 5+, MCTS explores sase_reason (structured analysis). At 6+, council_deliberate (multi-perspective review). At 7+, swarm_execute (parallel agent dispatch). Conversation context — prior intent, recent turns, whether search results already exist — flows into the search space. The tree has more branches to explore, and more information to explore them with.
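The effort thresholds can be read as a simple cumulative gate on which strategies the tree may explore. The following is only a sketch of that reading: `candidate_strategies` is a hypothetical helper, and the baseline names `direct_llm` and `tool_dispatch` are my labels for the "direct LLM or tool dispatch" options, not identifiers from the codebase.

```python
def candidate_strategies(effort_level: int) -> list[str]:
    """List the strategies available to the planner at a given effort level."""
    # Baseline options; these two names are assumptions.
    strategies = ["direct_llm", "tool_dispatch"]
    if effort_level >= 5:
        strategies.append("sase_reason")         # structured analysis
    if effort_level >= 6:
        strategies.append("council_deliberate")  # multi-perspective review
    if effort_level >= 7:
        strategies.append("swarm_execute")       # parallel agent dispatch
    return strategies


tickets_turn = candidate_strategies(5)
```

At effort 5, the level the tickets follow-up now reaches, the planner has three options in this sketch instead of two.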
The Result
After the fix, the same conversation plays out differently:
Turn 1: "When are the Olympics and where?"
- Intent: web_research, Effort: 5
- Web search fires, MCTS plans 3 steps
- Session stores active_intent = web_research, turns_remaining = 3
Turn 2: "How do I get tickets?"
- WEB_RESEARCH boosted over QUESTION: 0.801 → 0.851
- Intent: web_research, Effort: 5
- Prior intent inherited, search gate activated
- Web search fires with real ticket data
- Response includes official ticketing URLs, prices (from €30), availability dates
Five bugs. Three files. 92 lines changed. The difference between an AI that forgets what you just said and one that actually follows a conversation.
David Parkhurst is the architect of AitherOS — an autonomous agent operating system where the hardest engineering problems aren't about generation, but about understanding what you meant in the first place.