What We Learned (and Didn't) from Reviewing Hermes Agent

April 20, 2026 · 6 min read · AitherOS Engineering

April 19, 2026 · AitherOS Engineering

We recently did a deep review of Hermes Agent, Nous Research's open-source AI agent framework. We wanted to be honest about what they've built, what we could learn from it, and where our architectures diverge in ways that matter.

The short version: Hermes Agent is an excellent single-agent CLI tool with thoughtful UX. We respect the craft. But our review also confirmed that AitherOS is solving a fundamentally different problem — and the solutions rarely translate.

What Hermes Does Well

Credit where it's earned.

The developer experience is polished. hermes setup walks you through provider selection, model config, terminal backends, and messaging integrations in a single wizard. The CLI is well-thought-out — prompt_toolkit integration, session persistence, slash commands that feel natural. The hermes doctor command is a clean idea: one command that diagnoses your config, checks dependencies, tests connectivity, and optionally auto-fixes what it finds.

The tool dispatch is clean. A central registry.py handles tool schema collection, dispatch, and error wrapping. Tools self-register at import time. Toolsets group tools by platform (hermes-cli, hermes-telegram, etc.), and each platform gets sensible defaults. It's a pattern that scales well for a single-process agent.
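To make the pattern concrete, here is a minimal sketch of a self-registering tool registry with error-wrapped dispatch. It is our own illustration, not Hermes's code: `register_tool`, `schemas_for`, `dispatch`, and the example `add` tool are hypothetical names, and the real `registry.py` may be organized differently.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry; the names here are illustrative, not Hermes's actual API.
_TOOLS: Dict[str, Dict[str, Any]] = {}


def register_tool(name: str, description: str, toolset: str) -> Callable:
    """Decorator that records a tool in the registry when its module is imported."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        _TOOLS[name] = {"fn": fn, "description": description, "toolset": toolset}
        return fn
    return decorator


def schemas_for(toolset: str) -> List[Dict[str, str]]:
    """Collect the schemas exposed to one platform's toolset."""
    return [
        {"name": name, "description": entry["description"]}
        for name, entry in sorted(_TOOLS.items())
        if entry["toolset"] == toolset
    ]


def dispatch(name: str, **kwargs: Any) -> Dict[str, Any]:
    """Run a tool by name, wrapping failures so the agent loop never crashes."""
    entry = _TOOLS.get(name)
    if entry is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": entry["fn"](**kwargs)}
    except Exception as exc:  # error wrapping: report the failure instead of raising
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


# Registration happens as a side effect of defining (importing) the tool.
@register_tool("add", "Add two integers.", toolset="hermes-cli")
def add(a: int, b: int) -> int:
    return a + b
```

Calling `dispatch("add", a=2, b=3)` returns a success envelope, while an unknown tool name or a bad argument list comes back as an error envelope rather than an exception.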

The prompt engineering is battle-tested. Their prompt_builder.py handles model-specific quirks: tool-use enforcement for GPT models that want to describe actions rather than take them, <mandatory_tool_use> blocks, <act_dont_ask> guidance. These are patterns learned from production usage across many LLM backends, and they show.

Memory provider plugins are well-designed. Eight pluggable providers (Honcho, Hindsight, Supermemory, etc.) with a clean interface. The Honcho integration in particular — bidirectional peer modeling, dialectic reasoning, multi-host config — shows real thought about what cross-session memory should look like.

The training pipeline is forward-thinking. Their Atropos environment integration lets you run agentic LLMs through multi-turn tool-calling loops, score output with reward functions, and feed results into RL training. The trajectory format (ShareGPT + tool_calls) is becoming a community standard. That matters for the ecosystem.

Where Our Architectures Diverge

This is where we stop being able to learn from each other, because the problems are different.

Context Management

Hermes uses lossy summarization — when approaching token limits, context_compressor.py summarizes middle conversation turns. It's the practical choice for a single-agent loop.

AitherOS uses surgical eviction. Our ContextPipeline is a 10-stage pipeline: classify → scale → gather → enrich → graph → recall → recursive refinement → ingest → weed → budget. The "weed" stage scores every context chunk and evicts the lowest-scored ones — it doesn't summarize, it selects. For hard problems, the RecursiveContextEngine scales to 10M+ tokens via recursive chunk processing with quality judgments at each depth. We never compress; we curate.
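The difference between summarizing and selecting can be sketched in a few lines. The following is a simplified illustration of score-based eviction under a token budget, not the actual ContextPipeline code; the `Chunk` type and `weed` function are hypothetical, and the scoring itself is assumed to happen elsewhere.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    text: str
    tokens: int
    score: float  # relevance score; how it is computed is out of scope here


def weed(chunks: List[Chunk], budget: int) -> List[Chunk]:
    """Evict the lowest-scored chunks until the total fits the token budget.

    Surviving chunks are returned verbatim and in their original order;
    nothing is rewritten or summarized.
    """
    total = sum(c.tokens for c in chunks)
    evicted = set()
    # Visit candidates from lowest score to highest and drop until we fit.
    for idx in sorted(range(len(chunks)), key=lambda i: chunks[i].score):
        if total <= budget:
            break
        evicted.add(idx)
        total -= chunks[idx].tokens
    return [c for i, c in enumerate(chunks) if i not in evicted]
```

The property that matters is that every chunk that survives is byte-for-byte what was gathered, which is what separates this from a summarization pass.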

These aren't better-or-worse choices. They're different architectures for different scale points.

Identity and Persona

Hermes has SOUL.md — a markdown file that defines the agent's personality, slotted into position #1 of the system prompt. It's simple and effective for a single-agent system.

AitherOS has a multi-layer identity stack: immutable axioms (safety floor, never overridden) → wills (behavioral constraints) → personas (personality definitions) → soul overlays (project-specific) → affect state (valence, arousal, confidence, openness). We have 50+ named agent identities, each with their own spirit snapshot, tool profiles, delegation permissions, and effort caps. Our PersonaEngine faculty builds weighted context windows where axioms score 1.0, wills 0.98, persona 0.95, and so on.
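As a rough sketch of how layer weights could drive what reaches the prompt, assume each identity entry carries a layer name and a token cost. The axiom, will, and persona weights below are the ones quoted above; the soul overlay and affect weights, the `IdentityEntry` type, and `build_identity_context` are placeholders for illustration and do not describe PersonaEngine's real interface.

```python
from dataclasses import dataclass
from typing import List

# Axiom, will, and persona weights are from the post; the last two are assumed.
LAYER_WEIGHTS = {
    "axiom": 1.0,
    "will": 0.98,
    "persona": 0.95,
    "soul_overlay": 0.90,  # assumed placeholder value
    "affect": 0.85,        # assumed placeholder value
}


@dataclass(frozen=True)
class IdentityEntry:
    layer: str
    text: str
    tokens: int


def build_identity_context(entries: List[IdentityEntry], budget: int) -> List[IdentityEntry]:
    """Fill the identity window in weight order.

    Axioms are always included, even past the budget, because they act as
    the safety floor. Lower layers are included only while they fit.
    """
    ranked = sorted(entries, key=lambda e: LAYER_WEIGHTS[e.layer], reverse=True)
    selected: List[IdentityEntry] = []
    used = 0
    for entry in ranked:
        if entry.layer == "axiom" or used + entry.tokens <= budget:
            selected.append(entry)
            used += entry.tokens
    return selected
```

Under a tight budget the lower layers fall away first and the axioms never do, which is the ordering the stack is meant to guarantee.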

This isn't because we're overengineering. It's because in a multi-agent orchestration system, identity isolation is a security boundary. An agent forked for a code review shouldn't inherit the personality or permissions of the parent planning agent. SOUL.md is elegant for one agent. It doesn't work for fifty.

Agent Delegation

Hermes spawns subagents with delegate_task — goal, context, toolsets, optional model override. Each gets its own terminal session. Clean and effective.

AitherOS routes through AgentForge, which selects multi-agent chains using Monte Carlo tree search (MCTS). A task like "review this PR and deploy if it passes" might route through lyra → demiurge → atlas: a 3-agent chain selected by fused scoring across keyword match, historical success rate, effort fit, current load, and description overlap, with exploration pressure so the system doesn't get stuck in local optima. Each forged agent gets isolated context, identity-scoped tools, acceptance criteria with frontier-model verification, and optional git worktree isolation.
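The scoring idea can be illustrated with a single selection step. This sketch combines the five signals into one score and adds a UCB1-style exploration bonus, which is the selection rule commonly used inside MCTS; it is not a full tree search and not AgentForge's implementation. The weights, the `Candidate` fields, and the `exploration` constant are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    keyword_match: float        # all signals normalized to 0..1
    success_rate: float
    effort_fit: float
    load: float                 # 0 = idle, 1 = saturated
    description_overlap: float
    visits: int                 # how often this candidate has been tried


def fused_score(c: Candidate) -> float:
    """Weighted blend of the routing signals; the weights are illustrative only."""
    return (
        0.30 * c.keyword_match
        + 0.30 * c.success_rate
        + 0.15 * c.effort_fit
        + 0.10 * (1.0 - c.load)   # a busy agent scores lower
        + 0.15 * c.description_overlap
    )


def select(candidates: List[Candidate], exploration: float = 0.5) -> Candidate:
    """Pick the candidate with the best score plus exploration bonus (UCB1-style)."""
    total_visits = sum(c.visits for c in candidates) + 1

    def ucb(c: Candidate) -> float:
        bonus = exploration * math.sqrt(math.log(total_visits) / (c.visits + 1))
        return fused_score(c) + bonus

    return max(candidates, key=ucb)
```

With `exploration=0` the router always exploits the current best scorer; raising it gives rarely tried agents a chance to prove themselves.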

Again: different problems. If you're one person at a terminal, you don't need MCTS routing. If you're orchestrating 50 specialized agents across a distributed system, you do.

Multi-Model Routing

Hermes picks a model at init time — or lets you switch with /model. The Mixture-of-Agents tool fans queries to multiple reference models and aggregates.

AitherOS does effort-scaled elastic routing. Every request gets an effort score (1-10). That maps to a concrete plan: which model tier, which backend, what token budget, what orchestration mode. Our MicroScheduler manages GPU scheduling, VRAM allocation, priority queuing, and lazy container lifecycle. It targets ~90% GPU utilization and modulates effort caps when the system is saturated. For critical decisions, CouncilReview runs multi-agent deliberation.
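A stripped-down version of the effort-to-plan mapping might look like the sketch below. The tier boundaries, model tier names, backends, token budgets, and the saturation cap are invented for illustration; the only details taken from the text are the 1-10 effort scale, the roughly 90% utilization target, and the idea that effort caps tighten under saturation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    model_tier: str
    backend: str
    token_budget: int
    mode: str


# Hypothetical tiers: (highest effort covered, plan). Values are placeholders.
TIERS = [
    (3, Plan("small", "local", 2_000, "single")),
    (7, Plan("medium", "local", 8_000, "single")),
    (10, Plan("large", "remote", 32_000, "council")),
]


def effective_effort(effort: int, gpu_utilization: float,
                     target: float = 0.90, saturated_cap: int = 5) -> int:
    """Clamp effort to 1..10, then cap it when the GPU is at or above target."""
    effort = max(1, min(10, effort))
    if gpu_utilization >= target:
        return min(effort, saturated_cap)
    return effort


def plan_for(effort: int, gpu_utilization: float) -> Plan:
    """Map a request's effort score to a concrete execution plan."""
    e = effective_effort(effort, gpu_utilization)
    for upper, plan in TIERS:
        if e <= upper:
            return plan
    raise AssertionError("unreachable: effort is clamped to 1..10")
```

In this sketch an effort-9 request gets the large tier on an idle system but is downgraded to the medium tier once utilization crosses the target.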

MoA is a useful technique. But it's a single tool, not a scheduling architecture.

What We Actually Took

One thing. We built aitheros-doctor.

Hermes's hermes doctor command is a genuinely good UX pattern that we didn't have. One command that scans your entire installation — Python environment, config validation, Docker status, service health, infrastructure dependencies, network connectivity, GPU availability, tooling — and reports everything in a structured, actionable format with colored output and auto-fix suggestions.

Our version (npm run doctor / python AitherOS/scripts/aitheros_doctor.py) reuses our existing Finding class from the boot orchestrator, our services.yaml as the single source of truth (211 services, 127 ports, 25 compound groups), and our async health probe pattern. It understands compound services, dependency graphs, and the difference between "not running" and "unhealthy." It's adapted for our architecture, but the idea — "give the operator one command to diagnose everything" — came from reviewing Hermes.
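The distinction between "not running" and "unhealthy" is easy to show in miniature. This is a simplified sketch, not the code in `aitheros_doctor.py`: the `Finding` dataclass here is a stand-in for our real class, and the two checks are passed in as async callables so the example stays self-contained.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Tuple

Check = Callable[[], Awaitable[bool]]


@dataclass
class Finding:  # hypothetical stand-in for the boot orchestrator's Finding class
    service: str
    status: str  # "ok", "unhealthy", or "not_running"
    detail: str


async def probe(name: str, is_listening: Check, is_healthy: Check,
                timeout: float = 2.0) -> Finding:
    """Classify one service: absent, present but failing, or fine."""
    try:
        if not await asyncio.wait_for(is_listening(), timeout):
            return Finding(name, "not_running", "nothing is listening on the service port")
        if not await asyncio.wait_for(is_healthy(), timeout):
            return Finding(name, "unhealthy", "process is up but the health check failed")
    except asyncio.TimeoutError:
        return Finding(name, "unhealthy", f"probe timed out after {timeout}s")
    return Finding(name, "ok", "healthy")


async def run_doctor(checks: List[Tuple[str, Check, Check]]) -> List[Finding]:
    """Probe all services concurrently and return findings in input order."""
    return list(await asyncio.gather(*(probe(n, l, h) for n, l, h in checks)))
```

A real probe would plug in a TCP connect for `is_listening` and an HTTP health request for `is_healthy`; running the probes through `asyncio.gather` is what keeps a scan of many services fast.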

Thanks for that, Nous Research.

The Honest Summary

| Capability | Hermes Agent | AitherOS |
| --- | --- | --- |
| Context management | Lossy summarization | 10-stage surgical pipeline + recursive refinement |
| Memory | MEMORY.md + 8 plugin providers | 6-tier (Spirit/STTP/Conv/Smart/Graph/Working) with decay |
| Identity | SOUL.md | Axiom→Will→Persona→Soul→Affect, 50+ identities |
| Agent delegation | Subagent with goal/context | MCTS multi-agent chains, fused routing, worktree isolation |
| Model routing | Static selection + MoA tool | Effort-scaled elastic routing, GPU scheduling, council review |
| Tool system | ~47 tools, self-registering | 98 MCP modules, identity-scoped, category-based |
| Training pipeline | Atropos environments + trajectories | AitherHarvest multi-source → quality scoring → fine-tuning |
| Operator diagnostics | `hermes doctor` ✓ | `aitheros-doctor` ✓ (new, inspired by Hermes) |
| Target | Single-agent CLI | Multi-service orchestration platform |

Hermes Agent is a very good single-agent framework. If you want one AI assistant on your terminal that handles tool calling, memory, and messaging platforms well — it's worth looking at. The Nous Research team has built something thoughtful and well-maintained.

But it's solving a different problem than AitherOS. And that's fine. The AI agent ecosystem is big enough for both approaches.

We reviewed Hermes Agent v0.8.0 (April 2026). Our comparison is based on public source code. If we've mischaracterized anything, we're happy to be corrected — open an issue or reach out.
