AI Agents Need Their Own Messaging Protocol. We Built One.
A post on Reddit recently argued that AI agents need their own messaging protocol — not bolted-on bot APIs, not webhook spaghetti, not "just use HTTP." The core requirements:
- Open discovery — agents find and talk to each other without point-to-point hardcoding
- Group spaces + persistent identity — channels, shared context, cross-conversation reputation
- E2E encrypted conversations — agent dialogues private from platform providers
We have been building AitherOS, an AI agent operating system with 54 agents across 244 microservices. When we saw that post, we audited our existing infrastructure against those three requirements. The result: we had 80% of the answer already built — and the missing 20% took a day to wire up.
Here is what we learned.
The Problem With HTTP-Based Agent Communication
Most multi-agent systems today communicate via HTTP REST calls. Agent A calls Agent B's endpoint. Agent B calls Agent C. It works, but it has three fatal problems:
Discovery is hardcoded. Every agent needs to know every other agent's URL. Add a new agent? Update every caller. Move an agent to a different port? Break everything.
No group semantics. HTTP is point-to-point. There is no concept of "broadcast to all security agents" or "post to the project-alpha channel." You end up building fan-out logic in every service.
Everything is plaintext. The platform (Redis, message broker, API gateway) can read every message. For autonomous agents making decisions about sensitive data, this is a non-starter.
What AitherOS Already Had
Before this work, AitherOS had 11 subsystems that partially solved the problem. Six of them matter here:
AgentBus — A thin wrapper over our FluxBus IPC that gives every agent a dedicated channel (flux:agents:{agent_id}). Direct messaging, group channels (flux:agents:group:elemental), broadcast, and RPC with timeout. Tenant-scoped isolation so different customers' agents cannot see each other.
FluxBus — The transport layer. Redis Streams for persistent ordered messages, Redis Pub/Sub for real-time broadcast, shared memory for same-host zero-copy transfers. Auto-selects transport: ~1 µs for shared memory, ~50 µs for Redis, ~5 ms for HTTP fallback.
A2A Protocol — Google's Agent-to-Agent protocol (v0.3.0). JSON-RPC 2.0, Agent Cards for capability advertisement, SSE streaming, task lifecycle management. External agent authentication via Ed25519 signed cards.
AitherDirectory — LDAP + REST identity store. Every agent, user, and service has a Distinguished Name in a unified DIT (ou=agents,dc=aither,dc=os). Versioned entries with full audit trail.
Pulse Service Registry — Live registry of all running services with health status, port, capabilities, and metadata. WebSocket + SSE streaming for real-time updates.
FederationClient — Cross-deployment agent discovery via our portal. Sovereign AitherOS nodes register their agents centrally so agents in different datacenters can find each other. Ed25519 signed payloads.
This gave us Requirement 1 (discovery + direct comms) almost completely. Any agent can find and talk to any other agent without hardcoding URLs. Internal agents need zero configuration. External agents use A2A Agent Cards.
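To make the "no hardcoded URLs" point concrete, here is a minimal in-memory sketch of registry-based resolution. It is not the Pulse Service Registry API: `ServiceEntry`, `Registry`, the field names, and the example addresses are all hypothetical stand-ins.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    """One registry record. Field names are illustrative, not the Pulse schema."""
    agent_id: str
    host: str
    port: int
    healthy: bool = True
    capabilities: set[str] = field(default_factory=set)


class Registry:
    """In-memory stand-in for a live service registry."""

    def __init__(self) -> None:
        self._entries: dict[str, ServiceEntry] = {}

    def register(self, entry: ServiceEntry) -> None:
        # Re-registering overwrites the old record, so a port change
        # requires no update on the caller side.
        self._entries[entry.agent_id] = entry

    def resolve(self, agent_id: str) -> str:
        entry = self._entries.get(agent_id)
        if entry is None or not entry.healthy:
            raise LookupError(f"no healthy endpoint for {agent_id!r}")
        return f"http://{entry.host}:{entry.port}"


registry = Registry()
registry.register(ServiceEntry("atlas", "10.0.0.5", 8101))
first = registry.resolve("atlas")
registry.register(ServiceEntry("atlas", "10.0.0.5", 9101))  # agent moved ports
second = registry.resolve("atlas")
print(first, "->", second)
```

Callers only ever hold the agent id, so moving an agent changes one registry record instead of every caller.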
The Gaps
Two things were missing:
Gap 1: E2E Encryption (Critical)
Our FluxPacket format has an ENCRYPTED flag (bit 4 in the flags byte). It was defined in the spec from day one. But the actual encryption? Never implemented. It was a TODO that became tech debt.
Internal agent messages flowed as plaintext JSON through Redis. External A2A used HTTPS (transport encryption), but that is not E2E — the relay can still read everything.
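For readers unfamiliar with flag bytes, the ENCRYPTED bit can be set and tested as below. This is a sketch under one assumption: "bit 4" is counted from zero, which makes the mask 0x10. The helper names are hypothetical and the real FluxPacket layout may differ.

```python
# Assumption: bits are numbered from 0, so bit 4 corresponds to mask 0x10.
ENCRYPTED_FLAG = 1 << 4


def set_encrypted(flags: int) -> int:
    """Return the flags byte with the ENCRYPTED bit set."""
    return (flags | ENCRYPTED_FLAG) & 0xFF


def clear_encrypted(flags: int) -> int:
    """Return the flags byte with the ENCRYPTED bit cleared."""
    return flags & ~ENCRYPTED_FLAG & 0xFF


def is_encrypted(flags: int) -> bool:
    """True when the ENCRYPTED bit is set."""
    return bool(flags & ENCRYPTED_FLAG)


flags = 0b0000_0011            # two unrelated, hypothetical flag bits already set
flags = set_encrypted(flags)
print(f"{flags:08b}", is_encrypted(flags))  # prints "00010011 True"
```

Setting the bit only marks the packet. The gap described above is that nothing encrypted the payload behind it.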
Gap 2: Persistent Channels
AgentBus had group messaging (join_group("elemental"), send_to_group()), but no persistent history. No way to query "show me all messages in the elemental channel from the last hour." Messages were fire-and-forget.
The Reddit post's pattern — "I have finished this task, downstream agents please note" — worked via Flux events, but there was no queryable message stream.
What We Built
E2E Encryption: AgentCrypto
We created AgentCrypto — a per-agent cryptographic context that provides three layers:
Ed25519 Signing. Every message can be signed. The recipient verifies the signature against the sender's public key. Tampered messages are rejected.
crypto = AgentCrypto("atlas")
await crypto.initialize()
signed = crypto.sign_message({"task": "deploy service"})
# Returns: {_payload: "...", _signature: "base64...", _sender: "atlas"}
valid, payload = peer_crypto.verify_message(signed)
# valid=True, payload={"task": "deploy service"}
X25519 Pairwise Encryption. For sensitive direct messages, agents use NaCl Box (X25519 Diffie-Hellman key exchange + XSalsa20-Poly1305 authenticated encryption). The Flux relay sees only ciphertext.
encrypted = atlas_crypto.encrypt_for("demiurge", {"secret": "deployment key"})
# Returns: {_encrypted: "base64...", _nonce: "...", _sender: "atlas", _recipient: "demiurge"}
# Only demiurge can decrypt:
plaintext = demiurge_crypto.decrypt_message(encrypted)
Channel Symmetric Encryption. For group channels, the channel creator generates a symmetric key (NaCl SecretBox) and distributes it to members via SealedBox (asymmetric wrapping). Every message in the channel is encrypted with the shared key. The relay stores ciphertext it cannot read.
atlas_crypto.create_channel_key("project-alpha")
wrapped = atlas_crypto.wrap_channel_key_for("project-alpha", "demiurge")
demiurge_crypto.unwrap_channel_key("project-alpha", wrapped)
encrypted = atlas_crypto.encrypt_for_channel("project-alpha", {"status": "deployed"})
plaintext = demiurge_crypto.decrypt_channel_message("project-alpha", encrypted)
Key management is automatic. Each agent generates an Ed25519 signing keypair and an X25519 encryption keypair on first boot. Keys persist to disk. Public keys are exchanged via AitherDirectory or local file discovery (agents on the same node can read each other's .x25519.pub files).
The design reuses infrastructure we already had:
- Key generation pattern from FederationClient (Ed25519)
- Trust levels from AitherMeshSecurity
- Agent identity from our identity YAML files
- PyNaCl, which was already in our requirements (for GitHub Secrets encryption)
AgentBus Integration
AgentBus now accepts signed=True and encrypted=True on every send method:
bus = await connect_agent_bus("atlas")
# Signed message (any recipient can verify)
await bus.send("demiurge", {"task": "refactor"}, signed=True)
# Encrypted + signed (only demiurge can read, verified sender)
await bus.send("demiurge", {"secret": "key"}, signed=True, encrypted=True)
# Group channel encryption
await bus.send_to_group("council", {"vote": "approve"}, encrypted=True)
# Broadcast with signature (all agents verify sender)
await bus.broadcast({"alert": "system update"}, signed=True)
The verify_and_decrypt() method handles incoming messages:
valid, plaintext = bus.verify_and_decrypt(incoming_message)
if not valid:
    logger.warning("Message failed verification")
Backward compatibility is preserved. Unsigned, unencrypted messages pass through unchanged. Crypto is opt-in per message.
FluxMessage Encryption Fields
FluxMessage (our IPC message format) gained two fields:
- encrypted: bool (True when the payload is E2E encrypted)
- signature: str (Base64 Ed25519 signature)
These serialize correctly through both Redis Streams (as enc/sig string fields) and JSON. The Flux relay can route, store, and replay encrypted messages without being able to read them.
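A rough sketch of how such a message could round-trip through both encodings. Only the `enc` and `sig` field names come from the description above; the `FluxMessage` shape here (a `sender` and a string `payload`) and the method names are assumptions for illustration.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class FluxMessage:
    """Illustrative message shape, not the actual FluxMessage class."""
    sender: str
    payload: str              # plaintext JSON or base64 ciphertext, opaque to the relay
    encrypted: bool = False
    signature: str = ""

    def to_stream_fields(self) -> dict[str, str]:
        # Redis Stream entries are flat string maps, so the bool becomes "1"/"0".
        return {
            "sender": self.sender,
            "payload": self.payload,
            "enc": "1" if self.encrypted else "0",
            "sig": self.signature,
        }

    @classmethod
    def from_stream_fields(cls, fields: dict[str, str]) -> FluxMessage:
        # Missing enc/sig fields default to "unencrypted, unsigned",
        # which keeps older messages readable.
        return cls(
            sender=fields["sender"],
            payload=fields["payload"],
            encrypted=fields.get("enc", "0") == "1",
            signature=fields.get("sig", ""),
        )

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> FluxMessage:
        return cls(**json.loads(raw))


msg = FluxMessage("atlas", "Y2lwaGVydGV4dA==", encrypted=True, signature="c2ln")
via_stream = FluxMessage.from_stream_fields(msg.to_stream_fields())
via_json = FluxMessage.from_json(msg.to_json())
legacy = FluxMessage.from_stream_fields({"sender": "atlas", "payload": "{}"})
print(via_stream == msg, via_json == msg, legacy.encrypted)
```

The relay only copies `payload` as a string, so it never needs the keys to route, store, or replay the message.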
Persistent Channels
AitherFlux gained a full channel API backed by Redis Streams:
POST   /channels                      Create a channel
GET    /channels                      List all channels
GET    /channels/{name}               Get channel metadata
PUT    /channels/{name}/topic         Update topic
POST   /channels/{name}/members       Add member
DELETE /channels/{name}/members/{id}  Remove member
POST   /channels/{name}/messages      Post a message
GET    /channels/{name}/messages      Query history (since/before/limit)
DELETE /channels/{name}               Delete channel
Each channel is a named Redis Stream with metadata (topic, description, member list, creation time). Messages persist with configurable retention (default 7 days, max 10,000 messages). History queries support pagination via Redis Stream IDs.
Encrypted messages are stored as ciphertext — the channel API faithfully persists and returns them without attempting to read the content.
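The retention and history semantics can be sketched without Redis. The `Channel` class below is a hypothetical in-memory stand-in: it uses float timestamps where the real API pages by Redis Stream IDs, and it assumes `since` and `before` are exclusive bounds.

```python
from __future__ import annotations

import time
from collections import deque


class Channel:
    """In-memory sketch of a capped, time-limited message history."""

    def __init__(self, name: str, max_messages: int = 10_000,
                 max_age_s: float = 7 * 24 * 3600) -> None:
        self.name = name
        self.max_age_s = max_age_s
        # A bounded deque drops the oldest entry once max_messages is reached.
        self._messages: deque[tuple[float, str]] = deque(maxlen=max_messages)

    def post(self, payload: str, now: float | None = None) -> float:
        ts = time.time() if now is None else now
        self._messages.append((ts, payload))
        return ts

    def history(self, since: float | None = None, before: float | None = None,
                limit: int = 100, now: float | None = None) -> list[tuple[float, str]]:
        now = time.time() if now is None else now
        cutoff = now - self.max_age_s
        out: list[tuple[float, str]] = []
        for ts, payload in self._messages:
            if len(out) >= limit:
                break
            if ts < cutoff:
                continue            # older than the retention window
            if since is not None and ts <= since:
                continue            # exclusive lower bound, so paging never repeats
            if before is not None and ts >= before:
                continue
            out.append((ts, payload))
        return out


channel = Channel("elemental", max_messages=3)
for i in (1, 2, 3, 4):
    channel.post(f"m{i}", now=float(i))

everything = channel.history(now=5.0)
next_page = channel.history(since=2.0, limit=1, now=5.0)
print(everything, next_page)
```

To page forward, a caller passes the timestamp of the last message it saw as `since`. Payloads are plain strings here, so ciphertext is stored and returned untouched.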
Agent Presence
AitherPulse gained presence tracking:
POST /agents/{id}/presence    Set state (online/busy/idle/offline)
GET  /agents/{id}/presence    Get current state
GET  /agents/presence         All agents
GET  /agents/presence/online  Filter: online + busy only
WS   /ws/presence             Real-time presence change stream
Presence state integrates with the existing heartbeat infrastructure. WebSocket subscribers get immediate notifications when any agent's state changes.
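One way to model the presence behavior described above, as a hypothetical in-process tracker rather than the AitherPulse code. It assumes unknown agents count as offline and that subscribers are notified only on actual state changes.

```python
from __future__ import annotations

from typing import Callable

VALID_STATES = {"online", "busy", "idle", "offline"}
REACHABLE_STATES = {"online", "busy"}   # mirrors the "online + busy only" filter


class PresenceTracker:
    """Tracks agent state and notifies subscribers on every change."""

    def __init__(self) -> None:
        self._states: dict[str, str] = {}
        self._subscribers: list[Callable[[str, str], None]] = []

    def subscribe(self, callback: Callable[[str, str], None]) -> None:
        self._subscribers.append(callback)

    def set_state(self, agent_id: str, state: str) -> None:
        if state not in VALID_STATES:
            raise ValueError(f"unknown presence state: {state!r}")
        if self._states.get(agent_id) == state:
            return                      # no change, so no notification
        self._states[agent_id] = state
        for callback in self._subscribers:
            callback(agent_id, state)

    def get(self, agent_id: str) -> str:
        # Assumption: an agent that never reported is treated as offline.
        return self._states.get(agent_id, "offline")

    def online(self) -> list[str]:
        return sorted(a for a, s in self._states.items() if s in REACHABLE_STATES)


events: list[tuple[str, str]] = []
tracker = PresenceTracker()
tracker.subscribe(lambda agent_id, state: events.append((agent_id, state)))
tracker.set_state("atlas", "online")
tracker.set_state("demiurge", "busy")
tracker.set_state("iris", "idle")
tracker.set_state("atlas", "online")    # duplicate, does not notify
print(tracker.online(), len(events))
```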
Backpressure
ServiceMailbox now provides NACK signaling when the inbox is full. Instead of silently dropping packets, the mailbox logs the NACK with queue depth information. The queue_depth property exposes fill levels so callers can make routing decisions.
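The NACK behavior might look roughly like this. `Mailbox` is a hypothetical simplification of ServiceMailbox: here `queue_depth` is assumed to be a message count, and `deliver` returning False stands in for the NACK signal.

```python
from __future__ import annotations

import logging
from collections import deque
from typing import Any

logger = logging.getLogger("mailbox")


class Mailbox:
    """Bounded inbox that refuses (NACKs) packets instead of dropping them silently."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._inbox: deque[Any] = deque()

    @property
    def queue_depth(self) -> int:
        """Number of packets currently waiting."""
        return len(self._inbox)

    def deliver(self, packet: Any) -> bool:
        if len(self._inbox) >= self.capacity:
            logger.warning("NACK: inbox full (%d/%d)", self.queue_depth, self.capacity)
            return False                # the sender learns the packet was not accepted
        self._inbox.append(packet)
        return True

    def take(self) -> Any:
        return self._inbox.popleft() if self._inbox else None


busy = Mailbox(capacity=2)
idle = Mailbox(capacity=2)
results = [busy.deliver(n) for n in range(3)]   # third delivery is NACKed

# A caller can use queue_depth to route around the full mailbox.
target = min((busy, idle), key=lambda m: m.queue_depth)
print(results, target is idle)
```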
The Architecture After
BEFORE:
  Agent A --AgentBus--> FluxBus --Redis--> AgentBus --> Agent B
                       (plaintext)        (no history)

AFTER:
  Agent A --AgentBus--> FluxBus --Redis--> AgentBus --> Agent B
        | sign+encrypt |  (ciphertext)   | verify+decrypt |
        | Ed25519      |  persisted in   | NaCl unbox     |
        | NaCl box     |  Stream+Channel |                |
                       Flux cannot read content
                       but CAN route, store, replay
The key insight: the relay is untrusted. Flux stores and forwards messages it cannot read. This is the correct security model for autonomous agents — the infrastructure provides delivery guarantees without requiring trust.
Test Results
39 tests covering all layers:
- Key generation: Generate, persist to disk, reload, verify uniqueness across agents
- Signing: Sign/verify, tamper detection, unknown sender rejection, unsigned passthrough
- Pairwise encryption: Encrypt/decrypt, wrong recipient blocked, ciphertext opacity
- Channel encryption: Symmetric key distribution via SealedBox, non-member exclusion
- FluxMessage serialization: Encryption fields roundtrip through Redis and JSON
- Mailbox backpressure: NACK on full inbox, queue depth tracking
- AgentBus integration: New signed/encrypted parameters, verify_and_decrypt
All 39 pass in under 6 seconds.
What We Also Shipped
We did not stop at encryption. Three more features landed in the same push:
Cross-Cluster Mesh Addressing
FederationClient gained peer-to-peer methods. Agents can now be addressed as agent://deployment_id/agent_id — the system resolves the address via the hub, then attempts direct node-to-node delivery before falling back to hub relay. Ed25519 signatures travel with the message. Nodes can register known peers for direct communication that bypasses the hub entirely.
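The addressing and fallback order can be sketched as follows. This is not the FederationClient API: `parse_agent_address`, `deliver`, and the stub transports are hypothetical, and the `peers` dict stands in for both hub resolution and registered peers.

```python
from __future__ import annotations

from typing import Any, Callable

PREFIX = "agent://"


def parse_agent_address(address: str) -> tuple[str, str]:
    """Split agent://deployment_id/agent_id into (deployment_id, agent_id)."""
    if not address.startswith(PREFIX):
        raise ValueError(f"not an agent address: {address!r}")
    parts = address[len(PREFIX):].split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected agent://deployment_id/agent_id, got {address!r}")
    return parts[0], parts[1]


def deliver(address: str, message: Any, peers: dict[str, str],
            send_direct: Callable[[str, str, Any], None],
            send_via_hub: Callable[[str, str, Any], None]) -> str:
    """Try direct node-to-node delivery first, then fall back to the hub relay."""
    deployment_id, agent_id = parse_agent_address(address)
    endpoint = peers.get(deployment_id)
    if endpoint is not None:
        try:
            send_direct(endpoint, agent_id, message)
            return "direct"
        except ConnectionError:
            pass                        # peer unreachable, use the hub instead
    send_via_hub(deployment_id, agent_id, message)
    return "hub"


log: list[tuple[str, str, str]] = []


def direct_ok(endpoint: str, agent_id: str, message: Any) -> None:
    log.append(("direct", endpoint, agent_id))


def direct_down(endpoint: str, agent_id: str, message: Any) -> None:
    raise ConnectionError("peer unreachable")


def hub(deployment_id: str, agent_id: str, message: Any) -> None:
    log.append(("hub", deployment_id, agent_id))


peers = {"eu-west": "10.0.0.9:7000"}    # example values only
routes = [
    deliver("agent://eu-west/atlas", {"ping": 1}, peers, direct_ok, hub),
    deliver("agent://eu-west/atlas", {"ping": 2}, peers, direct_down, hub),
    deliver("agent://us-east/iris", {"ping": 3}, peers, direct_ok, hub),
]
print(routes)  # prints "['direct', 'hub', 'hub']"
```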
Capability Search Index
AitherDirectory now indexes every agent's skills, work types, and tool profiles into an inverted index (586 terms across 51 agents). Query it:
GET /directory/agents/search/capabilities?q=security+audit
→ athena (score: 2), chaos (1), galatea (1)
GET /directory/agents/search/capabilities?q=image+generation
→ iris (score: 9), prospero (3), argus (2)
No more scanning every identity YAML. Sub-millisecond lookups.
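A minimal inverted index shows why these lookups avoid scanning every identity file. The scoring here (one point per matching term occurrence) is an assumption, as are the `CapabilityIndex` name and the sample skill strings. The real AitherDirectory scoring may differ.

```python
from __future__ import annotations

from collections import Counter, defaultdict


class CapabilityIndex:
    """Maps each term to the agents that mention it, with occurrence counts."""

    def __init__(self) -> None:
        self._index: defaultdict[str, Counter[str]] = defaultdict(Counter)

    def add(self, agent_id: str, descriptions: list[str]) -> None:
        for text in descriptions:
            for term in text.lower().split():
                self._index[term][agent_id] += 1

    def search(self, query: str) -> list[tuple[str, int]]:
        scores: Counter[str] = Counter()
        for term in query.lower().split():
            # .get() avoids creating empty entries for unknown terms.
            scores.update(self._index.get(term, {}))
        # Highest score first; ties are broken alphabetically for stable output.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


index = CapabilityIndex()
index.add("athena", ["security audit", "threat modeling"])
index.add("chaos", ["fault injection", "security testing"])
index.add("iris", ["image generation"])

hits = index.search("security audit")
print(hits)  # prints "[('athena', 2), ('chaos', 1)]"
```

Query cost depends on the number of query terms and matching agents, not on the total number of agents indexed.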
Reputation Scoring
Agents now accumulate trust scores via exponential moving average (EMA, alpha=0.1). Every task completion, tool accuracy check, and response quality signal updates the score. Recent interactions weight more heavily while preserving long-term trends.
POST /directory/agents/reputation/update
{"agent_id": "atlas", "interaction_type": "task_success", "score": 0.95}
GET /directory/agents/reputation/leaderboard
→ ranked list of agents by overall trust score
Agents that consistently deliver get preferred in delegation. Agents that fail get deprioritized automatically.
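The EMA update itself is one line: new score = alpha * sample + (1 - alpha) * old score. With alpha = 0.1, the sample from k interactions ago carries weight 0.1 * 0.9^k. In the sketch below the function name and the 0.5 starting score are assumptions. Only alpha comes from the description above.

```python
ALPHA = 0.1            # smoothing factor from the text
INITIAL_SCORE = 0.5    # assumption: new agents start at a neutral score


def update_reputation(current: float, sample: float, alpha: float = ALPHA) -> float:
    """Exponential moving average: blend the new sample into the running score."""
    return alpha * sample + (1 - alpha) * current


score = update_reputation(INITIAL_SCORE, 0.95)   # one successful task
print(round(score, 3))  # prints "0.545"

# Repeated successes pull the score toward the sample value, but slowly.
steady = INITIAL_SCORE
for _ in range(20):
    steady = update_reputation(steady, 0.95)

# A single failure after that dents the score without erasing the history.
after_failure = update_reputation(steady, 0.0)
```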
The Answer to the Reddit Post
Yes, AI agents need their own messaging protocol. No, you do not need to invent one from scratch.
The primitives are: a pub/sub bus with persistent streams (Redis Streams work great), a service directory with capability metadata (LDAP or any identity store), per-agent cryptographic identity (Ed25519 + X25519), and a message format that carries encryption and signature fields without the relay needing to understand them.
The hard part is not any individual piece — it is wiring them together so that bus.send("atlas", data, encrypted=True) just works, with key exchange happening transparently, the relay storing ciphertext it cannot read, and the recipient verifying and decrypting in one call.
That is what we built. 39 tests prove it works. The code is live.