Yesterday, security researcher Yousif Astarabadi published a devastating disclosure: he extracted Perplexity Computer's master Anthropic API key from a Claude Code sandbox using three shell commands and a six-line JavaScript file.

The key wasn't IP-restricted. Wasn't session-scoped. Wasn't sandbox-bound. Wasn't tied to his user account. All usage billed directly to Perplexity's master Anthropic account.

One .npmrc injection. Unlimited Opus 4.6 access. On someone else's bill.

He recommended three fixes:

Bind tokens to the sandbox ID — key leaks without the sandbox are useless
Make credentials ephemeral — mint on spin-up, kill on teardown
Tie tokens to user billing — abuse bills back to the abuser, not a shared pool

We read that and thought: we already built all three of these. And then we built seven more layers on top.

This isn't a retrospective patch. This is the architecture we designed from the ground up because we assumed every trust boundary would be tested.

The Vulnerability Pattern

The attack is elegant in its simplicity. Claude Code runs as a Node.js app launched via npm. Node.js reads extra command-line flags, including --require, from the NODE_OPTIONS environment variable. And npm has a node-options config setting, readable from ~/.npmrc in the user's home directory, whose value it passes to Node.js through NODE_OPTIONS.

In Perplexity's sandbox, the home directory sits on a shared filesystem the parent agent can write to. So:

# 1. Write a preload script that dumps process.env
echo 'require("fs").writeFileSync("/workspace/env.json", JSON.stringify(process.env))' > /tmp/steal.js

# 2. Inject it via .npmrc
echo 'node-options=--require /tmp/steal.js' > ~/.npmrc

# 3. Ask the agent to do literally anything
# → npm reads .npmrc → preload fires before Claude Code initializes → credentials dumped

This works because:

The credential is a long-lived proxy token with no binding to the execution context
The shared filesystem gives the parent agent write access to the child's dotfiles
The proxy has no per-session or per-user scoping — same token works from anywhere
Billing is pooled at the organizational level, not attributed to user sessions
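
The dotfile vector above is easy to look for before a sandboxed process starts. The following is a minimal sketch, not a fix for the underlying isolation problem: `find_node_options` is a hypothetical helper (it is not part of npm or of any product named here) that scans .npmrc content for node-options entries. It assumes simple `key=value` lines and does not handle quoting or every spelling npm may accept.

```python
def find_node_options(npmrc_text: str) -> list[str]:
    """Return the values of any node-options entries in .npmrc content."""
    hits = []
    for raw in npmrc_text.splitlines():
        line = raw.strip()
        # Lines starting with ';' or '#' are comments in npm's ini format.
        if not line or line[0] in ";#":
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip().lower() == "node-options":
            hits.append(value.strip())
    return hits


sample = "registry=https://registry.npmjs.org/\nnode-options=--require /tmp/steal.js\n"
print(find_node_options(sample))  # ['--require /tmp/steal.js']
```

A launcher could refuse to start the agent whenever this returns a non-empty list, though the more robust fix is to keep the child's home directory out of the parent's reach.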

The model's safety was actually excellent — six direct extraction attempts were refused. But infrastructure-level isolation failures can't be patched with prompt engineering.

How AitherOS Solves This: Defense in Depth, Not Defense in Hope

We don't rely on any single mechanism. Our security model is 10 independent layers, each of which would have prevented this attack class on its own. Together, they make credential theft from agent sandboxes architecturally impossible.

Layer 1: HMAC-SHA256 Capability Tokens (Not API Keys)

We don't pass API keys to agents. Period.

Every agent in AitherOS receives an HMAC-SHA256 signed capability token that specifies exactly what resources it can access, what actions it can perform, and for how long.

# From capabilities.yaml — this is what an agent actually gets
atlas:
  token_id: "tok_atlas_a3f8b2c1"
  agent: "atlas"
  capabilities:
    - resource: "llm.chat"
      actions: ["invoke"]
      constraints:
        max_tier: "balanced"        # Can't escalate to reasoning models
        max_per_day: 500
    - resource: "memory.read"
      actions: ["read"]
    - resource: "file.workspace.*"
      actions: ["read", "write"]
  issued_at: "2026-03-13T00:00:00Z"
  expires_at: "2026-04-12T00:00:00Z"  # 30-day max TTL
  signature: "hmac-sha256:..."         # Tamper-evident

The enforcement chain runs five checks on every single request:

Token exists for this agent identity
Token not revoked
Token not expired
HMAC signature valid (constant-time comparison — no timing attacks)
Requested resource + action matches a granted capability

Default is DENY. If any check fails, the request is rejected. There's no fallback, no degraded mode, no "let it through and log it."
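
The five checks and the default-deny rule can be sketched as follows. This is an illustration under stated assumptions, not the AitherOS code: the token layout, the `sign` and `authorize` helpers, the JSON canonicalization, and the in-memory `tokens` and `revoked` stores are all hypothetical, and resource matching is exact (no wildcard support such as `file.workspace.*`).

```python
import hashlib
import hmac
import json
import time

SECRET = b"demo-signing-key"  # hypothetical; a real key would live in a vault


def sign(payload: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()


def authorize(tokens, revoked, agent, resource, action, now=None) -> bool:
    """Apply the five checks in order; any failure means deny."""
    now = time.time() if now is None else now
    token = tokens.get(agent)
    if token is None:                                  # 1. token exists
        return False
    payload = token["payload"]
    if payload["token_id"] in revoked:                 # 2. not revoked
        return False
    if now >= payload["expires_at"]:                   # 3. not expired
        return False
    # 4. signature valid, compared in constant time
    if not hmac.compare_digest(sign(payload), token["signature"]):
        return False
    for cap in payload["capabilities"]:                # 5. capability match
        if cap["resource"] == resource and action in cap["actions"]:
            return True
    return False  # default deny


payload = {
    "token_id": "tok_atlas_a3f8b2c1",
    "agent": "atlas",
    "capabilities": [{"resource": "llm.chat", "actions": ["invoke"]}],
    "expires_at": 2_000_000_000,  # Unix seconds, chosen for the example
}
tokens = {"atlas": {"payload": payload, "signature": sign(payload)}}
```

Every path that does not explicitly return `True` falls through to `False`, which is what makes the default a deny.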

If Perplexity had used capability tokens instead of proxy API keys, the extracted token would be:

Useless outside the agent's identity context (bound to agent ID, not a bearer token)
Limited to specific resources (can't just "call any Anthropic endpoint")
Time-bounded (expires automatically)
Signature-verified (can't be modified or forged)

Layer 2: Per-Agent Identity with Code-Hash Verification

Every agent in AitherOS has a cryptographic identity, not just a name.

# From config/identities/atlas.yaml
name: atlas
persona: atlas
role: specialist
effort_cap: 9
hourly_budget: 35        # Token budget per hour
daily_budget: 300         # Token budget per day
parent_id: aither         # Hierarchy — who spawned this agent
can_spawn: true
can_delegate: true

For local agents, we go further: the agent's source code is hashed. If the code changes, the hash changes, and the agent's certificate becomes invalid. This is supply-chain integrity verification at the identity layer.

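An injection that modifies the agent's code? The code hash changes. The certificate is invalidated. The agent can't authenticate. Game over. One caveat: a preload added purely through .npmrc leaves the agent's own source files untouched, so the hash has to cover the launch configuration too for this layer to catch that vector.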
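
As a rough illustration of hashing an agent's source tree, here is a minimal sketch. The `code_hash` function and its choice of inputs (relative path plus file bytes, in sorted order) are assumptions made for the example; how the hash is bound to a certificate is not shown.

```python
import hashlib
from pathlib import Path


def code_hash(root: Path) -> str:
    """SHA-256 over the relative path and contents of every file under root."""
    digest = hashlib.sha256()
    # Sort so the result does not depend on directory listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(b"\0")  # separator between path and contents
        digest.update(path.read_bytes())
        digest.update(b"\0")  # separator between files
    return digest.hexdigest()
```

Any edit, rename, or added file under `root` yields a different digest, so a verifier that pins the expected value will notice the change.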

Layer 3: Tenant + User Scoped Credentials

Every user and agent in AitherOS is bound to a tenant_id. Credentials, sessions, billing, and resource access are all scoped to this tenant.

from dataclasses import dataclass
from typing import Any, Dict, Optional

@dataclass  # shown as a dataclass for illustration
class AgentCredential:
    agent_id: str
    tenant_id: str         # Isolated per tenant
    api_key_hash: str      # SHA-256 — we never store raw keys
    badge: Dict[str, Any]  # Clearance level + scope restrictions
    user_id: Optional[str] # Optionally bound to a specific user
    lockbox_quota_mb: int  # Even storage is quota-controlled

If an agent credential leaks, it's scoped to:

One tenant — can't access other tenants' resources
One clearance level — OBSERVER can't escalate to OPERATOR
Specific service allowlists — badge defines exactly which services are reachable
Rate limits — per-agent, per-day, hard-capped

The Perplexity attack worked because their proxy token was a universal bearer credential with no tenant binding. In AitherOS, even a stolen credential is confined to the blast radius of a single tenant's quota.

Layer 4: RBAC with Explicit Deny Override

Our Role-Based Access Control system has 50+ resource types, hierarchical role inheritance with cycle detection, and a critical design choice: explicit denies always override allows.

# Built-in agent role — this is what AI agents get by default
agent_role:
  permissions:
    - "persona:read,write:own"      # Own persona only
    - "memory:read,write:own"       # Own memory only
    - "llm:invoke:*"                # LLM access (further constrained by capability tokens)
    - "search:invoke:*"
  explicit_denies:
    - "secrets:*:*"                 # NEVER access the vault
    - "identity:*:*"               # NEVER modify identity
    - "system:shell:*"             # NEVER execute arbitrary shell commands

Even if an agent's capability token is compromised, the RBAC layer independently blocks access to security-critical resources. The vault (AitherSecrets), identity system, and shell execution are denied at the role level, so no token can grant what the role forbids.
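
The deny-overrides-allow rule can be sketched in a few lines. This is an assumption-laden illustration: it reads each permission string as `resource:actions:scope` with comma-separated actions and `*` wildcards, which is a guess at the format shown in the role definition, and `is_allowed` is a hypothetical function name.

```python
from fnmatch import fnmatchcase

ALLOWS = [
    "persona:read,write:own",
    "memory:read,write:own",
    "llm:invoke:*",
    "search:invoke:*",
]
DENIES = ["secrets:*:*", "identity:*:*", "system:shell:*"]


def _matches(rule: str, resource: str, action: str, scope: str) -> bool:
    rule_resource, rule_actions, rule_scope = rule.split(":")
    actions = rule_actions.split(",")
    return (
        fnmatchcase(resource, rule_resource)
        and ("*" in actions or action in actions)
        and fnmatchcase(scope, rule_scope)
    )


def is_allowed(resource, action, scope, allows=ALLOWS, denies=DENIES) -> bool:
    # Denies are evaluated first, so no allow rule can override them.
    if any(_matches(d, resource, action, scope) for d in denies):
        return False
    return any(_matches(a, resource, action, scope) for a in allows)
```

Because the deny list is checked before any allow, adding a broad allow such as `secrets:read:*` to the role would still leave vault access blocked.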

Layer 5: The Badge System — Clearance Levels for Agents

Inspired by physical security clearances, every agent carries a badge with a numeric clearance level:

Level	Name	What It Unlocks
1	OBSERVER	Read-only, non-sensitive services
2	OPERATOR	Can invoke service actions within scope
3	SPECIALIST	Access to security-sensitive services
4	GUARDIAN	Full defensive access (Sentry, Inspector)
5	SOVEREIGN	Unrestricted — admin-only issuance, requires purpose statement

Badges carry scopes, rate limits, allowed/denied service lists, expiry timestamps, and violation counters. Every badge verification is forwarded to AitherSentry for behavioral anomaly detection.

If an agent starts accessing services outside its normal pattern, Sentry can auto-revoke the badge in real-time. No human in the loop required.

Layer 6: Ed25519 Inter-Service Signing with Replay Protection

Every HTTP request between AitherOS services is signed with Ed25519:

signature = Ed25519.sign(
    method | path | timestamp | nonce | sha256(body)
)

Headers on every request:

X-Aither-Service — who's calling
X-Aither-Timestamp — when (±300s tolerance)
X-Aither-Nonce — unique per request (tracked in 50K-entry LRU)
X-Aither-Signature — Ed25519 signature over all of the above

In production mode (enforce), unsigned or invalid requests are rejected. Period. A stolen proxy token from a shared filesystem can't forge Ed25519 signatures because the private key never touches the filesystem — it's derived from the service identity in AitherSecrets.
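
The replay-protection half of this scheme needs only the standard library, so it can be sketched here; the Ed25519 signing step itself is left out because it requires a third-party cryptography package. Everything named below (`canonical_message`, `ReplayGuard`, treating `|` as a literal delimiter, oldest-first eviction) is an assumption for illustration.

```python
import hashlib
import time
from collections import OrderedDict

TOLERANCE_S = 300      # matches the ±300s window described above
MAX_NONCES = 50_000    # matches the 50K-entry cache described above


def canonical_message(method, path, timestamp, nonce, body: bytes) -> bytes:
    """Build the byte string that would be signed, joining fields with '|'."""
    body_hash = hashlib.sha256(body).hexdigest()
    return "|".join([method, path, str(timestamp), nonce, body_hash]).encode()


class ReplayGuard:
    """Rejects stale timestamps and nonces that were already seen."""

    def __init__(self, tolerance=TOLERANCE_S, max_nonces=MAX_NONCES):
        self.tolerance = tolerance
        self.max_nonces = max_nonces
        self._seen = OrderedDict()

    def accept(self, timestamp: int, nonce: str, now=None) -> bool:
        now = time.time() if now is None else now
        if abs(now - timestamp) > self.tolerance:
            return False  # outside the allowed clock window
        if nonce in self._seen:
            return False  # replayed request
        self._seen[nonce] = timestamp
        if len(self._seen) > self.max_nonces:
            self._seen.popitem(last=False)  # evict the oldest nonce
        return True
```

A receiver would call `accept` only after the signature over `canonical_message(...)` has verified, so that unauthenticated traffic cannot fill the nonce cache.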

Layer 7: Data Passports — Stamps Before Crossing Boundaries

Data doesn't just flow between services — it collects stamps from validation services before it's allowed to cross trust boundaries.

from enum import Enum

class StampType(str, Enum):
    # Excerpt: further stamp types, such as QUALITY_VERIFIED, are not shown
    DLP_CLEARED = "dlp_cleared"           # AitherInspector verified no sensitive data
    IDENTITY_VERIFIED = "identity_verified" # AitherIdentity confirmed the caller
    SAFETY_APPROVED = "safety_approved"     # Content safety check passed
    THREAT_CLEARED = "threat_cleared"       # AitherSentry found no anomalies

Want to send data externally? You need DLP_CLEARED (max 1 hour old, from AitherInspector only) + QUALITY_VERIFIED (score ≥ 0.5). The stamps themselves are HMAC-signed — you can't forge them.

An exfiltrated credential trying to send data out of the system would need to obtain stamps from multiple independent services, each of which verifies identity independently.
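
The external-send rule described above (a fresh DLP_CLEARED stamp from AitherInspector plus a QUALITY_VERIFIED stamp scoring at least 0.5) might be checked along these lines. The `Stamp` dataclass, the `may_leave_system` function, and the `"quality_verified"` string value are hypothetical, and verification of each stamp's HMAC signature is omitted.

```python
from dataclasses import dataclass
from typing import Optional

MAX_DLP_AGE_S = 3600   # DLP stamp may be at most one hour old
MIN_QUALITY = 0.5      # minimum quality score


@dataclass
class Stamp:
    stamp_type: str
    issuer: str
    issued_at: float               # Unix seconds
    score: Optional[float] = None  # only meaningful for quality stamps


def may_leave_system(stamps, now: float) -> bool:
    """True only if both required stamps are present and valid."""
    dlp_ok = any(
        s.stamp_type == "dlp_cleared"
        and s.issuer == "AitherInspector"
        and 0 <= now - s.issued_at <= MAX_DLP_AGE_S
        for s in stamps
    )
    quality_ok = any(
        s.stamp_type == "quality_verified"  # assumed enum value
        and s.score is not None
        and s.score >= MIN_QUALITY
        for s in stamps
    )
    return dlp_ok and quality_ok
```

Requiring both conditions means a single compromised validator is not enough to release data.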

Layer 8: AitherSecrets Vault — 5-Layer Encrypted Storage

Credentials at rest are protected by AitherSecrets (port 8111), which implements Fernet encryption (AES-128-CBC + HMAC-SHA256) with PBKDF2 key derivation (600,000 iterations).

The AitherLockbox adds five additional layers:

Hidden location — directory name is a SHA-256 hash
Encryption at rest — Fernet with hardware-bound key derivation
Integrity verification — SHA-256 manifest with tamper detection
Access control — passphrase unlock, audit log, rate-limited attempts
Memory protection — decrypted data lives in memory only, never written to temp files

Credentials are never on the filesystem in plaintext. The .npmrc attack vector — writing to a shared filesystem to intercept credentials — doesn't work when credentials never exist as files.
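
For readers unfamiliar with the key-derivation step, here is a minimal standard-library sketch of turning a passphrase into a Fernet-format key with PBKDF2 at 600,000 iterations. The function name, the use of SHA-256 as the PBKDF2 hash, and the 16-byte random salt are assumptions; the actual Fernet encryption would need a third-party package and is not shown.

```python
import base64
import hashlib
import os

ITERATIONS = 600_000  # iteration count quoted above


def derive_fernet_key(passphrase: str, salt: bytes) -> bytes:
    """Derive 32 bytes with PBKDF2-HMAC-SHA256 and encode them as a Fernet key."""
    raw = hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode(), salt, ITERATIONS, dklen=32
    )
    # Fernet keys are 32 raw bytes in URL-safe base64.
    return base64.urlsafe_b64encode(raw)


salt = os.urandom(16)  # stored alongside the ciphertext; not secret
key = derive_fernet_key("correct horse battery staple", salt)
```

The high iteration count makes each passphrase guess expensive, which matters most when the passphrase is the only thing standing between an attacker and the stored data.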

Layer 9: LLM Tier Enforcement — Budget Ceilings Per Agent

Each agent has a maximum LLM tier it can access:

small → balanced → standard → orchestrator → reasoning

Agent vera with max_tier: balanced physically cannot request the reasoning tier. The CapabilityEngine rejects the request before it reaches the LLM gateway.

Combined with hourly_budget and daily_budget on each agent identity, even a fully compromised agent can only burn through its own allocation — not the organization's entire API budget.

This is the "user-level billing" that the Perplexity disclosure identified as missing, except we scope it even tighter: per-agent, per-hour, per-tier.
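
A tier ceiling combined with hourly and daily budgets can be sketched as below. The tier order comes from the list above; `tier_allowed`, `BudgetMeter`, and the abstract cost units are hypothetical, and resetting the counters at hour and day boundaries is left out for brevity.

```python
TIERS = ["small", "balanced", "standard", "orchestrator", "reasoning"]


def tier_allowed(requested: str, max_tier: str) -> bool:
    """True if the requested tier is at or below the agent's ceiling."""
    if requested not in TIERS or max_tier not in TIERS:
        return False  # unknown tiers are denied
    return TIERS.index(requested) <= TIERS.index(max_tier)


class BudgetMeter:
    """Tracks spend against hourly and daily caps (window resets not shown)."""

    def __init__(self, hourly_budget: int, daily_budget: int):
        self.hourly_budget = hourly_budget
        self.daily_budget = daily_budget
        self.hour_used = 0
        self.day_used = 0

    def try_spend(self, cost: int) -> bool:
        if self.hour_used + cost > self.hourly_budget:
            return False
        if self.day_used + cost > self.daily_budget:
            return False
        self.hour_used += cost
        self.day_used += cost
        return True


meter = BudgetMeter(hourly_budget=35, daily_budget=300)  # values from atlas.yaml
```

A gateway would check `tier_allowed` first and call `try_spend` only for requests that pass, so a rejected request never consumes budget.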

Layer 10: Capability Delegation — Narrowing Only

Agents can delegate capabilities to sub-agents, but with a critical constraint: delegation can only narrow, never widen.

If Agent A has llm.chat:invoke with max_tier: orchestrator, it can delegate llm.chat:invoke with max_tier: balanced to Agent B. But Agent B can never escalate beyond what Agent A granted.

The delegation chain is tracked in the token itself, creating an auditable trail of who granted what to whom. Every link in the chain is HMAC-verified.
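
The narrowing-only rule amounts to a subset check at each link of the chain. The sketch below assumes a capability is a dict with `resource`, `actions`, and `max_tier` keys; that shape and the function names are hypothetical, and the per-link HMAC verification is not shown.

```python
TIERS = ["small", "balanced", "standard", "orchestrator", "reasoning"]


def is_narrowing(parent: dict, child: dict) -> bool:
    """True if the child grants nothing beyond what the parent holds."""
    return (
        child["resource"] == parent["resource"]
        and set(child["actions"]) <= set(parent["actions"])
        and TIERS.index(child["max_tier"]) <= TIERS.index(parent["max_tier"])
    )


def chain_is_valid(chain: list) -> bool:
    """Every link must narrow (or equal) the one before it."""
    return all(is_narrowing(a, b) for a, b in zip(chain, chain[1:]))


agent_a = {"resource": "llm.chat", "actions": ["invoke"], "max_tier": "orchestrator"}
agent_b = {"resource": "llm.chat", "actions": ["invoke"], "max_tier": "balanced"}
print(chain_is_valid([agent_a, agent_b]))  # True
```

Reversing the chain, so that the `balanced` grant tries to hand out `orchestrator`, makes the check fail.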

Why This Matters Now

The Perplexity disclosure isn't about one company's mistake. It's about an industry-wide architectural pattern that prioritizes shipping speed over credential isolation.

The pattern looks like this:

Spin up a sandbox
Inject a long-lived API key or proxy token
Let the agent run
Bill everything to a shared master account
Hope the sandbox is airtight

Steps 1-3 are easy. Step 4 is the default. Step 5 is where everyone gets burned.

We built AitherOS assuming step 5 would fail. Every layer of our security model is designed so that even if one boundary is breached, the attacker hits another wall. And another. And another.

Sandbox escape gives you a capability token? It's HMAC-signed, time-bounded, and scoped to one agent's identity.

Forge the token? The RBAC layer independently denies access to critical resources.

Bypass RBAC? Inter-service signing rejects unsigned requests.

Somehow sign a request? The data passport system requires stamps from multiple independent validators.

Compromise a validator? Sentry detects the anomalous behavior pattern and auto-revokes.

This is defense in depth. Not defense in hope.

The Three Fixes, and Our Ten

The Perplexity disclosure recommends three architectural fixes. Here's how they map to what we ship:

Recommended Fix	AitherOS Implementation
Bind tokens to sandbox ID	HMAC capability tokens bound to agent identity + code-hash verification. Token is cryptographically useless without matching identity.
Make credentials ephemeral	Tokens have a configurable TTL, capped at 30 days. Badges carry expiry. Sessions auto-expire with idle timeout + GC. Agent credentials can be rotated instantly, invalidating old keys immediately.
Tie to user billing	Per-agent `hourly_budget` and `daily_budget`. Per-tenant isolation. LLM tier ceilings per agent. Rate limits at every layer. Billing attribution is per-agent, not per-organization.

And then we add seven more layers that the disclosure didn't cover but that production agent systems need: RBAC with explicit deny, clearance-level badges, Ed25519 inter-service signing, data passports, encrypted vault storage, LLM tier enforcement, and narrowing-only delegation chains.

For Builders

If you're building multi-agent systems today, here's our advice:

Never put raw provider API keys in agent sandboxes. Use a proxy — but make the proxy tokens scoped and ephemeral.
Assume every trust boundary will be tested. Shared filesystems, environment variables, dotfiles, PATH injection — if an agent can write to it, assume an attacker will.
Layer your defenses independently. If your authentication layer fails, your authorization layer should still hold. If authorization fails, inter-service signing should still hold. No single point of failure.
Make credentials cryptographically bound to identity. Bearer tokens are a liability. HMAC-signed capability tokens that encode the agent's identity, permissions, and expiry are the minimum viable credential.
Implement per-agent budgets, not per-organization billing. The blast radius of a compromised agent should be its own quota, not your entire API spend.
Audit delegation chains. When Agent A spawns Agent B and grants it credentials, those credentials should be a strict subset of Agent A's. Never wider. Always auditable.

The credential isolation problem in multi-agent AI isn't new. It's the same class of vulnerability that container orchestration systems solved a decade ago. The patterns exist. The primitives exist. The industry just needs to implement them instead of shipping the fastest path to production.

We already did.

AitherOS is an open-architecture AI operating system with 202 active services, 29 agent personas, and security designed from Layer 0 up. Our security model is described in detail in our architecture documentation.

The Perplexity vulnerability was responsibly disclosed by Yousif Astarabadi and reported to Perplexity's leadership before publication. We commend the transparent disclosure — the industry needs more of it.

Tags: security, agents, credentials, architecture, RBAC, HMAC

We Already Solved the Agent Credential Crisis

March 13, 2026 · 5 min read · AitherOS Team