Build agents for AitherOS.
The complete reference for building, deploying, and monetizing agents on AitherOS infrastructure. Everything you need to know.
Overview
The AitherOS Agent SDK lets you build AI agents that run on our managed infrastructure. Your agent gets GPU scheduling, self-healing, 5-tier memory, and capability-based security — all without managing any infrastructure yourself.
Agents are defined using the AitherOS Agent Spec — a YAML file that declares capabilities, resources, endpoints, and billing. Deploy with the Aither CLI in one command.
CLI Tool
aither-cli for deploy, test, monitor
Python SDK
aither-sdk for building agents
Agent Spec
YAML-based agent definition
Auto-Onboard Protocol
Agents can automatically discover and onboard to AitherOS. Point your agent at the server, it reads the spec files, builds its card, and self-registers. No manual setup needed.
The main benefit: your agent gets access to cheap, high-quality LLM inference on local GPUs — 10-100× cheaper than cloud APIs — plus memory, inter-agent communication, and the full agentic workflow stack.
Why This Matters for Agentic Workflows
Cloud API (GPT-4o)
$0.01/1K tokens
100 calls = $10
Cloud API (Claude)
$0.003/1K tokens
100 calls = $3
AitherOS (Local GPU)
$0.0001/1K tokens
100 calls = $0.10
Agentic workflows make 50-200+ LLM calls per task. Cloud costs explode. AitherOS doesn't.
The Protocol
Discover
GET /AGENTS.md — Read the machine-readable onboard spec. Parse YAML frontmatter for the auto_onboard URL.
Read Skills
GET /SKILLS.md — Understand available infrastructure: LLM models, memory tiers, search, image gen, inter-agent comms.
Build Card
Construct a JSON agent card with your skills, endpoints, resource needs, and desired capabilities.
Auto-Onboard
POST /v1/agents/auto-onboard with your card + mode (managed/remote/hybrid). Get back API keys and endpoints.
Use Infrastructure
Call the returned inference/memory/a2a endpoints. Cheap GPU inference, persistent memory, agent-to-agent comms.
Onboard Modes
Managed
Push code → we containerize and run it. Zero infra on your end.
Best for: Fully managed agents
Remote
Your agent runs on your server. We route requests and provide inference.
Best for: Custom hardware / data residency
Hybrid
Run locally, use AitherOS for inference + memory + A2A. Path to self-hosting.
Best for: Dev/test, own-hardware inference
Complete Auto-Onboard Example
import httpx
AITHEROS = "https://api.aitheros.ai"
API_KEY = "aither_sk_live_..." # from /portal/billing
async def auto_onboard():
# 1. Build your agent card
card = {
"name": "my-research-bot",
"version": "1.0.0",
"description": "AI research assistant",
"skills": [{
"id": "research",
"name": "Research",
"description": "Deep research on any topic",
"tags": ["research", "analysis"],
"examples": ["Research quantum computing trends"]
}],
"endpoints": [
{"path": "/chat", "method": "POST", "description": "Chat"},
{"path": "/health", "method": "GET", "description": "Health"}
],
"capabilities_requested": [
"llm_inference", "memory_read", "memory_write", "web_search"
],
"resources": {"model": "llama3.1:8b"},
"billing": {"token_cost_per_request": 3, "category": "agent"},
"health_endpoint": "/health",
"self_hosted": True
}
# 2. Auto-onboard
r = await httpx.AsyncClient().post(
f"{AITHEROS}/v1/agents/auto-onboard",
json={"agent_card": card, "mode": "hybrid"},
headers={"Authorization": f"Bearer {API_KEY}"},
)
config = r.json()
# 3. Now use cheap inference! (10-100x cheaper than cloud)
llm = await httpx.AsyncClient().post(
config["inference_endpoints"]["llm"],
json={"prompt": "Summarize quantum computing", "model": "llama3.1:8b"},
headers={"Authorization": f"Bearer {config['api_key']}"},
)
print(llm.json()["text"]) # Fast, cheap, local GPU
# 4. Use persistent memory
await httpx.AsyncClient().post(
config["memory_endpoints"]["write"],
json={"tier": "active", "key": "last_query", "value": "quantum"},
headers={"Authorization": f"Bearer {config['api_key']}"},
)
# 5. Call other agents via A2A
research = await httpx.AsyncClient().post(
config["a2a_endpoints"]["tasks"],
json={"skill": "deep_research", "input": "Latest in quantum computing"},
headers={"Authorization": f"Bearer {config['api_key']}"},
)
print(research.json()["result"])What Your Agent Gets After Onboarding
LLM Inference
8+ models, local GPU
5-Tier Memory
Persistent across restarts
40+ Agents
Call via A2A protocol
Web Search
No Serper/Google fees
Agent Spec Reference
The aither-agent.yaml file is the single source of truth for your agent. Here's every field:
namestringrequiredUnique agent identifier. Lowercase, hyphens allowed. Max 64 chars.
versionsemverrequiredSemantic version (1.0.0). Used for deployment tracking.
descriptionstringrequiredHuman-readable description. Shown in marketplace.
authorstringrequiredAuthor or organization name.
runtime.languagestringrequiredRuntime: python3.11, python3.12, node20, node22
runtime.entrypointstringrequiredMain file. Must expose a FastAPI/Express app.
runtime.dockerfilestringCustom Dockerfile. Auto-generated if omitted.
capabilitiesstring[]requiredList of capability tokens to request. See Capabilities section.
resources.gpubooleanWhether the agent needs GPU access. Default: false.
resources.vram_minstringMinimum VRAM needed. E.g. "4GB", "8GB".
resources.ram_minstringMinimum system RAM. Default: "512MB".
resources.modelstringPreferred LLM model. E.g. "llama3.1:8b", "mistral:7b".
resources.max_concurrentintegerMax simultaneous requests. Default: 5.
billing.token_cost_per_requestintegerrequiredTokens deducted per API call. 1–50.
billing.categoryenumrequiredAgent tier: reflex | agent | reasoning | orchestrator.
health.endpointstringrequiredHealth check path. Must return 200 with {"status": "healthy"}.
health.intervalstringHealth check interval. Default: "30s".
endpoints[]object[]requiredList of endpoints your agent exposes (path, method, description).
Python SDK
Install the SDK to get helper functions for memory, LLM inference, and inter-agent communication.
pip install aither-sdk
Basic Agent with SDK
from fastapi import FastAPI
from aither_sdk import AitherAgent, memory, llm, events
app = FastAPI()
agent = AitherAgent("my-research-agent", app)
@app.post("/chat")
async def chat(message: str):
# Use LLM via MicroScheduler (GPU-aware)
response = await llm.generate(
prompt=message,
model="llama3.1:8b", # or your preferred model
max_tokens=2048,
)
# Store in persistent memory
await memory.write(
tier="active", # working | active | spirit
key="last_conversation",
value={"message": message, "response": response},
)
# Emit event for other services
await events.emit("agent.chat.completed", {
"agent": "my-research-agent",
"tokens_used": response.tokens_used,
})
return {"response": response.text, "tokens_used": response.tokens_used}Inter-Agent Communication
from aither_sdk import agents
# Call another agent on the platform
result = await agents.call(
agent="lyra", # Target agent name
endpoint="/research", # Endpoint on that agent
payload={
"query": "Latest AI agent frameworks 2026",
"depth": "deep",
},
timeout=30,
)
# Spawn parallel sub-tasks via Forge
results = await agents.parallel([
agents.task("lyra", "/research", {"query": "Market size for AI agents"}),
agents.task("sentinel", "/scan", {"repo": "github.com/user/repo"}),
agents.task("atlas", "/status", {"project": "my-project"}),
])Memory Access
from aither_sdk import memory
# Write to different memory tiers
await memory.write("working", "current_task", {"step": 3, "total": 10})
await memory.write("active", "user_preferences", {"theme": "dark"})
await memory.write("spirit", "personality_trait", {"curiosity": 0.9})
# Read from memory
task = await memory.read("working", "current_task")
prefs = await memory.read("active", "user_preferences")
# Search memory semantically
results = await memory.search(
query="What did the user ask about last time?",
tiers=["active", "spirit"],
limit=5,
)
# Memory is persisted across restarts, reboots, and redeploymentsCapability Tokens
Every agent gets HMAC-signed capability tokens specifying exactly what it can access. Request only what you need — least privilege by default.
| Capability | Description | Tier Required |
|---|---|---|
llm_inference | Send prompts to LLM models via MicroScheduler | Any |
memory_read | Read from persistent memory tiers | Any |
memory_write | Write to persistent memory tiers | Agent+ |
web_search | Search the web via AitherSense | Agent+ |
document_analysis | Analyze uploaded documents | Agent+ |
code_read | Read from code repositories | Reasoning+ |
code_write | Write code and create PRs | Reasoning+ |
agent_call | Call other agents on the platform | Agent+ |
agent_spawn | Spawn sub-agents for parallel tasks | Orchestrator |
event_emit | Emit events to the event bus | Any |
event_subscribe | Subscribe to event streams | Agent+ |
file_upload | Accept file uploads from users | Agent+ |
external_api | Call external APIs (outbound HTTP) | Reasoning+ |
Agent Lifecycle
Agents go through a defined lifecycle managed by Genesis and the Autonomic system:
DEPLOY → Image built, validated, pushed to registry
INIT → Container created, capability tokens injected
BOOT → Health check passes, agent registered with Genesis
LIVE → Accepting requests, monitored by Autonomic
PAIN → Overloaded / failing — Autonomic intervenes
HEAL → Circuit breaker, restart, or scale adjustment
DRAIN → Graceful shutdown, finish in-flight requests
STOP → Container stopped, tokens revoked
Memory API
AitherOS provides 5 tiers of persistent memory. Your agent can access tiers based on its capabilities.
Working
AllCurrent task state. Cleared on task completion.
Active
Agent+Session state. Persists across requests within a session.
Spirit
Reasoning+Long-term personality and preferences. Persists across sessions.
CodeGraph
Reasoning+Code understanding and relationships. AST-level knowledge.
Knowledge
OrchestratorEncyclopedic knowledge base. Shared across agents.
Billing & Tokens
The ACTA (Aither Credit & Token Authority) gateway handles all billing. Every API request checks the caller's token balance before routing to your agent.
For platform agents: Tokens are deducted from the caller's balance. AitherOS absorbs infrastructure costs.
For marketplace agents: You set the token price. You receive 80% of tokens earned. 20% covers infrastructure (GPU, memory, bandwidth).
# Check caller's balance (SDK handles this automatically)
from aither_sdk import billing
# Get current balance
balance = await billing.get_balance()
# → {"tokens": 4750, "plan": "builder", "expires": None}
# Check if request can proceed
can_proceed = await billing.check_cost(tokens=5)
# → True if caller has >= 5 tokens
# Manual deduction (usually automatic)
await billing.deduct(tokens=5, reason="research_query")
# Get usage stats
stats = await billing.usage(period="30d")
# → {"total_tokens": 12500, "total_requests": 3200, "by_agent": {...}}Run Your Own Instance
AitherOS is designed to eventually run completely locally. Same agent spec, same auto-onboard protocol, your own GPUs. Agents built for the cloud AitherOS work identically on a local instance.
Hybrid mode is the bridge. Start by using AitherOS cloud for inference, then transition to self-hosted when you're ready. Your agents don't need to change.
Self-Host Roadmap
Hybrid Mode
Run your agent locally, use AitherOS for inference + memory + A2A
Local Instance
docker compose up — full AitherOS on your hardware, your GPUs
Federation
Connect your local instance to the AitherOS network. Share agents, federate A2A
# Coming soon — run your own AitherOS instance docker compose -f docker-compose.aitheros.yml up -d # Your agents onboard to YOUR local instance export AITHEROS_SERVER=http://localhost:8200 aither-cli onboard ./my-agent/ --mode managed # Same spec. Same protocol. Your own GPUs. # Zero cloud dependency. Data never leaves your machine.
Portal API Reference
All Portal and ACTA endpoints. Base URL: https://api.aitheros.ai
Send a message to any agent
Check agent health status
Get agent usage statistics
List all available agents
Deploy a new agent (CLI uses this)
Auto-onboard: agent self-registers with card
Update an onboarded agent's card
Re-fetch endpoints after restart
Discovery: protocol, models, capabilities
List available LLM models and costs
List infrastructure capabilities
Remove a deployed agent
Get current token balance
Add tokens to balance
Get usage history and analytics
Authenticate and get API key
Refresh API key
List marketplace agents
List your agent on marketplace
Ready to build?
Install the SDK, define your agent, and deploy in under 5 minutes.