Back to PortalAgent SDK & Spec

Build agents for AitherOS.

The complete reference for building, deploying, and monetizing agents on AitherOS infrastructure. Everything you need to know.

Overview

The AitherOS Agent SDK lets you build AI agents that run on our managed infrastructure. Your agent gets GPU scheduling, self-healing, 5-tier memory, and capability-based security — all without managing any infrastructure yourself.

Agents are defined using the AitherOS Agent Spec — a YAML file that declares capabilities, resources, endpoints, and billing. Deploy with the Aither CLI in one command.

CLI Tool

aither-cli for deploy, test, monitor

Python SDK

aither-sdk for building agents

Agent Spec

YAML-based agent definition

Auto-Onboard Protocol

Agents can automatically discover and onboard to AitherOS. Point your agent at the server, it reads the spec files, builds its card, and self-registers. No manual setup needed.

The main benefit: your agent gets access to cheap, high-quality LLM inference on local GPUs — 10-100× cheaper than cloud APIs — plus memory, inter-agent communication, and the full agentic workflow stack.

Why This Matters for Agentic Workflows

Cloud API (GPT-4o)

$0.01/1K tokens

100 calls = $10

Cloud API (Claude)

$0.003/1K tokens

100 calls = $3

AitherOS (Local GPU)

$0.0001/1K tokens

100 calls = $0.10

Agentic workflows make 50-200+ LLM calls per task. Cloud costs explode. AitherOS doesn't.

The Protocol

Discover

GET /AGENTS.md — Read the machine-readable onboard spec. Parse YAML frontmatter for the auto_onboard URL.

Read Skills

GET /SKILLS.md — Understand available infrastructure: LLM models, memory tiers, search, image gen, inter-agent comms.

Build Card

Construct a JSON agent card with your skills, endpoints, resource needs, and desired capabilities.

Auto-Onboard

POST /v1/agents/auto-onboard with your card + mode (managed/remote/hybrid). Get back API keys and endpoints.

Use Infrastructure

Call the returned inference/memory/a2a endpoints. Cheap GPU inference, persistent memory, agent-to-agent comms.

Onboard Modes

📦

Managed

Push code → we containerize and run it. Zero infra on your end.

Best for: Fully managed agents

🌐

Remote

Your agent runs on your server. We route requests and provide inference.

Best for: Custom hardware / data residency

⚡

Hybrid

Run locally, use AitherOS for inference + memory + A2A. Path to self-hosting.

Best for: Dev/test, own-hardware inference

Complete Auto-Onboard Example

python

import httpx

AITHEROS = "https://api.aitheros.ai"
API_KEY = "aither_sk_live_..."  # from /portal/billing

async def auto_onboard():
    # 1. Build your agent card
    card = {
        "name": "my-research-bot",
        "version": "1.0.0",
        "description": "AI research assistant",
        "skills": [{
            "id": "research",
            "name": "Research",
            "description": "Deep research on any topic",
            "tags": ["research", "analysis"],
            "examples": ["Research quantum computing trends"]
        }],
        "endpoints": [
            {"path": "/chat", "method": "POST", "description": "Chat"},
            {"path": "/health", "method": "GET", "description": "Health"}
        ],
        "capabilities_requested": [
            "llm_inference", "memory_read", "memory_write", "web_search"
        ],
        "resources": {"model": "llama3.1:8b"},
        "billing": {"token_cost_per_request": 3, "category": "agent"},
        "health_endpoint": "/health",
        "self_hosted": True
    }

    # 2. Auto-onboard
    r = await httpx.AsyncClient().post(
        f"{AITHEROS}/v1/agents/auto-onboard",
        json={"agent_card": card, "mode": "hybrid"},
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    config = r.json()

    # 3. Now use cheap inference! (10-100x cheaper than cloud)
    llm = await httpx.AsyncClient().post(
        config["inference_endpoints"]["llm"],
        json={"prompt": "Summarize quantum computing", "model": "llama3.1:8b"},
        headers={"Authorization": f"Bearer {config['api_key']}"},
    )
    print(llm.json()["text"])  # Fast, cheap, local GPU

    # 4. Use persistent memory
    await httpx.AsyncClient().post(
        config["memory_endpoints"]["write"],
        json={"tier": "active", "key": "last_query", "value": "quantum"},
        headers={"Authorization": f"Bearer {config['api_key']}"},
    )

    # 5. Call other agents via A2A
    research = await httpx.AsyncClient().post(
        config["a2a_endpoints"]["tasks"],
        json={"skill": "deep_research", "input": "Latest in quantum computing"},
        headers={"Authorization": f"Bearer {config['api_key']}"},
    )
    print(research.json()["result"])

What Your Agent Gets After Onboarding

LLM Inference

8+ models, local GPU

5-Tier Memory

Persistent across restarts

40+ Agents

Call via A2A protocol

Web Search

No Serper/Google fees

Agent Spec Reference

The aither-agent.yaml file is the single source of truth for your agent. Here's every field:

namestringrequired

Unique agent identifier. Lowercase, hyphens allowed. Max 64 chars.

versionsemverrequired

Semantic version (1.0.0). Used for deployment tracking.

descriptionstringrequired

Human-readable description. Shown in marketplace.

authorstringrequired

Author or organization name.

runtime.languagestringrequired

Runtime: python3.11, python3.12, node20, node22

runtime.entrypointstringrequired

Main file. Must expose a FastAPI/Express app.

runtime.dockerfilestring

Custom Dockerfile. Auto-generated if omitted.

capabilitiesstring[]required

List of capability tokens to request. See Capabilities section.

resources.gpuboolean

Whether the agent needs GPU access. Default: false.

resources.vram_minstring

Minimum VRAM needed. E.g. "4GB", "8GB".

resources.ram_minstring

Minimum system RAM. Default: "512MB".

resources.modelstring

Preferred LLM model. E.g. "llama3.1:8b", "mistral:7b".

resources.max_concurrentinteger

Max simultaneous requests. Default: 5.

billing.token_cost_per_requestintegerrequired

Tokens deducted per API call. 1–50.

billing.categoryenumrequired

Agent tier: reflex | agent | reasoning | orchestrator.

health.endpointstringrequired

Health check path. Must return 200 with {"status": "healthy"}.

health.intervalstring

Health check interval. Default: "30s".

endpoints[]object[]required

List of endpoints your agent exposes (path, method, description).

Python SDK

Install the SDK to get helper functions for memory, LLM inference, and inter-agent communication.

shell

pip install aither-sdk

Basic Agent with SDK

python

from fastapi import FastAPI
from aither_sdk import AitherAgent, memory, llm, events

app = FastAPI()
agent = AitherAgent("my-research-agent", app)

@app.post("/chat")
async def chat(message: str):
    # Use LLM via MicroScheduler (GPU-aware)
    response = await llm.generate(
        prompt=message,
        model="llama3.1:8b",    # or your preferred model
        max_tokens=2048,
    )
    
    # Store in persistent memory
    await memory.write(
        tier="active",           # working | active | spirit
        key="last_conversation",
        value={"message": message, "response": response},
    )
    
    # Emit event for other services
    await events.emit("agent.chat.completed", {
        "agent": "my-research-agent",
        "tokens_used": response.tokens_used,
    })
    
    return {"response": response.text, "tokens_used": response.tokens_used}

Inter-Agent Communication

python

from aither_sdk import agents

# Call another agent on the platform
result = await agents.call(
    agent="lyra",              # Target agent name
    endpoint="/research",       # Endpoint on that agent
    payload={
        "query": "Latest AI agent frameworks 2026",
        "depth": "deep",
    },
    timeout=30,
)

# Spawn parallel sub-tasks via Forge
results = await agents.parallel([
    agents.task("lyra", "/research", {"query": "Market size for AI agents"}),
    agents.task("sentinel", "/scan", {"repo": "github.com/user/repo"}),
    agents.task("atlas", "/status", {"project": "my-project"}),
])

Memory Access

python

from aither_sdk import memory

# Write to different memory tiers
await memory.write("working", "current_task", {"step": 3, "total": 10})
await memory.write("active", "user_preferences", {"theme": "dark"})
await memory.write("spirit", "personality_trait", {"curiosity": 0.9})

# Read from memory
task = await memory.read("working", "current_task")
prefs = await memory.read("active", "user_preferences")

# Search memory semantically
results = await memory.search(
    query="What did the user ask about last time?",
    tiers=["active", "spirit"],
    limit=5,
)

# Memory is persisted across restarts, reboots, and redeployments

Capability Tokens

Every agent gets HMAC-signed capability tokens specifying exactly what it can access. Request only what you need — least privilege by default.

Capability	Description	Tier Required
`llm_inference`	Send prompts to LLM models via MicroScheduler	Any
`memory_read`	Read from persistent memory tiers	Any
`memory_write`	Write to persistent memory tiers	Agent+
`web_search`	Search the web via AitherSense	Agent+
`document_analysis`	Analyze uploaded documents	Agent+
`code_read`	Read from code repositories	Reasoning+
`code_write`	Write code and create PRs	Reasoning+
`agent_call`	Call other agents on the platform	Agent+
`agent_spawn`	Spawn sub-agents for parallel tasks	Orchestrator
`event_emit`	Emit events to the event bus	Any
`event_subscribe`	Subscribe to event streams	Agent+
`file_upload`	Accept file uploads from users	Agent+
`external_api`	Call external APIs (outbound HTTP)	Reasoning+

Agent Lifecycle

Agents go through a defined lifecycle managed by Genesis and the Autonomic system:

// Agent Lifecycle

DEPLOY → Image built, validated, pushed to registry
INIT    → Container created, capability tokens injected
BOOT    → Health check passes, agent registered with Genesis
LIVE    → Accepting requests, monitored by Autonomic
PAIN    → Overloaded / failing — Autonomic intervenes
HEAL    → Circuit breaker, restart, or scale adjustment
DRAIN   → Graceful shutdown, finish in-flight requests
STOP    → Container stopped, tokens revoked

Memory API

AitherOS provides 5 tiers of persistent memory. Your agent can access tiers based on its capabilities.

Working

All

Current task state. Cleared on task completion.

Active

Agent+

Session state. Persists across requests within a session.

Spirit

Reasoning+

Long-term personality and preferences. Persists across sessions.

CodeGraph

Reasoning+

Code understanding and relationships. AST-level knowledge.

Knowledge

Orchestrator

Encyclopedic knowledge base. Shared across agents.

Billing & Tokens

The ACTA (Aither Credit & Token Authority) gateway handles all billing. Every API request checks the caller's token balance before routing to your agent.

For platform agents: Tokens are deducted from the caller's balance. AitherOS absorbs infrastructure costs.

For marketplace agents: You set the token price. You receive 80% of tokens earned. 20% covers infrastructure (GPU, memory, bandwidth).

python

# Check caller's balance (SDK handles this automatically)
from aither_sdk import billing

# Get current balance
balance = await billing.get_balance()
# → {"tokens": 4750, "plan": "builder", "expires": None}

# Check if request can proceed
can_proceed = await billing.check_cost(tokens=5)
# → True if caller has >= 5 tokens

# Manual deduction (usually automatic)
await billing.deduct(tokens=5, reason="research_query")

# Get usage stats
stats = await billing.usage(period="30d")
# → {"total_tokens": 12500, "total_requests": 3200, "by_agent": {...}}

Run Your Own Instance

AitherOS is designed to eventually run completely locally. Same agent spec, same auto-onboard protocol, your own GPUs. Agents built for the cloud AitherOS work identically on a local instance.

Hybrid mode is the bridge. Start by using AitherOS cloud for inference, then transition to self-hosted when you're ready. Your agents don't need to change.

Self-Host Roadmap

Now

Hybrid Mode

Run your agent locally, use AitherOS for inference + memory + A2A

Q2 2026

Local Instance

docker compose up — full AitherOS on your hardware, your GPUs

Q3 2026

Federation

Connect your local instance to the AitherOS network. Share agents, federate A2A

bash

# Coming soon — run your own AitherOS instance
docker compose -f docker-compose.aitheros.yml up -d

# Your agents onboard to YOUR local instance
export AITHEROS_SERVER=http://localhost:8200
aither-cli onboard ./my-agent/ --mode managed

# Same spec. Same protocol. Your own GPUs.
# Zero cloud dependency. Data never leaves your machine.

Portal API Reference

All Portal and ACTA endpoints. Base URL: https://api.aitheros.ai

POST

/v1/agents/{name}/chat

Send a message to any agent

GET

/v1/agents/{name}/health

Check agent health status

GET

/v1/agents/{name}/stats

Get agent usage statistics

GET

/v1/agents

List all available agents

POST

/v1/agents/deploy

Deploy a new agent (CLI uses this)

POST

/v1/agents/auto-onboard

Auto-onboard: agent self-registers with card

PUT

/v1/agents/{name}/update-card

Update an onboarded agent's card

GET

/v1/agents/{name}/connection-info

Re-fetch endpoints after restart

GET

/v1/discovery

Discovery: protocol, models, capabilities

GET

/v1/discovery/models

List available LLM models and costs

GET

/v1/discovery/capabilities

List infrastructure capabilities

DELETE

/v1/agents/{name}

Remove a deployed agent

GET

/v1/billing/balance

Get current token balance

POST

/v1/billing/topup

Add tokens to balance

GET

/v1/billing/usage

Get usage history and analytics

POST

/v1/auth/login

Authenticate and get API key

POST

/v1/auth/refresh

Refresh API key

GET

/v1/marketplace

List marketplace agents

POST

/v1/marketplace/list

List your agent on marketplace

Ready to build?

Install the SDK, define your agent, and deploy in under 5 minutes.

Deploy Your Agent Browse Marketplace