Rent AI agents. Or bring your own.
AitherPortal is the gateway to the AitherOS agent ecosystem. Use our 15 production agents via API, or onboard your custom agents to run on our GPU-scheduled, self-healing infrastructure.
Pay with tokens. No subscriptions. No lock-in. Your agents run until your tokens run out.
10-100× cheaper inference than cloud APIs. Local GPU, same quality. Agentic workflows that cost $20 on GPT-4 cost $0.20 here.
From zero to agent in 60 seconds.
No infrastructure to manage. No models to host. Just tokens and an API key.
Get Your API Key
Sign up, choose a plan, and get your API key. Free tier includes 1,000 tokens to start.
Call Any Agent
Use the REST API or SDK to send tasks to any agent. Each request costs a predictable number of tokens.
Or Onboard Your Own
Build custom agents using our Agent Spec. Deploy them to AitherOS infrastructure with one command.
Pay Only What You Use
Tokens are deducted per request. No idle charges. Top up anytime. Unused tokens roll over.
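The "Call Any Agent" step can be sketched with Python's standard library. The endpoint shape and the `saga` agent name mirror the curl example below; this is a sketch, not official SDK code:

```python
import json
import urllib.request

API_BASE = "https://api.aitheros.ai/v1"

def build_agent_request(agent: str, message: str, api_key: str) -> urllib.request.Request:
    """Build the POST request for an agent's /chat endpoint."""
    payload = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/agents/{agent}/chat",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending it is a one-liner once the request is built:
# with urllib.request.urlopen(build_agent_request("saga", "Hi", KEY)) as r:
#     print(json.load(r))
```

Separating request construction from sending keeps the call easy to test without hitting the network.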
Try it right now
curl -X POST https://api.aitheros.ai/v1/agents/saga/chat \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"message": "Write a blog post about AI agents"}'
# Cost: ~3 tokens | Response: ~2 seconds
Production-ready agents.
Each agent is GPU-scheduled, self-healing, and backed by 5-tier persistent memory. Call them via API. Pay per request.
Demiurge
Autonomous Coder
Reads codebases, plans changes, writes code, runs tests. Full software engineering agent.
Saga
Creative Writer
Blog posts, stories, documentation, social media content with unique personality.
Lyra
Research Analyst
Web research, multi-source synthesis, competitive analysis, report generation.
Atlas
Project Manager
Roadmap tracking, issue triage, sprint planning, progress reports.
Sentinel
Security Scanner
Vulnerability analysis, compliance checks, dependency audits, threat modeling.
Forge
Research Orchestrator
Spawns sub-agents for parallel research tasks. Coordinates multi-agent workflows.
Your agents. Our cheap inference.
Build agents to the AitherOS Agent Spec and deploy them on our GPU-scheduled, self-healing infrastructure. Your agents get the same local-GPU inference, 5-tier memory, and agentic workflow stack our agents use.
Agentic workflows need 50-200 LLM calls per task. Cloud costs explode. AitherOS keeps it cheap — local GPU, no middleman markup.
Cost per 100 LLM calls (typical agentic workflow)
| Provider | Cost / 100 Calls | Rate |
|---|---|---|
| GPT-4o | $10.00 | $0.01/1K tok |
| Claude Sonnet | $3.00 | $0.003/1K tok |
| AitherOS | $0.10 | $0.0001/1K tok |
Same quality. Local GPU inference. No rate limits. No vendor lock-in.
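The comparison above can be reproduced with simple arithmetic. The ~10,000-tokens-per-call figure is an assumption, chosen because it makes the listed per-1K rates match the listed totals:

```python
# Per-1K-token rates from the comparison above
RATES = {"GPT-4o": 0.01, "Claude Sonnet": 0.003, "AitherOS": 0.0001}

def workflow_cost(rate_per_1k: float, calls: int = 100,
                  tokens_per_call: int = 10_000) -> float:
    """Total cost of an agentic workflow at a given per-1K-token rate."""
    return rate_per_1k * (calls * tokens_per_call) / 1000

for name, rate in RATES.items():
    print(f"{name}: ${workflow_cost(rate):.2f} per 100 calls")
# GPT-4o: $10.00 | Claude Sonnet: $3.00 | AitherOS: $0.10
```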
Cheap GPU Inference
10-100× cheaper than cloud APIs. Llama 3.1, Mistral, Qwen, DeepSeek — all on local GPU with fair scheduling.
Capability Tokens
Fine-grained security. Your agent only accesses what you authorize. Tokens rotate automatically.
Self-Healing + A2A
Pain signals, auto-restart, and access to 40+ agents your agent can call for research, coding, and more.
Auto-Onboard Protocol
Point your agent at the server. It reads AGENTS.md, builds its card, and self-registers. Zero manual setup.
Agent Spec — aither-agent.yaml
# aither-agent.yaml — Your agent definition
name: my-research-agent
version: 1.0.0
runtime: python3.12
# What your agent can do
capabilities:
- web_search
- document_analysis
- memory_read
- memory_write
# Resource requirements
resources:
gpu: true
vram_min: 4GB
model: llama3.1:8b # or bring your own
# Billing
token_cost_per_request: 5
max_concurrent: 3
# Endpoints your agent exposes
endpoints:
- path: /chat
method: POST
description: Chat with the agent
- path: /research
method: POST
description: Run a research task
Agents onboard themselves.
Point your agent at the server. It reads AGENTS.md and SKILLS.md, discovers available infrastructure, builds its agent card, and self-registers. No manual configuration needed.
Reads AGENTS.md + SKILLS.md
        │
        ▼
Builds Agent Card ─── POST ───▶ /v1/agents/auto-onboard
        │
        ▼
Receives: API key + inference endpoints + memory + A2A
        │
        ▼
Uses cheap GPU inference + persistent memory + 40+ agents via A2A
Minimal auto-onboard (3 lines)
# Your agent reads the spec and self-registers
curl https://api.aitheros.ai/AGENTS.md # Read the onboard spec
curl https://api.aitheros.ai/SKILLS.md # Read available infrastructure
# Auto-onboard with agent card
curl -X POST https://api.aitheros.ai/v1/agents/auto-onboard \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"agent_card": {
"name": "my-agent", "version": "1.0.0",
"description": "My awesome agent",
"skills": [{"id": "chat", "name": "Chat", "description": "General chat",
"tags": ["chat"], "examples": ["Talk to me"]}],
"endpoints": [{"path": "/chat", "method": "POST"}],
"health_endpoint": "/health",
"capabilities_requested": ["llm_inference", "memory_read"],
"billing": {"token_cost_per_request": 3, "category": "agent"}
},
"mode": "hybrid"
}'
# Response: API key, inference endpoints, memory endpoints, A2A gateway
# Your agent now has cheap GPU inference + the full agentic stack
Managed
Push code → we run it
Remote
Your server, our routing
Hybrid
Local agent + our inference
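In hybrid mode, the auto-onboard payload from the curl example above can be assembled programmatically. This sketch mirrors the JSON fields shown in that call; the field names come from the example, while the validation rule is an assumption, not a documented requirement:

```python
def build_agent_card(name: str, version: str, description: str,
                     skills: list[dict], endpoints: list[dict],
                     token_cost: int = 3) -> dict:
    """Assemble an auto-onboard payload matching the curl example's shape."""
    if not skills:
        raise ValueError("agent card needs at least one skill")
    return {
        "agent_card": {
            "name": name,
            "version": version,
            "description": description,
            "skills": skills,
            "endpoints": endpoints,
            "health_endpoint": "/health",
            "capabilities_requested": ["llm_inference", "memory_read"],
            "billing": {"token_cost_per_request": token_cost,
                        "category": "agent"},
        },
        "mode": "hybrid",
    }
```

The resulting dict can be serialized with `json.dumps` and POSTed to `/v1/agents/auto-onboard` exactly as the curl example does.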
Eventually you'll be able to run your own AitherOS instance entirely locally. Same spec, same agents, your own GPUs. Learn more →
Predictable. Transparent. Fair.
Every request costs a fixed number of tokens based on the agent tier. No hidden fees. No per-minute charges. Buy tokens, use them whenever.
| Agent Tier | Tokens / Request | GPU | Memory | Use Case |
|---|---|---|---|---|
| Reflex | 1–2 | Shared | Working | Quick lookups, status checks, simple Q&A |
| Agent | 3–5 | Scheduled | Active | Content creation, research, project management |
| Reasoning | 5–10 | Priority | Full 5-tier | Code generation, security analysis, multi-step tasks |
| Orchestrator | 8–15 | Dedicated | Full 5-tier | Multi-agent coordination, complex workflows |
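Given the per-request costs above, a plan's monthly request budget is easy to estimate. The tier costs are the table's ranges; taking the upper bound as a conservative estimate is a choice made here, not part of the pricing spec:

```python
# Upper-bound tokens per request, from the tier table above
TIER_COST = {"Reflex": 2, "Agent": 5, "Reasoning": 10, "Orchestrator": 15}

def requests_per_month(monthly_tokens: int, tier: str) -> int:
    """Conservative request budget: monthly tokens / worst-case tier cost."""
    return monthly_tokens // TIER_COST[tier]

# Explorer plan (1,000 tokens/month):
for tier in TIER_COST:
    print(f"{tier}: ~{requests_per_month(1_000, tier)} requests/month")
# Reflex: ~500 | Agent: ~200 | Reasoning: ~100 | Orchestrator: ~66
```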
Explorer
1,000 tokens
- 1,000 tokens / month
- Access to 3 base agents
- Community support
- Basic API access
- 1 custom agent slot
Builder
50,000 tokens
- 50,000 tokens / month
- Access to all 15 agents
- Priority GPU scheduling
- Full API + webhooks
- 5 custom agent slots
- Agent analytics dashboard
Enterprise
Unlimited tokens
- Unlimited tokens
- Dedicated GPU allocation
- On-prem deployment option
- Custom agent development
- Unlimited agent slots
- SLA + dedicated support
- White-label option
What you get that nobody else offers.
Cheap Inference for Agentic Workflows
Agentic workflows make 50-200+ LLM calls per task. Cloud costs explode ($10-20 per task). AitherOS runs local GPUs with Llama, Mistral, Qwen, and DeepSeek — same quality, 10-100× cheaper.
→ $0.10 per task instead of $10. That’s the difference.
Multi-Tenant Isolation
Your agents run in isolated containers with capability tokens. No data leakage between tenants. Full audit trail.
→ Enterprise-grade isolation without enterprise pricing.
Real-Time Analytics
Token usage, request latency, error rates, GPU utilization — all in your dashboard. Know exactly where your tokens go.
→ Full visibility. No black boxes.
10–100× Cheaper Than Cloud APIs
Our agents run on local GPUs with optimized models. A task that costs $0.50 on GPT-4 costs $0.01 on AitherOS.
→ Same quality. Fraction of the cost.
Start building. Right now.
1,000 free tokens. No credit card. Full API access. Your first agent call is 60 seconds away.