— Project 02
Agent infrastructure

AgentOS

Enterprise Multi-Agent Platform

AgentOS is a control plane for running AI agents inside a company: roles, teams, approvals, channels, memory, and runtime isolation managed from one dashboard. Operators can create and govern the fleet without touching the runtime underneath. I designed and built the product, backend, operator console, and AWS infrastructure solo.

Role
Solo Founder + Lead Engineer
Period
2025 to present
Status
Production
FastAPI · Next.js · AWS CDK · OpenClaw · Qdrant · Fargate · Teams Bot · Secrets Manager
— Chapter 01
System shape

How the system fits together.

Agents run under a control plane with roles, approvals, memory, and isolated runtimes.
Fig. 01 — AgentOS architecture
— Chapter 02
Decisions and outcomes

The calls that shaped it.

  1. 01

    The core decision was to build a control plane, not a wrapper. Operators create agents, hand them skills, watch what they do, and shape their environment from the dashboard — the runtime stays an implementation detail. That one boundary shaped everything else.

  2. 02

    Nothing risky happens unsupervised. Every action that reaches a real outside system passes a verification agent (GateKeeper) and, when it matters, a human. The gate fails safe: if it is down, the action is blocked, not waved through.

  3. 03

    You can describe a team in plain English and the platform designs it for you — agents, skills, schedules, channels — as a blueprint you review and approve before anything is created. Standing up a new set of agents feels like a conversation, not a config project.

  4. 04

    A marketplace of skills and plugins with trust built in: agents can pull the safe, everyday capabilities themselves, sensitive ones need an admin’s sign-off, and every item tracks who made it.

  5. 05

    It’s a real operations product, not a demo: live fleet monitoring, cost tracking that separates infrastructure from model spend, full audit history, reach through Microsoft Teams, Slack, and Telegram, and a monitoring agent that watches the other agents. All of it is deployed as infrastructure-as-code.
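
The fail-safe rule in decision 02 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the AgentOS code: `Action`, `Verdict`, `Decision`, `gate`, and the `verify` / `ask_human` callables are hypothetical names, and the real GateKeeper is an agent rather than a plain function. The point it shows is that only an explicit approval lets an action through; an error, a timeout, or an unrecognized answer all block it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    """Hypothetical answers a verification step could return."""
    ALLOW = "allow"
    NEEDS_HUMAN = "needs_human"
    DENY = "deny"


@dataclass(frozen=True)
class Action:
    agent_id: str
    target: str    # e.g. "crm.update_contact" (made-up target name)
    payload: dict


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def gate(
    action: Action,
    verify: Callable[[Action], Verdict],
    ask_human: Optional[Callable[[Action], bool]] = None,
) -> Decision:
    """Fail closed: anything other than an explicit approval blocks the action."""
    try:
        verdict = verify(action)
    except Exception as exc:
        # Gate unreachable, timed out, or crashed: block, never wave through.
        return Decision(False, f"gate unavailable: {type(exc).__name__}")

    if verdict is Verdict.ALLOW:
        return Decision(True, "verified")

    if verdict is Verdict.NEEDS_HUMAN:
        if ask_human is None:
            return Decision(False, "human approval required, no approver configured")
        try:
            approved = ask_human(action) is True
        except Exception as exc:
            return Decision(False, f"approval unavailable: {type(exc).__name__}")
        return Decision(approved, "approved by human" if approved else "rejected by human")

    # DENY, or any value the gate does not recognize, is treated as a block.
    return Decision(False, "denied by verifier")
```

The default branch is the design choice worth noticing: the function lists the ways an action can be allowed and blocks everything else, so a new failure mode ends up blocked without anyone having to anticipate it.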

— Aside
The interesting work isn't the stack. It's the boundaries.
— Chapter 03
How it runs

What it runs on.

  • 01
    Python / FastAPI control plane — one service, separate doors for the dashboard and the agents
  • 02
    Next.js operator console for creating, watching, and governing the fleet
  • 03
    Agents run in isolated AWS Fargate containers on the OpenClaw runtime, each with a gateway that keeps it in sync
  • 04
    A verification agent (GateKeeper) in front of every external write
  • 05
    AI blueprints that design whole fleets or single agents, with human approval before anything is built
  • 06
    A marketplace for skills and plugins with per-item trust tiers
  • 07
    Microsoft Teams bots provisioned automatically through Azure, plus Slack and Telegram
  • 08
    Qdrant vector memory scoped by agent, team, and org
  • 09
    AWS infrastructure-as-code across four CDK stacks: network, data, platform, agents
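
The per-item trust tiers on the marketplace can be pictured as a small install policy. The sketch below is a guess at the shape, not the production logic: the two tier names, `MarketplaceItem`, `InstallResult`, and `request_install` are all hypothetical, and refusing items that have no recorded author is an assumption layered on top of "every item tracks who made it".

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TrustTier(Enum):
    """Assumed tiers; the real platform may define more or name them differently."""
    SAFE = "safe"            # everyday capability, agents may pull it themselves
    SENSITIVE = "sensitive"  # needs an admin's sign-off first


@dataclass(frozen=True)
class MarketplaceItem:
    name: str
    tier: TrustTier
    author: str  # provenance: who made the skill or plugin


@dataclass(frozen=True)
class InstallResult:
    installed: bool
    reason: str


def request_install(
    item: MarketplaceItem,
    requested_by: str,
    approved_by: Optional[str] = None,
) -> InstallResult:
    """Decide whether an agent's install request goes through."""
    if not item.author:
        # Assumption: an item without provenance is never installable.
        return InstallResult(False, "missing provenance")
    if item.tier is TrustTier.SAFE:
        return InstallResult(True, f"self-service install by {requested_by}")
    if approved_by is None:
        return InstallResult(False, "admin sign-off required")
    return InstallResult(True, f"approved by {approved_by}")
```

With this shape, `request_install(MarketplaceItem("summarize", TrustTier.SAFE, "platform-team"), "agent-7")` installs immediately, while the same call for a `SENSITIVE` item stays pending until an `approved_by` value is supplied.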