
AI agents write code, run commands, and delete files - autonomously. Polos gives them isolated sandboxes with built-in tools (shell, file system, web search), approval flows that reach you on various channels, and durable execution with automatic retries, prompt caching, and concurrency control. Agents get full power. Your systems stay safe.

polos-dev/polos


The runtime for agents that do real work

Sandboxed execution. Agents that reach you. Durable workflows.

⭐ Star us to support the project!


AI agents break the rules of traditional software. They're async by nature - running while you sleep, but still needing you to approve, confirm, or provide a credential. They're autonomous by design - you say "fix the bug" and they write code, run commands, delete files. That power is the point. And the risk.

Most frameworks ignore this. Polos is built for it.

100% open source. Write it all in plain Python or TypeScript. No DAGs to define, no graph syntax to learn.

import { defineAgent, sandboxTools } from "@polos/sdk";
import { anthropic } from "@ai-sdk/anthropic";

// Create a sandboxed environment: agents get exec, read, write,
// edit, glob, and grep tools automatically.
const sandbox = sandboxTools({
  env: 'docker',
  docker: {
    image: 'node:20-slim',
    workspaceDir: './workspace',
    memory: '2g',
  },
});

// Give the agent sandbox tools: it can now run commands,
// read/write files, and explore the codebase autonomously.
const codingAgent = defineAgent({
  id: 'coding_agent',
  model: anthropic('claude-opus-4-5'),
  systemPrompt: 'You are a coding assistant. The repo is at /workspace.',
  tools: [...sandbox], // exec, read, write, edit, glob, grep
});

What You Get With Polos

Secure Sandbox

Agents run in isolated environments - Docker, E2B, or cloud VMs. Built-in tools for shell, file system, and web search. Full power. Zero risk to your systems.

Agents That Reach You

Agents reach you - not the other way around. Stripe-like approval pages that collect input, not just yes/no. Slack, SMS, email - you're at dinner, phone buzzes, one tap, done.

Durable Execution

State persists - agents resume exactly where they left off. Automatic retries on failure. 60-80% cost savings via prompt caching. Built-in observability for every step, every approval, every tool call. Concurrency control across multiple agents - no API rate limit chaos.


See It In Action

Watch a coding agent built with Polos - sandboxed execution, tool calls, and real-time observability.

Watch the demo video


Quick Start

1. Install Polos Server

curl -fsSL https://install.polos.dev/install.sh | bash
polos-server start

Copy the project ID displayed when you start the server. You'll need it in the next steps.

2. Install the SDK

Python

pip install polos-sdk

TypeScript

npm install @polos/sdk

3. Create a coding agent

Python

# agents.py
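from polos import Agent, SandboxToolsConfig, LocalEnvironmentConfig
from polos import sandbox_tools as create_sandbox_tools

# Alias the factory on import so the module-level sandbox_tools below
# does not shadow it.
sandbox_tools = create_sandbox_tools(SandboxToolsConfig(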
    env="local",
    local=LocalEnvironmentConfig(cwd="./workspace", path_restriction="./workspace"),
))

coding_agent = Agent(
    id="coding_agent",
    provider="anthropic",
    model="claude-sonnet-4-5",
    system_prompt="You are a coding agent. Your workspace is at ./workspace.",
    tools=sandbox_tools,
)

# worker.py
from polos import PolosClient, Worker
from agents import coding_agent, sandbox_tools

client = PolosClient(project_id="your-project-id")
worker = Worker(client=client, agents=[coding_agent], tools=list(sandbox_tools))

if __name__ == "__main__":
    import asyncio
    asyncio.run(worker.run())

TypeScript

// agents.ts
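import { defineAgent, sandboxTools as createSandboxTools } from "@polos/sdk";
import { anthropic } from "@ai-sdk/anthropic";

// The factory is imported under an alias so the exported constant can keep
// the name sandboxTools without redeclaring the imported binding.
export const sandboxTools = createSandboxTools({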
  env: "local",
  local: { cwd: "./workspace", pathRestriction: "./workspace" },
});

export const codingAgent = defineAgent({
  id: "coding_agent",
  model: anthropic("claude-sonnet-4-5"),
  systemPrompt: "You are a coding agent. Your workspace is at ./workspace.",
  tools: [...sandboxTools],
});

// worker.ts
import { PolosClient, Worker } from "@polos/sdk";
import { codingAgent, sandboxTools } from "./agents.js";

const client = new PolosClient({ projectId: "your-project-id" });
const worker = new Worker({ client, agents: [codingAgent], tools: [...sandboxTools] });

await worker.run();
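
The `path_restriction` / `pathRestriction` option above confines the local file tools to one directory. How Polos enforces it is not shown in this README; the following is a minimal Python sketch of one common approach, assuming a hypothetical helper `is_within` that is not part of the Polos SDK. It resolves the requested path and checks that the result still sits under the allowed root.

```python
from pathlib import Path


def is_within(root: str, candidate: str) -> bool:
    """Return True if `candidate`, resolved relative to `root`, stays inside `root`.

    Hypothetical helper for illustration; not part of the Polos SDK.
    """
    root_path = Path(root).resolve()
    # Joining first means relative paths are interpreted inside the root.
    # An absolute `candidate` replaces the root in the join, so it is rejected
    # unless it genuinely points inside the root.
    target = (root_path / candidate).resolve()
    return target == root_path or root_path in target.parents


print(is_within("./workspace", "src/index.js"))    # True
print(is_within("./workspace", "../secrets.env"))  # False
```

Resolving before comparing matters: a plain string prefix check would accept `./workspace/../secrets.env`.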

4. Invoke the agent

Local sandbox tools suspend for approval before running commands or writing files. The client streams workflow events, prompts you in the terminal, and resumes the agent.

Python

# main.py
import asyncio
from polos import PolosClient
from polos.features import events
from agents import coding_agent

async def main():
    client = PolosClient(project_id="your-project-id")
    handle = await client.invoke(coding_agent.id, {
        "input": "Create hello.js that prints 'Hello, world!' and run it.",
        "streaming": True,
    })

    # Stream events: approve each exec/write/edit when the agent suspends
    async for event in events.stream_workflow(client, handle.root_workflow_id, handle.id):
        if event.event_type and event.event_type.startswith("suspend_"):
            step_key = event.event_type[len("suspend_"):]
            form = event.data.get("_form", {})
            context = form.get("context", {})
            print(f"\n  Agent wants to: {context.get('command') or context.get('tool', step_key)}")
            approved = input("  Approve? (y/n): ").strip().lower() == "y"
            await client.resume(handle.root_workflow_id, handle.id, step_key, {"approved": approved})

    execution = await client.get_execution(handle.id)
    print(f"\nResult: {execution.get('result')}")

asyncio.run(main())

TypeScript

// main.ts
import { PolosClient } from "@polos/sdk";
import { codingAgent } from "./agents.js";
import * as readline from "node:readline/promises";

const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
const client = new PolosClient({ projectId: "your-project-id" });
const handle = await client.invoke(codingAgent.id, {
  input: "Create hello.js that prints 'Hello, world!' and run it.",
  streaming: true,
});

// Stream events: approve each exec/write/edit when the agent suspends
for await (const event of client.events.streamWorkflow(handle.rootWorkflowId, handle.id)) {
  if (event.eventType?.startsWith("suspend_")) {
    const stepKey = event.eventType.slice("suspend_".length);
    const context = (event.data as any)?._form?.context ?? {};
    console.log(`\n  Agent wants to: ${context.command ?? context.tool ?? stepKey}`);
    const answer = await rl.question("  Approve? (y/n): ");
    const approved = answer.trim().toLowerCase() === "y";
    await client.resume(handle.rootWorkflowId, handle.id, stepKey, { approved });
  }
}

const execution = await client.getExecution(handle.id);
console.log(`\nResult: ${typeof execution.result === "string" ? execution.result : JSON.stringify(execution.result)}`);
rl.close();
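
Approving every command by hand can get tedious. The examples list later in this README mentions allowlist-based command approval, but its API is not shown here, so the following is a hypothetical, SDK-independent sketch of the kind of decision such a policy might make before falling back to a human prompt. `ALLOWED_PROGRAMS` and `auto_approve` are illustrative names, not Polos identifiers.

```python
import shlex

# Hypothetical allowlist: programs considered safe to run without asking.
ALLOWED_PROGRAMS = {"ls", "cat", "node"}


def auto_approve(command: str) -> bool:
    """Return True only for a single, simple command whose program is allowlisted."""
    # Reject shell control operators outright so "ls; rm -rf /" cannot slip through.
    if any(token in command for token in (";", "&", "|", ">", "<", "`", "$(")):
        return False
    try:
        parts = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar parse errors
        return False
    return bool(parts) and parts[0] in ALLOWED_PROGRAMS


print(auto_approve("node hello.js"))  # True
print(auto_approve("rm -rf /"))       # False
print(auto_approve("ls; rm -rf /"))   # False
```

In the Python loop above, one possible use would be to call `auto_approve(context.get("command", ""))` first and only show the `input(...)` prompt when it returns `False`.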

5. Run it

# Terminal 1: Start the worker
python worker.py    # or: npx tsx worker.ts

# Terminal 2: Invoke the agent
python main.py      # or: npx tsx main.ts

See the full example for Python or TypeScript with richer approval UIs and more.

6. See it in action

Open the Polos UI to see your agent's execution trace, tool calls, and reasoning:

Polos Observability UI

📖 Full Quick Start Guide →


Architecture

Polos Architecture

Polos consists of three components:

  • Orchestrator: Written in Rust. Manages execution state, handles retries, and coordinates workers
  • Worker: Runs your agents and workflows, connects to the orchestrator
  • SDK: Python and TypeScript libraries for defining agents, workflows, and tools
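
The orchestrator is described above as handling retries. The retry policy itself is not documented in this README, so the following is only a sketch of the general idea (exponential backoff around a failing step). The names `run_with_retries`, `max_attempts`, and `base_delay` are hypothetical, and this is not the Rust orchestrator's implementation.

```python
import asyncio


async def run_with_retries(step, max_attempts: int = 3, base_delay: float = 0.01):
    """Call the async function `step` until it succeeds or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await step()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


calls = {"count": 0}


async def flaky_step():
    """Fails twice, then succeeds, to exercise the retry loop."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = asyncio.run(run_with_retries(flaky_step))
print(result, "after", calls["count"], "attempts")  # ok after 3 attempts
```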

Why Polos?

  • 🔒 Sandboxed Execution: Agents run in isolated Docker containers, E2B, or cloud VMs. Built-in tools for shell, files, and web search - full autonomy with zero risk.
  • 📲 Agents That Reach You: Approval pages, Slack, SMS, email. Agents notify you when they need input. One tap from your phone. Done.
  • 🧠 Durable State: Your agent survives crashes with call stack and local variables intact. Step 18 of 20 fails? Resume from step 18. No wasted LLM calls.
  • 🚦 Global Concurrency: System-wide rate limiting with queues and concurrency keys. Prevent one rogue agent from exhausting your entire OpenAI quota.
  • 🤝 Human-in-the-Loop: Native support for pausing execution. Wait hours or days for user approval and resume with full context. Paused agents consume zero compute.
  • 📡 Agent Handoffs: Transactional memory for multi-agent systems. Pass reasoning history between specialized agents without context drift.
  • 🔍 Decision-Level Observability: Trace the reasoning behind every tool call, not just raw logs. See why your agent chose Tool B over Tool A.
  • ⚡ Production Ready: Automatic retries, exactly-once execution guarantees, OpenTelemetry tracing built-in.
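
The Global Concurrency entry above mentions queues and concurrency keys. As a rough single-process illustration, one semaphore per key caps how many calls run at once. Polos applies its limits system-wide through the orchestrator, which this sketch does not model; `KeyedLimiter` and the key `"openai"` are hypothetical names.

```python
import asyncio


class KeyedLimiter:
    """Caps concurrent work per key, e.g. one key per LLM provider."""

    def __init__(self, limit: int):
        self._limit = limit
        self._semaphores = {}

    def _semaphore(self, key: str) -> asyncio.Semaphore:
        # Create the semaphore lazily the first time a key is seen.
        if key not in self._semaphores:
            self._semaphores[key] = asyncio.Semaphore(self._limit)
        return self._semaphores[key]

    async def run(self, key, fn):
        async with self._semaphore(key):
            return await fn()


async def main():
    limiter = KeyedLimiter(limit=2)
    running = 0
    peak = 0

    async def fake_llm_call():
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)  # stand-in for network latency
        running -= 1

    # Six calls share one key, so at most two run at the same time.
    await asyncio.gather(*(limiter.run("openai", fake_llm_call) for _ in range(6)))
    return peak


peak = asyncio.run(main())
print("peak concurrency:", peak)  # peak concurrency: 2
```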

Logic Belongs in Code, Not Configs

With Polos:

Python

@workflow
async def process_order(ctx: WorkflowContext, order: ProcessOrderInput):
    if order.amount > 1000:
        approved = await ctx.step.suspend("approval", data=order.model_dump())
        if not approved.data["ok"]:
            return {"status": "rejected"}

    await ctx.step.run("charge", charge_stripe, order)
    await ctx.step.run("notify", send_email, order)

TypeScript

const processOrder = defineWorkflow({ id: "process-order" }, async (ctx, order) => {
  if (order.amount > 1000) {
    const approved = await ctx.step.suspend("approval", { data: order });
    if (!approved.data.ok) {
      return { status: "rejected" };
    }
  }

  await ctx.step.run("charge", () => chargeStripe(order));
  await ctx.step.run("notify", () => sendEmail(order));
});

Other platforms:

dag = DAG(
    nodes=[
        Node("check_amount", CheckAmount),
        Node("approval", HumanApproval),
        Node("charge", ChargeStripe),
        Node("notify", SendEmail),
    ],
    edges=[
        ("check_amount", "approval", condition="amount > 1000"),
        ("check_amount", "charge", condition="amount <= 1000"),
        ("approval", "charge", condition="approved"),
        ("charge", "notify"),
    ]
)

No DAGs. No graph syntax. Just Python or TypeScript.


Examples

Agents

  • Agent with tools (Python, TypeScript): Simple agent with tool calling
  • Structured Output (Python, TypeScript): Agent with structured model responses
  • Streaming (Python, TypeScript): Real-time streaming responses
  • Conversational Chat (Python, TypeScript): Multi-turn conversations with memory
  • Thinking Agent (Python, TypeScript): Chain-of-thought reasoning
  • Guardrails (Python, TypeScript): Input/output validation
  • Multi-Agent Coordination (Python, TypeScript): Workflow orchestrating multiple agents
  • Order Processing (Python, TypeScript): Human-in-the-loop fraud review
  • Sandbox Tools (Python, TypeScript): Code execution in an isolated Docker container
  • Exec Security (Python, TypeScript): Allowlist-based command approval
  • Web Search Agent (Python, TypeScript): Research agent with Tavily web search
  • Local Sandbox (Python, TypeScript): Sandbox tools running on the host machine

Workflows

  • Workflow Basics (Python, TypeScript): Core workflow patterns
  • Suspend/Resume (Python, TypeScript): Human-in-the-loop approvals
  • State Persistence (Python, TypeScript): Durable state across executions
  • Error Handling (Python, TypeScript): Retry, fallback, compensation patterns
  • Queues & Concurrency (Python, TypeScript): Rate limiting and concurrency control
  • Parallel Execution (Python, TypeScript): Fan-out/fan-in patterns

Events & Scheduling

  • Event-Triggered (Python, TypeScript): Pub/sub event-driven workflows
  • Scheduled Workflows (Python, TypeScript): Cron-based scheduling

Human-in-the-Loop

  • Approval Page (Python, TypeScript): Web UI for workflow approval with suspend/resume

Under the Hood

Polos captures the result of every side effect - tool calls, API responses, time delays - in a durable log. If your process dies, Polos replays the workflow from the log, returning previously recorded results instead of re-executing the steps that produced them. Your agent's exact local variables and call stack are restored in milliseconds.

Completed steps are never re-executed - so you never pay for an LLM call twice.
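
To make the replay idea concrete, here is a toy in-memory sketch. It is not Polos's implementation: each step's result is recorded under its key, and a re-run returns the recorded result instead of calling the function again. `StepLog` and its `run` method are hypothetical names, and a real system would persist the log durably rather than keep it in a dictionary.

```python
class StepLog:
    """Toy step log: maps step keys to recorded results (in memory here)."""

    def __init__(self):
        self.results = {}
        self.executions = 0  # counts real executions, not replays

    def run(self, key, fn, *args):
        if key in self.results:
            return self.results[key]  # replay: reuse the recorded result
        self.executions += 1
        result = fn(*args)
        self.results[key] = result  # record only after the step succeeds
        return result


def workflow(log, crash_after_charge=False):
    order_id = log.run("create_order", lambda: "order-1")
    receipt = log.run("charge", lambda oid: f"charged:{oid}", order_id)
    if crash_after_charge:
        raise RuntimeError("process died")
    return log.run("notify", lambda r: f"emailed:{r}", receipt)


log = StepLog()
try:
    workflow(log, crash_after_charge=True)  # first run dies after two steps
except RuntimeError:
    pass

print(workflow(log))   # emailed:charged:order-1
print(log.executions)  # 3: "charge" was not executed a second time
```

Because the second run reads `create_order` and `charge` from the log, only `notify` executes, which is the property that keeps a completed LLM call from being paid for twice.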


Documentation

For detailed documentation, visit polos.dev/docs


Community

Join our community to get help, share ideas, and stay updated.


Contributing

We welcome contributions of all kinds: bug reports, feature requests, documentation improvements, and code.


License

Polos is Apache 2.0 licensed.
