Build your agent team in pi with one command.

You don't need to hire a dev team. You need to define one. Tamandua gives you a team of specialized AI agents — planner, developer, verifier, tester, reviewer — that work together in reliable, repeatable workflows. One install. Zero infrastructure.

curl -fsSL https://raw.githubusercontent.com/igorhvr/tamandua/main/scripts/install.sh | bash

Or just tell your pi agent: “Clone github.com/igorhvr/tamandua to my home dir, install it and learn the skill included inside it.”


Features

Deterministic Workflows

Same workflow, same steps, same order. Not "hopefully the agent remembers to test."

Agents Verify Each Other

The developer doesn't mark their own homework. A separate verifier checks every story against acceptance criteria.

Fresh Context, Every Step

Each agent gets a clean session. No context window bloat. No hallucinated state from 50 messages ago.

Retry and Escalate

Failed steps retry automatically. If retries are exhausted, the run escalates to you. Nothing fails silently.
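
The retry-then-escalate behaviour can be pictured with a small sketch. It is not Tamandua's implementation: `runWithRetries`, `maxAttempts`, and the `escalate` callback are hypothetical names chosen for illustration.

```typescript
// Hypothetical sketch of "retry, then escalate". Names are illustrative only.
type StepResult = { ok: true; output: string } | { ok: false; error: string };

interface RetryOutcome {
  status: "done" | "escalated";
  attempts: number;
  errors: string[];
}

function runWithRetries(
  step: (attempt: number) => StepResult,
  maxAttempts: number,
  escalate: (errors: string[]) => void,
): RetryOutcome {
  const errors: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = step(attempt);
    if (result.ok) {
      return { status: "done", attempts: attempt, errors };
    }
    errors.push(result.error); // keep every failure so nothing is lost
  }
  escalate(errors); // retries exhausted: hand the failures to a human
  return { status: "escalated", attempts: maxAttempts, errors };
}
```

The point of the shape is that every path ends in either `done` or an explicit escalation, so a failure can never disappear quietly.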

Bundled Workflows

Tamandua ships with 23 bundled workflows organized into six families. Use tamandua workflow list to see available workflows, and tamandua workflow install <id> to install one.

Worktree variants (*-worktree, *-merge-worktree) run in a detached git worktree created from your origin repository. Your main working copy stays untouched until the workflow completes. This gives you full isolation — continue working while agents iterate — and a clean abort path: delete the worktree and nothing in your origin repo has changed.

Rugpull Handling

When a merge workflow (*-merge, *-merge-worktree) fails at the finalize_merge step and the base branch tip has moved since the run started, Tamandua automatically launches a fresh replacement run with the same parameters. This "rugpull" detection runs after the final merge failure. If the base branch stayed put, no replacement is triggered. Pass --no-relaunch-upon-rugpull to tamandua workflow run to suppress the automatic replacement.
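
The decision described above reduces to a few conditions. The sketch below is an illustration under assumptions: the `MergeFailureContext` shape and both function names are hypothetical, and comparing commit SHAs is just one plausible way to detect that the base tip moved.

```typescript
// Hypothetical sketch of the rugpull decision; not Tamandua's actual code.
interface MergeFailureContext {
  workflowId: string;             // e.g. "feature-dev-merge-worktree"
  failedStepId: string;           // step that produced the final failure
  baseTipAtStart: string;         // base branch commit when the run started
  baseTipNow: string;             // base branch commit after the failure
  noRelaunchUponRugpull: boolean; // mirrors --no-relaunch-upon-rugpull
}

function isMergeWorkflow(workflowId: string): boolean {
  return workflowId.endsWith("-merge") || workflowId.endsWith("-merge-worktree");
}

function shouldRelaunch(ctx: MergeFailureContext): boolean {
  if (!isMergeWorkflow(ctx.workflowId)) return false;       // only merge variants
  if (ctx.failedStepId !== "finalize_merge") return false;  // only the final merge step
  if (ctx.noRelaunchUponRugpull) return false;              // user opted out
  return ctx.baseTipAtStart !== ctx.baseTipNow;             // base moved: rugpull
}
```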

Feature Development

Story-based feature development. The planner decomposes your task into ordered user stories. Each story goes through implement → verify → test before the next one starts.

Feature Development variants

| Variant | Workflow ID | Agents | Pipeline |
| --- | --- | --- | --- |
| Local-only | feature-dev | 5 | plan → setup → implement → verify → test |
| + Merge | feature-dev-merge | 6 | plan → setup → implement → verify → test → finalize_merge |
| Worktree | feature-dev-worktree | 5 | plan → setup → implement → verify → test |
| Worktree + Merge | feature-dev-merge-worktree | 6 | plan → setup → implement → verify → test → finalize_merge |
| GitHub PR | feature-dev-github-pr | 6 | plan → setup → implement → verify → test → pr → review |

Bug Fix

Bug triage and fix. The triager reproduces the bug, the investigator finds the root cause, the fixer patches it, and the verifier confirms the fix against acceptance criteria.

Bug Fix variants

| Variant | Workflow ID | Agents | Pipeline |
| --- | --- | --- | --- |
| Local-only | bug-fix | 5 | triage → investigate → setup → fix → verify |
| + Merge | bug-fix-merge | 6 | triage → investigate → setup → fix → verify → finalize_merge |
| Worktree | bug-fix-worktree | 5 | triage → investigate → setup → fix → verify |
| Worktree + Merge | bug-fix-merge-worktree | 6 | triage → investigate → setup → fix → verify → finalize_merge |
| GitHub PR | bug-fix-github-pr | 6 | triage → investigate → setup → fix → verify → pr |

Security Audit

Vulnerability scanning and patching. Scans for vulnerabilities, ranks by severity, patches each one, re-audits after all fixes are applied, and runs regression tests.

Security Audit variants

| Variant | Workflow ID | Agents | Pipeline |
| --- | --- | --- | --- |
| Local-only | security-audit | 6 | scan → prioritize → setup → fix → verify → test |
| + Merge | security-audit-merge | 7 | scan → prioritize → setup → fix → verify → test → finalize_merge |
| Worktree | security-audit-worktree | 6 | scan → prioritize → setup → fix → verify → test |
| Worktree + Merge | security-audit-merge-worktree | 7 | scan → prioritize → setup → fix → verify → test → finalize_merge |
| GitHub PR | security-audit-github-pr | 7 | scan → prioritize → setup → fix → verify → test → pr |

Quarantine Broken Tests

Detect failing tests, disable them minimally, and iterate until the full test suite passes. Useful for establishing a clean baseline on a branch with known test failures.

Quarantine Broken Tests variants

| Variant | Workflow ID | Agents | Pipeline |
| --- | --- | --- | --- |
| Local-only | quarantine-broken-tests | 3 | setup → quarantine → verify |
| + Merge | quarantine-broken-tests-merge | 4 | setup → quarantine → verify → finalize_merge |
| Worktree + Merge | quarantine-broken-tests-merge-worktree | 4 | setup → quarantine → verify → finalize_merge |

Quick Tasks

Lightweight workflows for quick one-off tasks and workflow auto-selection.

Quick Tasks variants

| Workflow ID | Agents | Pipeline | Description |
| --- | --- | --- | --- |
| do-now | 1 | execute | Submit any task. Get back a success/failure report. No planning, no stories. |
| just-do-it | 1 | dispatch | Describe what you want. Dispatches to the most appropriate workflow automatically. For coding tasks (feature-dev*, bug-fix*, security-audit*) it defaults to merge-worktree variants unless the prompt gives a specific reason otherwise. |
| do-review-do-verify | 3 | do → review → do-again → verify | Two-pass execution: do the work, review it, revise, then verify the result. |

Maintenance & Audits

Workflows for auditing and validating the project itself.

Maintenance & Audits variants

| Workflow ID | Agents | Pipeline | Description |
| --- | --- | --- | --- |
| frontend-test | 1 | test | Builds the project and validates the dashboard frontend: HTML structure, route definitions, and test coverage. Does not start a second dashboard daemon. |
| skills-normalize-audit | 3 | scan → audit → report | Scans a skills directory, analyzes the skills for overlaps and redundancies, and produces consolidation recommendations in a structured report. |

Install all bundled workflows at once with:

$ tamandua workflow install --all

Installation

Install from GitHub

curl -fsSL https://raw.githubusercontent.com/igorhvr/tamandua/main/scripts/install.sh | bash

Or just tell your agent: "Clone github.com/igorhvr/tamandua to my home dir, install it and learn the skill included inside it."

Install from Local Checkout

git clone https://github.com/igorhvr/tamandua.git
cd tamandua
./build-and-install

Or step by step:

./build        # npm install + tsc
./install      # symlink into ~/.local/bin

The build script handles everything: checks Node.js >= 22, runs npm install, compiles TypeScript. The install script creates a symlink at ~/.local/bin/tamandua pointed at your checkout — so you can keep the source wherever you like and tamandua stays in sync.

Prerequisites

  • Node.js >= 22
  • pi installed on the host — Tamandua uses pi for AI agent execution
  • gh CLI for PR creation steps

Not on npm. Tamandua is installed from source (or GitHub), not the npm registry.

How It Works

1 Define

Agents and steps in YAML. Each agent gets a persona, workspace, and strict acceptance criteria. No ambiguity about who does what.

2 Install

One command provisions everything: agent workspaces, polling, subagent permissions. No Docker, no queues, no external services.

3 Run

Agents poll for work independently. Claim a step, do the work, pass context to the next agent. SQLite tracks state. The scheduler keeps it moving.
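
A rough picture of the poll-and-claim cycle, using an in-memory array as a stand-in for the SQLite state. Everything here is an assumption made for illustration: the `StepRow` shape, the function names, and the rule that only the earliest unfinished step can be claimed.

```typescript
// In-memory stand-in for the SQLite state; all names here are hypothetical.
type StepState = "pending" | "running" | "done";

interface StepRow {
  stepId: string;
  agentId: string;
  state: StepState;
  context: string; // output handed to the next step
}

// One poll: claim the earliest unfinished step, but only if it is pending
// and assigned to this agent. Returns the claimed row, or undefined.
function claimNextStep(rows: StepRow[], agentId: string): StepRow | undefined {
  const next = rows.find((row) => row.state !== "done");
  if (next === undefined || next.state !== "pending" || next.agentId !== agentId) {
    return undefined;
  }
  next.state = "running";
  return next;
}

// Finish a claimed step and record the context for the next agent.
function completeStep(row: StepRow, context: string): void {
  row.state = "done";
  row.context = context;
}
```

Because each agent only ever looks at shared state, no agent needs to know about any other agent, which is what lets them poll independently.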

Minimal by Design

YAML + SQLite + polling. That's it. No Redis, no Kafka, no container orchestrator. Tamandua is a TypeScript CLI that needs no external services. It runs wherever pi runs.

Quick Example

$ tamandua workflow install feature-dev
✓ Installed workflow: feature-dev

# Or install all bundled workflows at once
$ tamandua workflow install --all

$ tamandua workflow run feature-dev "Add user authentication with OAuth"
Run: a1fdf573
Workflow: feature-dev
Status: running

$ tamandua workflow status "OAuth"
Run: a1fdf573
Workflow: feature-dev
Steps:
  [done   ] plan (planner)
  [done   ] setup (setup)
  [running] implement (developer)  Stories: 3/7 done
  [pending] verify (verifier)
  [pending] test (tester)

Dashboard & Kanban

When you start the management dashboard (tamandua dashboard), Tamandua automatically starts the remote MCP server too.

Use tamandua dashboard status to verify both endpoints are up.

The Tamandua dashboard at http://localhost:3334 — real-time view of workflow runs, step status, and agent activity.

Each run also has a swim-lane Kanban view at http://localhost:3334/runs/<run-id>/kanban. Lanes are derived dynamically from the workflow's steps. Single steps render one card per lane; loop steps render one card per story. Cards are colour-coded by status (todo / running / done / failed).
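
The lane rule above (one card for a single step, one card per story for a loop step) can be sketched as a small mapping. The data shapes below are hypothetical; the real dashboard may model steps and stories differently.

```typescript
// Hypothetical data shapes; the real dashboard may model steps differently.
type CardStatus = "todo" | "running" | "done" | "failed";

interface KanbanStep {
  stepId: string;
  kind: "single" | "loop";
  status: CardStatus;
  stories: { title: string; status: CardStatus }[]; // used by loop steps only
}

interface KanbanCard { title: string; status: CardStatus }
interface KanbanLane { lane: string; cards: KanbanCard[] }

// One lane per workflow step, in workflow order.
function buildLanes(steps: KanbanStep[]): KanbanLane[] {
  return steps.map((step) => ({
    lane: step.stepId,
    cards:
      step.kind === "loop"
        ? step.stories.map((story) => ({ title: story.title, status: story.status }))
        : [{ title: step.stepId, status: step.status }],
  }));
}
```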

Kanban swim-lane view — one card per step or story, colour-coded by status: todo, running, done, failed.

Native AutoResearch

Tamandua includes native AutoResearch primitives for measurable optimization loops. Unlike a normal workflow, AutoResearch stores durable project-local state so an agent can resume after restarts, learn from each measured run, and choose the next experiment from evidence.

Use AutoResearch when the task has a reliable numeric metric and the agent should run a sequence of experiments instead of one batch of edits — raising test coverage, reducing validation loss, improving latency, or lowering cost while preserving correctness.

tamandua autoresearch init \
  --goal "reduce validation loss" \
  --metric val_bpb \
  --direction lower \
  --command "uv run train.py"

tamandua autoresearch run-experiment
tamandua autoresearch log-experiment --status auto \
  --description "try lower learning rate" \
  --hypothesis "smaller LR improves stability" \
  --learned "validation improved but training slowed" \
  --next-focus "test warmup schedule"
tamandua autoresearch next

The core loop is init → run-experiment → log-experiment → next. With --status auto, log-experiment classifies a run as baseline, keep, discard, crash, or checks_failed by comparing the latest metric with prior accepted results. The next prompt carries the ratchet: it restates the goal, best result, last learning, and next focus before the agent starts another experiment.
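
One plausible reading of the automatic classification, written as a pure function. The five status names come from the text above; the precedence of the checks, the `MeasuredRun` shape, and the "higher" direction value are assumptions, not documented behaviour.

```typescript
// Sketch only: the ordering of the checks below is an assumption.
type MetricDirection = "lower" | "higher"; // "higher" is assumed; the example above uses "lower"
type RunStatus = "baseline" | "keep" | "discard" | "crash" | "checks_failed";

interface MeasuredRun {
  metric: number | null; // null when the command crashed or no metric was parsed
  checksPassed: boolean; // result of the optional correctness checks
}

function classifyRun(
  run: MeasuredRun,
  bestAccepted: number | null, // best metric among prior accepted runs, if any
  direction: MetricDirection,
): RunStatus {
  if (run.metric === null) return "crash";
  if (!run.checksPassed) return "checks_failed";
  if (bestAccepted === null) return "baseline"; // first accepted measurement
  const improved =
    direction === "lower" ? run.metric < bestAccepted : run.metric > bestAccepted;
  return improved ? "keep" : "discard";
}
```

Only strict improvements are kept in this sketch, which is what makes the loop a ratchet: the accepted best can move in one direction only.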

Project Files

AutoResearch project files

| File | Purpose |
| --- | --- |
| autoresearch.config.json | Session config: goal, metric, direction, command, parser, checks. |
| autoresearch.md | Agent-facing objective and operating loop. |
| autoresearch.jsonl | Append-only run history: measured results, decisions, learning, next focus. |
| autoresearch.sh | Benchmark command. |
| autoresearch.checks.sh | Optional correctness checks run after successful measurements. |

The dashboard's AutoResearch panel reads the run's harness working directory, discovers the nearest autoresearch.config.json / autoresearch.jsonl, and renders the experiment trace — gray points are attempted experiments; green points and the green line are the kept best-so-far frontier. A SQLite session registry makes every AutoResearch project discoverable from the dashboard, and tamandua autoresearch prune cleans up stale registry entries without touching project files.
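
The gray-versus-green distinction in the trace amounts to a best-so-far scan over the measured metrics. A minimal sketch, assuming the metrics have already been read from the run history; `bestSoFarFrontier` and `TracePoint` are hypothetical names, and the real panel may apply extra filters (for example, skipping crashed runs).

```typescript
interface TracePoint {
  index: number;  // position of the experiment in the history
  metric: number; // measured value
  kept: boolean;  // true when the point extends the best-so-far frontier
}

// Walk the metrics in order and mark each point that strictly improves on
// the best value seen so far. Kept points correspond to the green frontier.
function bestSoFarFrontier(metrics: number[], direction: "lower" | "higher"): TracePoint[] {
  let best: number | null = null;
  return metrics.map((metric, index) => {
    const improves =
      best === null || (direction === "lower" ? metric < best : metric > best);
    if (improves) best = metric;
    return { index, metric, kept: improves };
  });
}
```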

Commands & Harness Selection

Everyday Commands

Everyday CLI commands

| Command | Description |
| --- | --- |
| tamandua get-ready | Install bundled workflows and start dashboard/control plane. |
| tamandua workflow run <id> <task> | Start a run (defaults harness CWD to your current directory). |
| tamandua workflow status <query> | Check run status by run id, prefix, or task substring. |
| tamandua workflow runs | List all runs. |
| tamandua workflow resume <run-id> | Resume a failed run. |
| tamandua dashboard | Start the web dashboard (also starts remote MCP on port 3338). |
| tamandua logs-tail | Follow recent activity as new events arrive. |
| tamandua nudge | Wake all scheduled agents for running runs to poll immediately. |
| tamandua update | Pull the source checkout, rebuild, reinstall workflows, restart services. |
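
The status command accepts a run id, an id prefix, or a task substring. One way such a lookup could work is sketched below. The precedence (exact id, then prefix, then substring) and the case-insensitive substring match are assumptions; `findRuns` and `RunSummary` are hypothetical names.

```typescript
interface RunSummary {
  runId: string;
  task: string;
}

// Resolve a query against known runs. The order of the three strategies
// and the case-insensitive task match are assumptions for this sketch.
function findRuns(runs: RunSummary[], query: string): RunSummary[] {
  const exact = runs.filter((run) => run.runId === query);
  if (exact.length > 0) return exact;

  const byPrefix = runs.filter((run) => run.runId.startsWith(query));
  if (byPrefix.length > 0) return byPrefix;

  const needle = query.toLowerCase();
  return runs.filter((run) => run.task.toLowerCase().includes(needle));
}
```

With the run from the Quick Example, a query of "OAuth" matches no run id or prefix, so it falls through to the task text and finds run a1fdf573.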

Harness Selection

By default, Tamandua uses pi (pi --print) as its agent harness. Override it per run with mutually exclusive flags on tamandua workflow run:

Harness selection flags

| Flag | Description |
| --- | --- |
| --pi-as-harness | Use pi as the agent harness. This is the default. |
| --hermes-as-harness | Use Hermes instead of pi. Alpha quality: very slow, and token accounting is broken. Use pi for production workflows. |

To use a custom Hermes binary, set TAMANDUA_HERMES_BINARY; otherwise Tamandua searches for hermes on your PATH. Harness validation runs at scheduling time — a missing or non-executable binary fails the run immediately with a clear error.
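
The selection rules above fit in a few lines. This is a sketch under assumptions, not the CLI's code: `resolveHarness`, `HarnessFlags`, and `HarnessChoice` are hypothetical, and the environment is passed in as a plain object so the function stays easy to test. Checking that the binary exists and is executable is left out.

```typescript
type Harness = "pi" | "hermes";

interface HarnessFlags {
  piAsHarness?: boolean;     // --pi-as-harness
  hermesAsHarness?: boolean; // --hermes-as-harness
}

interface HarnessChoice {
  harness: Harness;
  binary: string; // a path, or a bare name to be looked up on PATH
}

function resolveHarness(
  flags: HarnessFlags,
  env: Record<string, string | undefined>,
): HarnessChoice {
  if (flags.piAsHarness && flags.hermesAsHarness) {
    throw new Error("--pi-as-harness and --hermes-as-harness are mutually exclusive");
  }
  if (flags.hermesAsHarness) {
    // A custom binary wins; otherwise fall back to "hermes" on PATH.
    return { harness: "hermes", binary: env["TAMANDUA_HERMES_BINARY"] || "hermes" };
  }
  return { harness: "pi", binary: "pi" }; // pi is the default harness
}
```

In a Node.js program the second argument would typically be `process.env`.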

Build Your Own

The bundled workflows are starting points. Define your own agents, steps, retry logic, and verification gates in plain YAML and Markdown. If you can write a prompt, you can build a workflow.

id: my-workflow
name: My Custom Workflow
agents:
  - id: researcher
    name: Researcher
    workspace:
      files:
        AGENTS.md: agents/researcher/AGENTS.md

steps:
  - id: research
    agent: researcher
    input: |
      Research {{task}} and report findings.
      Reply with STATUS: done and FINDINGS: ...
    expects: "STATUS: done"
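
To make the `input` and `expects` fields concrete, here is a sketch of how a `{{task}}` placeholder might be filled in and how a reply might be checked. Both functions are hypothetical, and treating `expects` as a plain substring match is an assumption about the semantics.

```typescript
// Replace {{key}} placeholders; unknown keys are left untouched.
function renderInput(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (whole: string, key: string): string => {
    if (!Object.prototype.hasOwnProperty.call(vars, key)) return whole;
    const value: string | undefined = vars[key];
    return value === undefined ? whole : value;
  });
}

// Assumption: a reply satisfies `expects` when it contains that exact text.
function meetsExpectation(reply: string, expects: string): boolean {
  return reply.includes(expects);
}
```

Under this reading, an agent that replies with "STATUS: done" plus its findings passes the step, while a reply without that marker would count as a failed attempt.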

Full guide.

Skill included. The bundled tamandua-agents skill teaches your agents to hand work off to Tamandua and forget about it. The CLI itself is also designed to be easy for agents to understand. Remote MCP tools are exposed so agents can query runs, start workflows, and check status autonomously.

Remote MCP Tools

The remote MCP endpoint exposes 14 tools at http://localhost:3338/mcp:

Run Management

Run Management MCP tools

| Tool | Description |
| --- | --- |
| tamandua.runs.list | List recent Tamandua workflow runs. Accepts optional limit (integer, 1–200, default 50). |
| tamandua.run.status | Fetch detailed status for a run. Requires query (run id, prefix, or task substring). |
| tamandua.run.start | Start a workflow run. Requires workflowId and taskTitle. |
| tamandua.run.pause | Pause a running workflow run. Requires runId. Optional drain (boolean) to wait for in-flight work before pausing. |
| tamandua.run.resume | Resume a paused workflow run. Requires runId. |
| tamandua.run.delete | Permanently delete a workflow run and associated steps, stories, and worktree metadata. Requires runId. Optional force (boolean) cancels and deletes running or paused runs. |

tamandua.run.start parameters

| Parameter | Required | Description |
| --- | --- | --- |
| workflowId | Yes | Workflow id to run. |
| taskTitle | Yes | Task description for the workflow run. |
| workingDirectoryForHarness | For direct workflows | Harness working directory for remote MCP runs. Required for direct workflows, invalid for worktree workflows. |
| worktreeOriginRepository | For worktree workflows | Repository path to create the worktree from. Required for worktree workflows, invalid for direct workflows. |
| worktreeOriginRef | No | Git ref (branch, tag, SHA) for the worktree. Only valid for worktree workflows. |
| noHurrySaveTokensMode | No | When true, polls less often to save tokens: a 15-minute floor and default, instead of the usual 1-minute floor and 5-minute default. Defaults to false. |

workingDirectoryForHarness and worktreeOriginRepository are mutually exclusive: direct workflows require the former, worktree workflows require the latter.
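
A client can check these rules before sending anything. The sketch below validates the arguments and builds a request body. MCP tool calls are JSON-RPC 2.0 requests with the method `tools/call`; everything else is an assumption for illustration, including the helper names and the idea that a worktree workflow can be recognised by an id ending in "-worktree". Session setup and transport details are omitted.

```typescript
interface RunStartArgs {
  workflowId: string;
  taskTitle: string;
  workingDirectoryForHarness?: string;
  worktreeOriginRepository?: string;
  worktreeOriginRef?: string;
  noHurrySaveTokensMode?: boolean;
}

// Assumption: worktree workflows are the ones whose id ends in "-worktree".
function isWorktreeWorkflow(workflowId: string): boolean {
  return workflowId.endsWith("-worktree");
}

// Returns a list of problems; an empty list means the arguments are consistent.
function validateRunStart(args: RunStartArgs): string[] {
  const problems: string[] = [];
  if (isWorktreeWorkflow(args.workflowId)) {
    if (!args.worktreeOriginRepository) problems.push("worktreeOriginRepository is required");
    if (args.workingDirectoryForHarness) problems.push("workingDirectoryForHarness is invalid here");
  } else {
    if (!args.workingDirectoryForHarness) problems.push("workingDirectoryForHarness is required");
    if (args.worktreeOriginRepository) problems.push("worktreeOriginRepository is invalid here");
    if (args.worktreeOriginRef) problems.push("worktreeOriginRef is only valid for worktree workflows");
  }
  return problems;
}

// Build the JSON-RPC 2.0 body for a tools/call request to tamandua.run.start.
function buildRunStartCall(id: number, args: RunStartArgs): string {
  return JSON.stringify({
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "tamandua.run.start", arguments: args },
  });
}
```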

Events & Metadata

Events and Metadata MCP tools

| Tool | Description |
| --- | --- |
| tamandua.events.recent | List recent global Tamandua events. Accepts optional limit (integer, 1–500, default 50). |
| tamandua.source.path | Return the local Tamandua source checkout path. No parameters. |
| tamandua.skill.path | Return the path to the bundled tamandua-agents agent skill. No parameters. |
| tamandua.update.command | Return local CLI guidance for updating Tamandua safely. No parameters. |

AutoResearch

AutoResearch MCP tools

| Tool | Description |
| --- | --- |
| tamandua.autoresearch.init | Create project-local AutoResearch state. Requires cwd, goal, metricName, direction, and command. Optional metricUnit, metricRegex, checksCommand, and overwrite. |
| tamandua.autoresearch.run_experiment | Run the configured experiment command in cwd, parse the metric, run optional checks, and append a run_result. Optional command, metricRegex, checksCommand, and timeoutMs. |
| tamandua.autoresearch.log_experiment | Append the decision and learning for the latest run. Requires cwd and description; optional status, metric, hypothesis, learned, nextFocus, commit, and revertDiscard. |
| tamandua.autoresearch.status | Summarize baseline, best result, failures, and the next ratchet prompt for cwd. |

Security

You're installing agent teams that run code on your machine. We take that seriously.