Features

Deterministic Workflows

Same workflow, same steps, same order. Not "hopefully the agent remembers to test."

Agents Verify Each Other

The developer doesn't mark their own homework. A separate verifier checks every story against acceptance criteria.

Fresh Context, Every Step

Each agent gets a clean session. No context window bloat. No hallucinated state from 50 messages ago.

Retry and Escalate

Failed steps retry automatically. If retries are exhausted, the step escalates to you. Nothing fails silently.
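
The retry-then-escalate behaviour can be pictured with a small sketch. This is an illustration under assumptions, not Tamandua's actual implementation: `runWithRetry`, `StepOutcome`, and the `maxAttempts` parameter are hypothetical names, and the step is synchronous here only to keep the example short.

```typescript
// Hypothetical sketch of retry-then-escalate; names are illustrative only.
type StepOutcome<T> =
  | { status: "done"; value: T; attempts: number }
  | { status: "escalated"; errors: string[]; attempts: number };

function runWithRetry<T>(
  step: (attempt: number) => T,
  maxAttempts: number,
): StepOutcome<T> {
  const errors: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { status: "done", value: step(attempt), attempts: attempt };
    } catch (err) {
      // Keep every failure so an escalation carries the full history.
      errors.push(err instanceof Error ? err.message : String(err));
    }
  }
  // Retries exhausted: report to a human instead of failing silently.
  return { status: "escalated", errors, attempts: maxAttempts };
}
```

The point of returning an explicit `escalated` outcome, rather than throwing, is that the caller cannot ignore it by accident.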

Bundled Workflows

Tamandua ships with 23 bundled workflows organized into six families. Use tamandua workflow list to see available workflows, and tamandua workflow install <id> to install one.

Worktree variants (*-worktree, *-merge-worktree) run in a detached git worktree created from your origin repository. Your main working copy stays untouched until the workflow completes. This gives you full isolation — continue working while agents iterate — and a clean abort path: delete the worktree and nothing in your origin repo has changed.

Rugpull Handling

When a merge workflow (-merge, -merge-worktree) fails at the finalize_merge step and the base branch tip has moved since the run started, Tamandua automatically launches a fresh replacement run with the same parameters. This "rugpull" detection runs after the final merge failure — if the base branch stayed put, no replacement is triggered. Pass --no-relaunch-upon-rugpull to workflow run to suppress the automatic replacement.
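
As a rough sketch of the decision described above, assuming the run records the base branch tip when it starts: the `MergeFailure` shape and the `shouldRelaunch` function are hypothetical, and only the `finalize_merge` step id and the `--no-relaunch-upon-rugpull` flag come from this document.

```typescript
// Hypothetical shape; field names are illustrative, not Tamandua internals.
interface MergeFailure {
  failedStep: string;             // id of the step where the run failed
  baseTipAtStart: string;         // base branch commit when the run began
  baseTipNow: string;             // base branch commit at failure time
  noRelaunchUponRugpull: boolean; // mirrors --no-relaunch-upon-rugpull
}

function shouldRelaunch(failure: MergeFailure): boolean {
  // Only a failure at the final merge step is a rugpull candidate.
  if (failure.failedStep !== "finalize_merge") return false;
  // The user opted out of automatic replacement runs.
  if (failure.noRelaunchUponRugpull) return false;
  // A moved base branch tip is what makes the failure a rugpull.
  return failure.baseTipAtStart !== failure.baseTipNow;
}
```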

Feature Development

Story-based feature development. The planner decomposes your task into ordered user stories. Each story goes through implement → verify → test before the next one starts.

Feature Development variants
Variant	Workflow ID	Agents	Pipeline
Local-only	`feature-dev`	5	plan → setup → implement → verify → test
+ Merge	`feature-dev-merge`	6	plan → setup → implement → verify → test → finalize_merge
Worktree	`feature-dev-worktree`	5	plan → setup → implement → verify → test
Worktree + Merge	`feature-dev-merge-worktree`	6	plan → setup → implement → verify → test → finalize_merge
GitHub PR	`feature-dev-github-pr`	6	plan → setup → implement → verify → test → pr → review

Bug Fix

Bug triage and fix. The triager reproduces the bug, the investigator finds the root cause, the fixer patches it, and the verifier confirms the fix against acceptance criteria.

Bug Fix variants
Variant	Workflow ID	Agents	Pipeline
Local-only	`bug-fix`	5	triage → investigate → setup → fix → verify
+ Merge	`bug-fix-merge`	6	triage → investigate → setup → fix → verify → finalize_merge
Worktree	`bug-fix-worktree`	5	triage → investigate → setup → fix → verify
Worktree + Merge	`bug-fix-merge-worktree`	6	triage → investigate → setup → fix → verify → finalize_merge
GitHub PR	`bug-fix-github-pr`	6	triage → investigate → setup → fix → verify → pr

Security Audit

Vulnerability scanning and patching. Scans for vulnerabilities, ranks by severity, patches each one, re-audits after all fixes are applied, and runs regression tests.

Security Audit variants
Variant	Workflow ID	Agents	Pipeline
Local-only	`security-audit`	6	scan → prioritize → setup → fix → verify → test
+ Merge	`security-audit-merge`	7	scan → prioritize → setup → fix → verify → test → finalize_merge
Worktree	`security-audit-worktree`	6	scan → prioritize → setup → fix → verify → test
Worktree + Merge	`security-audit-merge-worktree`	7	scan → prioritize → setup → fix → verify → test → finalize_merge
GitHub PR	`security-audit-github-pr`	7	scan → prioritize → setup → fix → verify → test → pr

Quarantine Broken Tests

Detect failing tests, disable them minimally, and iterate until the full test suite passes. Useful for establishing a clean baseline on a branch with known test failures.

Quarantine Broken Tests variants
Variant	Workflow ID	Agents	Pipeline
Local-only	`quarantine-broken-tests`	3	setup → quarantine → verify
+ Merge	`quarantine-broken-tests-merge`	4	setup → quarantine → verify → finalize_merge
Worktree + Merge	`quarantine-broken-tests-merge-worktree`	4	setup → quarantine → verify → finalize_merge

Quick Tasks

Single-agent workflows for quick one-off tasks and workflow auto-selection.

Quick Tasks variants
Workflow ID	Agents	Pipeline	Description
`do-now`	1	execute	Submit any task. Get back a success/failure report. No planning, no stories.
`just-do-it`	1	dispatch	Describe what you want. Dispatches to the most appropriate workflow automatically. For coding tasks (feature-dev, bug-fix, security-audit*) it defaults to merge-worktree variants unless the prompt gives a specific reason otherwise.
`do-review-do-verify`	3	do → review → do-again → verify	Two-pass execution: do the work, review it, revise, then verify the result.

Maintenance & Audits

Workflows for auditing and validating the project itself.

Maintenance & Audits variants
Workflow ID	Agents	Pipeline	Description
`frontend-test`	1	test	Builds the project and validates the dashboard frontend: HTML structure, route definitions, and test coverage. Does not start a second dashboard daemon.
`skills-normalize-audit`	3	scan → audit → report	Scans a skills directory, analyzes the skills for overlaps and redundancies, and produces consolidation recommendations in a structured report.

Install all bundled workflows at once with:

$ tamandua workflow install --all

Installation

Install from GitHub

curl -fsSL https://raw.githubusercontent.com/igorhvr/tamandua/main/scripts/install.sh | bash

Or just tell your agent: "Clone github.com/igorhvr/tamandua to my home dir, install it and learn the skill included inside it."

Install from Local Checkout

git clone https://github.com/igorhvr/tamandua.git
cd tamandua
./build-and-install

Or step by step:

./build        # npm install + tsc
./install      # symlink into ~/.local/bin

The build script handles everything: checks Node.js >= 22, runs npm install, compiles TypeScript. The install script creates a symlink at ~/.local/bin/tamandua pointed at your checkout — so you can keep the source wherever you like and tamandua stays in sync.

Prerequisites

Node.js >= 22
pi installed on the host — Tamandua uses pi for AI agent execution
gh CLI for PR creation steps

Not on npm. Tamandua is installed from source (or GitHub), not the npm registry.

How It Works

1. Define

Agents and steps in YAML. Each agent gets a persona, workspace, and strict acceptance criteria. No ambiguity about who does what.

2. Install

One command provisions everything: agent workspaces, polling, subagent permissions. No Docker, no queues, no external services.

3. Run

Agents poll for work independently. Claim a step, do the work, pass context to the next agent. SQLite tracks state. The scheduler keeps it moving.
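
The claim, work, hand-off cycle can be sketched as follows. This is a simplified model under stated assumptions: an in-memory array stands in for the SQLite state, steps are assumed to run strictly in order, and `WorkflowStep`, `claimNext`, and `completeStep` are hypothetical names rather than Tamandua's API.

```typescript
// In-memory stand-in for the SQLite-backed step table (an assumption).
type StepStatus = "pending" | "running" | "done";

interface WorkflowStep {
  id: string;
  agent: string;
  status: StepStatus;
  context?: string; // output handed over by the previous step
}

// Called by an agent when it polls: claim the step if it is this agent's turn.
function claimNext(steps: WorkflowStep[], agent: string): WorkflowStep | undefined {
  // Steps run in order, so only the first unfinished step can be claimed.
  const head = steps.find((s) => s.status !== "done");
  if (!head || head.status !== "pending" || head.agent !== agent) return undefined;
  head.status = "running";
  return head;
}

// Mark a step done and pass its output to the next step as context.
function completeStep(steps: WorkflowStep[], id: string, output: string): void {
  const index = steps.findIndex((s) => s.id === id);
  const step = steps[index];
  if (!step) throw new Error(`unknown step: ${id}`);
  step.status = "done";
  const next = steps[index + 1];
  if (next) next.context = output;
}
```

An agent that polls before its turn simply gets `undefined` back and tries again on the next poll.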

Minimal by Design

YAML + SQLite + polling. That's it. No Redis, no Kafka, no container orchestrator. Tamandua is a TypeScript CLI that needs no external services. It runs wherever pi runs.

Quick Example

$ tamandua workflow install feature-dev
✓ Installed workflow: feature-dev

# Or install all bundled workflows at once
$ tamandua workflow install --all

$ tamandua workflow run feature-dev "Add user authentication with OAuth"
Run: a1fdf573
Workflow: feature-dev
Status: running

$ tamandua workflow status "OAuth"
Run: a1fdf573
Workflow: feature-dev
Steps:
  [done   ] plan (planner)
  [done   ] setup (setup)
  [running] implement (developer)  Stories: 3/7 done
  [pending] verify (verifier)
  [pending] test (tester)

Dashboard & Kanban

When you start the management dashboard (tamandua dashboard), Tamandua automatically starts the remote MCP server too.

Dashboard: http://localhost:3334
MCP endpoint: http://localhost:3338/mcp (fixed port)

Use tamandua dashboard status to verify both endpoints are up.

Figure: The Tamandua dashboard at `http://localhost:3334`, a real-time view of workflow runs, step progress, agent activity, and token usage statistics.

Each run also has a swim-lane Kanban view at http://localhost:3334/runs/<run-id>/kanban. Lanes are derived dynamically from the workflow's steps. Single steps render one card per lane; loop steps render one card per story. Cards are colour-coded by status (todo / running / done / failed).
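
The lane and card rules above can be sketched like this. The data shapes are assumptions made for illustration (`RunStep`, `Story`, `Lane`, and `buildLanes` are hypothetical names); only the four status values and the one-card-per-step versus one-card-per-story rule come from the description.

```typescript
type CardStatus = "todo" | "running" | "done" | "failed";

interface Story {
  id: string;
  status: CardStatus;
}

// Assumption: a loop step is one that carries a list of stories.
interface RunStep {
  id: string;
  status: CardStatus;
  stories?: Story[];
}

interface Lane {
  stepId: string;
  cards: { label: string; status: CardStatus }[];
}

// One lane per workflow step; loop steps expand into one card per story.
function buildLanes(steps: RunStep[]): Lane[] {
  return steps.map((step) => ({
    stepId: step.id,
    cards: step.stories
      ? step.stories.map((story) => ({ label: story.id, status: story.status }))
      : [{ label: step.id, status: step.status }],
  }));
}
```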

Figure: Kanban swim-lane view, with one card per step or story, colour-coded by status: todo, running, done, failed.

Native AutoResearch

Tamandua includes native AutoResearch primitives for measurable optimization loops. Unlike a normal workflow, AutoResearch stores durable project-local state so an agent can resume after restarts, learn from each measured run, and choose the next experiment from evidence.

Use AutoResearch when the task has a reliable numeric metric and the agent should run a sequence of experiments instead of one batch of edits — raising test coverage, reducing validation loss, improving latency, or lowering cost while preserving correctness.

tamandua autoresearch init \
  --goal "reduce validation loss" \
  --metric val_bpb \
  --direction lower \
  --command "uv run train.py"

tamandua autoresearch run-experiment
tamandua autoresearch log-experiment --status auto \
  --description "try lower learning rate" \
  --hypothesis "smaller LR improves stability" \
  --learned "validation improved but training slowed" \
  --next-focus "test warmup schedule"
tamandua autoresearch next

The core loop is init → run-experiment → log-experiment → next. log-experiment --status auto classifies a run as baseline, keep, discard, crash, or checks_failed by comparing the latest metric with prior accepted results. The next prompt carries the ratchet: it restates the goal, best result, last learning, and next focus before the agent starts another experiment.
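
One plausible reading of the `--status auto` classification is sketched below. The five status names come from the text above; everything else is an assumption for illustration, including the `classify` function, the `RunResult` shape, the order of the checks, and the existence of a `higher` direction alongside the documented `lower`.

```typescript
type MetricDirection = "lower" | "higher";
type RunStatus = "baseline" | "keep" | "discard" | "crash" | "checks_failed";

// Hypothetical result shape: metric is null when the command produced none.
interface RunResult {
  metric: number | null;
  checksPassed: boolean;
}

function classify(
  run: RunResult,
  acceptedMetrics: number[],
  direction: MetricDirection,
): RunStatus {
  // No usable measurement means the experiment itself failed.
  if (run.metric === null || Number.isNaN(run.metric)) return "crash";
  // A measurement that breaks correctness checks is never accepted.
  if (!run.checksPassed) return "checks_failed";
  // The first accepted measurement becomes the reference point.
  if (acceptedMetrics.length === 0) return "baseline";
  const best =
    direction === "lower" ? Math.min(...acceptedMetrics) : Math.max(...acceptedMetrics);
  const improved = direction === "lower" ? run.metric < best : run.metric > best;
  return improved ? "keep" : "discard";
}
```

Comparing only against accepted results is what gives the loop its ratchet: a discarded or crashed run never moves the bar.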

Project Files

AutoResearch project files
File	Purpose
`autoresearch.config.json`	Session config: goal, metric, direction, command, parser, checks.
`autoresearch.md`	Agent-facing objective and operating loop.
`autoresearch.jsonl`	Append-only run history: measured results, decisions, learning, next focus.
`autoresearch.sh`	Benchmark command.
`autoresearch.checks.sh`	Optional correctness checks run after successful measurements.

The dashboard's AutoResearch panel reads the run's harness working directory, discovers the nearest autoresearch.config.json / autoresearch.jsonl, and renders the experiment trace — gray points are attempted experiments; green points and the green line are the kept best-so-far frontier. A SQLite session registry makes every AutoResearch project discoverable from the dashboard, and tamandua autoresearch prune cleans up stale registry entries without touching project files.
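
To make the best-so-far frontier concrete, here is a minimal sketch that derives it from append-only JSONL text. The record fields (`metric`, `status`) are assumed for illustration and may not match the real `autoresearch.jsonl` schema; `bestSoFar` is a hypothetical helper, not part of Tamandua.

```typescript
// Assumed record shape; the real autoresearch.jsonl schema may differ.
interface LoggedRun {
  metric: number;
  status: string;
}

// Returns the best accepted metric after each logged run, in log order.
function bestSoFar(jsonl: string, direction: "lower" | "higher"): number[] {
  const frontier: number[] = [];
  let best: number | undefined;
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // tolerate a trailing newline
    const run = JSON.parse(line) as LoggedRun;
    const accepted = run.status === "baseline" || run.status === "keep";
    const better =
      best === undefined ||
      (direction === "lower" ? run.metric < best : run.metric > best);
    if (accepted && better) best = run.metric;
    if (best !== undefined) frontier.push(best);
  }
  return frontier;
}
```

In chart terms, every parsed line would be a gray point and the returned array would trace the green line.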

Commands & Harness Selection

Everyday Commands

Everyday CLI commands
Command	Description
`tamandua get-ready`	Install bundled workflows and start dashboard/control plane.
`tamandua workflow run <id> <task>`	Start a run (defaults harness CWD to your current directory).
`tamandua workflow status <query>`	Check run status by run id, prefix, or task substring.
`tamandua workflow runs`	List all runs.
`tamandua workflow resume <run-id>`	Resume a failed run.
`tamandua dashboard`	Start the web dashboard (also starts remote MCP on port 3338).
`tamandua logs-tail`	Follow recent activity as new events arrive.
`tamandua nudge`	Wake all scheduled agents for running runs to poll immediately.
`tamandua update`	Pull the source checkout, rebuild, reinstall workflows, restart services.
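
The status lookup accepts a run id, an id prefix, or a task substring. A sketch of how such a lookup might resolve a query follows; the precedence (exact id, then prefix, then substring) and the case-insensitive task match are assumptions, and `findRuns` and `RunSummary` are hypothetical names.

```typescript
interface RunSummary {
  id: string;
  task: string;
}

function findRuns(runs: RunSummary[], query: string): RunSummary[] {
  // Assumed precedence: an exact id wins over a prefix, a prefix over a task match.
  const exact = runs.filter((run) => run.id === query);
  if (exact.length > 0) return exact;
  const byPrefix = runs.filter((run) => run.id.startsWith(query));
  if (byPrefix.length > 0) return byPrefix;
  const needle = query.toLowerCase();
  return runs.filter((run) => run.task.toLowerCase().includes(needle));
}
```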

Harness Selection

By default, Tamandua uses pi (pi --print) as its agent harness. Override it per run with mutually exclusive flags on tamandua workflow run:

Harness selection flags
Flag	Description
`--pi-as-harness`	Use pi as the agent harness. This is the default.
`--hermes-as-harness`	Use Hermes instead of pi. Alpha quality: very slow, and token accounting is broken. Use pi for production workflows.

To use a custom Hermes binary, set TAMANDUA_HERMES_BINARY; otherwise Tamandua searches for hermes on your PATH. Harness validation runs at scheduling time — a missing or non-executable binary fails the run immediately with a clear error.
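
The selection rules can be summarised in a few lines. This sketch only models the documented behaviour (pi as the default, mutually exclusive flags, `TAMANDUA_HERMES_BINARY` as an override); `selectHarness` and `hermesBinary` are hypothetical helpers and the error text is invented.

```typescript
type Harness = "pi" | "hermes";

// Resolve the harness from the flags passed to `tamandua workflow run`.
function selectHarness(flags: string[]): Harness {
  const wantsPi = flags.includes("--pi-as-harness");
  const wantsHermes = flags.includes("--hermes-as-harness");
  if (wantsPi && wantsHermes) {
    throw new Error("--pi-as-harness and --hermes-as-harness are mutually exclusive");
  }
  return wantsHermes ? "hermes" : "pi"; // pi is the default
}

// Use the override if set, otherwise rely on a PATH lookup of `hermes`.
function hermesBinary(env: Record<string, string | undefined>): string {
  return env["TAMANDUA_HERMES_BINARY"] || "hermes";
}
```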

Build Your Own

The bundled workflows are starting points. Define your own agents, steps, retry logic, and verification gates in plain YAML and Markdown. If you can write a prompt, you can build a workflow.

id: my-workflow
name: My Custom Workflow
agents:
  - id: researcher
    name: Researcher
    workspace:
      files:
        AGENTS.md: agents/researcher/AGENTS.md

steps:
  - id: research
    agent: researcher
    input: |
      Research {{task}} and report findings.
      Reply with STATUS: done and FINDINGS: ...
    expects: "STATUS: done"

Full guide.
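
To show how the `input` template and the `expects` gate in the YAML above might fit together, here is a sketch under assumptions: that `{{task}}` is filled by plain placeholder substitution and that `expects` is a substring match on the agent's reply. Neither detail is confirmed here, and `renderInput`, `meetsExpectation`, and `extractField` are hypothetical helpers.

```typescript
// Fill {{name}} placeholders; unknown placeholders are left untouched.
function renderInput(template: string, vars: Record<string, string>): string {
  return template.replace(
    /\{\{(\w+)\}\}/g,
    (match: string, key: string) => vars[key] ?? match,
  );
}

// Assumption: a step passes when the reply contains the expected marker.
function meetsExpectation(reply: string, expects: string): boolean {
  return reply.includes(expects);
}

// Pull a "FIELD: value" line out of a reply, if present.
function extractField(reply: string, field: string): string | undefined {
  const prefix = `${field}:`;
  for (const line of reply.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.startsWith(prefix)) return trimmed.slice(prefix.length).trim();
  }
  return undefined;
}
```

A reply that lacks the marker would count as a failed step, which is where retry and escalation take over.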

Skill included. The bundled tamandua-agents skill teaches your agents to hand work to Tamandua build-and-forget style. The CLI itself is designed to be easy for agents to grasp, and remote MCP tools are exposed so agents can query runs, start workflows, and check status autonomously.

Remote MCP Tools

The remote MCP endpoint exposes 14 tools at http://localhost:3338/mcp:

Run Management

Run Management MCP tools
Tool	Description
`tamandua.runs.list`	List recent Tamandua workflow runs. Accepts optional `limit` (integer, 1–200, default 50).
`tamandua.run.status`	Fetch detailed status for a run. Requires `query` (run id, prefix, or task substring).
`tamandua.run.start`	Start a workflow run. Requires `workflowId` and `taskTitle`.
`tamandua.run.pause`	Pause a running workflow run. Requires `runId`. Optional `drain` (boolean) to wait for in-flight work before pausing.
`tamandua.run.resume`	Resume a paused workflow run. Requires `runId`.
`tamandua.run.delete`	Permanently delete a workflow run and associated steps, stories, and worktree metadata. Requires `runId`. Optional `force` (boolean) cancels and deletes running or paused runs.

tamandua.run.start parameters

tamandua.run.start parameters
Parameter	Required	Description
`workflowId`	Yes	Workflow id to run.
`taskTitle`	Yes	Task description for the workflow run.
`workingDirectoryForHarness`	For direct workflows	Harness working directory for remote MCP runs. Required for direct workflows, invalid for worktree workflows.
`worktreeOriginRepository`	For worktree workflows	Repository path to create the worktree from. Required for worktree workflows, invalid for direct workflows.
`worktreeOriginRef`	No	Git ref (branch, tag, SHA) for the worktree. Only valid for worktree workflows.
`noHurrySaveTokensMode`	No	When `true`, polls less often to save tokens: the polling interval gets a 15-minute floor and a 15-minute default, instead of the usual 1-minute floor and 5-minute default. Defaults to `false`.

workingDirectoryForHarness and worktreeOriginRepository are mutually exclusive: direct workflows require the former, worktree workflows require the latter.
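
A client might check these rules before calling the tool. The sketch below is client-side illustration only: it assumes worktree workflows can be recognised by the `-worktree` id suffix, and `validateRunStart` and `toolCallRequest` are hypothetical helpers. The request envelope follows the MCP `tools/call` convention over JSON-RPC 2.0.

```typescript
interface RunStartArgs {
  workflowId: string;
  taskTitle: string;
  workingDirectoryForHarness?: string;
  worktreeOriginRepository?: string;
  worktreeOriginRef?: string;
  noHurrySaveTokensMode?: boolean;
}

// Returns a list of problems; an empty list means the arguments look valid.
function validateRunStart(args: RunStartArgs): string[] {
  const errors: string[] = [];
  // Assumption: worktree workflows are identified by their id suffix.
  const isWorktree = args.workflowId.endsWith("-worktree");
  if (isWorktree) {
    if (!args.worktreeOriginRepository) errors.push("worktreeOriginRepository is required");
    if (args.workingDirectoryForHarness) errors.push("workingDirectoryForHarness is invalid here");
  } else {
    if (!args.workingDirectoryForHarness) errors.push("workingDirectoryForHarness is required");
    if (args.worktreeOriginRepository) errors.push("worktreeOriginRepository is invalid here");
    if (args.worktreeOriginRef) errors.push("worktreeOriginRef is only valid for worktree workflows");
  }
  return errors;
}

// Wrap validated arguments in an MCP tools/call request body.
function toolCallRequest(id: number, args: RunStartArgs) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "tamandua.run.start", arguments: args },
  };
}
```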

Events & Metadata

Events and Metadata MCP tools
Tool	Description
`tamandua.events.recent`	List recent global Tamandua events. Accepts optional `limit` (integer, 1–500, default 50).
`tamandua.source.path`	Return the local Tamandua source checkout path. No parameters.
`tamandua.skill.path`	Return the path to the bundled tamandua-agents agent skill. No parameters.
`tamandua.update.command`	Return local CLI guidance for updating Tamandua safely. No parameters.

AutoResearch

AutoResearch MCP tools
Tool	Description
`tamandua.autoresearch.init`	Create project-local AutoResearch state. Requires `cwd`, `goal`, `metricName`, `direction`, and `command`. Optional `metricUnit`, `metricRegex`, `checksCommand`, and `overwrite`.
`tamandua.autoresearch.run_experiment`	Run the configured experiment command in `cwd`, parse the metric, run optional checks, and append a `run_result`. Optional `command`, `metricRegex`, `checksCommand`, and `timeoutMs`.
`tamandua.autoresearch.log_experiment`	Append the decision and learning for the latest run. Requires `cwd` and `description`; optional `status`, `metric`, `hypothesis`, `learned`, `nextFocus`, `commit`, and `revertDiscard`.
`tamandua.autoresearch.status`	Summarize baseline, best result, failures, and the next ratchet prompt for `cwd`.

Security

You're installing agent teams that run code on your machine. We take that seriously.

Curated repo only — Tamandua only installs workflows from the official repository. No arbitrary remote sources.
Reviewed for prompt injection — Every workflow is reviewed for prompt injection attacks and malicious agent files before merging.
Community contributions welcome — Want to add a workflow? Submit a PR. All submissions go through careful security review before they ship.
Transparent by default — Every workflow is plain YAML and Markdown. You can read exactly what each agent will do before you install it.

Build your agent team in pi with one command.