Plan and execute tasks with Agentic Precision

The official documentation for Architect, an open-source CLI tool that orchestrates AI agents to write, review, and fix code automatically. Headless-first: designed for CI/CD, pipelines, and automation. Multi-model, with built-in security guardrails.

TERMINAL — HUMAN LOG OUTPUT
$ architect run "Refactor the project auth module"
🔄 Step 1 → LLM call (5 messages)
LLM responded with 2 tool calls
🔧 read_file → src/main.py
OK
🔧 edit_file → src/main.py (3→5 lines)
OK
🔍 Hook python-lint:
🔄 Step 2 → LLM call (9 messages)
LLM responded with final text
Agent completed (2 steps)
Reason: LLM decided it was done
Cost: $0.0042

What is Architect

A command-line tool that turns any LLM into an autonomous code agent. Give it a task, and architect reads your code, plans changes, implements them, runs tests, and verifies the result — all without human intervention.

Unlike code assistants that live inside an IDE, architect is designed to run where code is actually built: in terminals, scripts, Makefiles, and CI/CD pipelines. It's the missing piece between "I have an AI that generates code" and "I have an AI that delivers verified code".

Works with any LLM: OpenAI, Anthropic, Google, DeepSeek, Mistral, local models with Ollama — over 100 providers supported. You choose the model, architect does the work.

Headless-first

Not a chat with superpowers. It's an automation tool that talks to LLMs.

Determinism over probabilism

Hooks and guardrails are rules, not suggestions. The LLM decides what to do; quality gates verify the result is correct.

Full transparency

Every agent action is logged, both as human-readable output and as structured JSON. No black boxes.

Open source, no surprises

No subscriptions or locked features. You only pay the API costs of the LLM you choose.

Multi-model · Ralph Loop · Parallel runs · Guardrails · YAML Pipelines · Hooks · CI/CD-first · Auto memory · OpenTelemetry · Reports · Dry run · Vercel Skills

Key Features

Multi-Model, Zero Lock-in

Use any LLM: OpenAI, Anthropic, Google, DeepSeek, Mistral, Ollama — or any compatible API. Over 100 providers via LiteLLM. Switch models with a flag.

Ralph Loop — Autonomous Iteration

Run, verify with your tests, and if they fail, retry with a clean context. Iterate until your checks actually pass — not until the LLM thinks it's done.

Parallel Runs with Worktrees

Multiple agents in parallel, each in its own isolated git worktree. Same task with N models to compare, or different tasks to multiply speed.

Guardrails & Quality Gates

Protected files, blocked commands, change limits, code rules. Mandatory quality gates before completion. Declarative in YAML, deterministic.

Declarative Pipelines

Multi-step YAML workflows. Plan, Build, Test, Review, Fix. Variables between steps, conditions, checkpoints. Composable and reusable.
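
A multi-step workflow of this shape might look like the sketch below. The schema is illustrative — step names, the `{{ … }}` variable syntax, and the `on_fail` field are assumptions, not the tool's documented format:

```yaml
# Hypothetical pipeline sketch — field names are illustrative.
pipeline:
  name: build-and-review
  steps:
    - name: plan
      task: "Write an implementation plan for {{ goal }}"
    - name: build
      task: "Implement the plan: {{ steps.plan.output }}"
    - name: test
      run: "pytest tests/ -q"
      on_fail: build   # loop back to the build step until tests pass
```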

Extensible with Hooks

10 agent lifecycle events. Auto-format code, block dangerous commands, notify Slack when done. Pre- and post-hooks on every action.

Built for CI/CD

Semantic exit codes, parseable JSON output, Markdown reports for PR comments. Budget and timeout as hard limits. No confirmations, no interactive prompts.

Learns With Use

Auto-generated procedural memory. Detects corrections and persists them. Combined with .architect.md and Vercel-compatible skills: three layers of accumulated knowledge.
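
As an illustration of the project-knowledge layer, a conventions file like .architect.md might contain entries such as the following — the structure is a guess for illustration, not the documented format:

```markdown
<!-- .architect.md — project conventions loaded on every run (illustrative). -->
# Conventions
- Format with `ruff`; line length 100.
- Every new endpoint needs a test under tests/api/.
- Never modify files under migrations/.
```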

Ralph Loop — The Star Feature

The most productive pattern in agentic coding, built in as a native feature. Instead of trusting the AI to decide when it's done, architect runs your tests and linters after each iteration. If they fail, the agent retries with clean context.

Each iteration starts fresh — no accumulating garbage from previous attempts. It only receives: the original spec, the accumulated diff, and errors from the last iteration. The result: code that compiles, passes tests, and is clean.

$ architect loop "Implement the payments module" \
--check "pytest tests/ -q" \
--check "ruff check src/" \
--max-iterations 25
Ralph Loop — Iteration 3/25
🔄 Agent working (clean context + errors from iter. 2)...
🔧 edit_file → src/payments/stripe.py
🔧 edit_file → tests/test_payments.py
Agent completed
🧪 External verification:
pytest tests/ -q → 18/18 passed
ruff check src/ → no errors
Loop completed in 3 iterations ($0.089)
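
The retry logic described above can be sketched in a few lines of Python. This is a minimal illustration of the pattern, not Architect's actual implementation; `run_agent` stands in for whatever produces edits:

```python
import subprocess

def ralph_loop(spec, checks, run_agent, max_iterations=25):
    """Sketch of the Ralph Loop control flow (illustrative, not the real tool).

    Each iteration starts from a clean context: the agent receives only the
    spec, the accumulated diff, and the errors from the previous iteration.
    """
    diff, errors = "", ""
    for iteration in range(1, max_iterations + 1):
        diff = run_agent(spec=spec, diff=diff, errors=errors)
        failures = []
        for cmd in checks:  # external verification: your tests and linters
            result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
            if result.returncode != 0:
                failures.append(f"{cmd} -> exit {result.returncode}\n"
                                f"{result.stdout}{result.stderr}")
        if not failures:
            return iteration  # done only when every external check passes
        errors = "\n".join(failures)  # carried into the next clean context
    raise RuntimeError(f"checks still failing after {max_iterations} iterations")
```

The key property is that success is decided by the exit codes of your checks, never by the model's own judgment.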

Guardrails & Quality Gates

Architect's guardrails don't depend on the LLM. They are deterministic rules evaluated before and after every action. The agent can't skip them because it doesn't control them — they're outside its context.

If the agent tries to write to .env → blocked. If the code contains eval() → blocked. If it says "I'm done" but pytest fails → keeps working. Quality gates pass or they don't. No negotiation.

guardrails.yaml
guardrails:
  protected_files: [".env", "*.pem", "migrations/*"]
  blocked_commands: ['rm -rf /', 'git push --force']
  max_files_modified: 30
  quality_gates:
    - name: lint
      command: "ruff check src/"
      required: true
    - name: tests
      command: "pytest tests/ -q"
      required: true
  code_rules:
    - pattern: 'eval\('
      severity: block
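
Because these rules are plain pattern matches evaluated outside the LLM's context, they reduce to deterministic checks. A simplified sketch of a pre-write gate (illustrative, not Architect's internals):

```python
import fnmatch
import re

PROTECTED = [".env", "*.pem", "migrations/*"]
CODE_RULES = [{"pattern": r"eval\(", "severity": "block"}]

def check_write(path, new_content):
    """Deterministic pre-write gate: the agent never sees or controls it."""
    for pat in PROTECTED:
        if fnmatch.fnmatch(path, pat):  # glob match on the target path
            return f"blocked: {path} matches protected pattern {pat!r}"
    for rule in CODE_RULES:
        if rule["severity"] == "block" and re.search(rule["pattern"], new_content):
            return f"blocked: content matches rule {rule['pattern']!r}"
    return None  # write allowed
```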

Parallel Runs with Git Worktrees

Launch multiple agents in parallel, each in its own isolated git worktree. Same task with different models to compare real results on your codebase. Or different tasks in parallel to multiply speed.

Native git worktrees: no copies, no conflicts, no Docker. Each worker operates on an isolated repo snapshot. In the end, objective data — not opinions.

$ architect parallel "Refactor the auth module" \
--models gpt-4.1,claude-sonnet-4,deepseek-chat
Results — Competitive Eval
Worker 1 (gpt-4.1)
→ architect/parallel-1 8 steps $0.034
Worker 2 (claude-sonnet-4)
→ architect/parallel-2 5 steps $0.028
Worker 3 (deepseek-chat)
→ architect/parallel-3 20 steps $0.006
$ architect diff parallel-1 parallel-2
$ architect merge parallel-2
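
The isolation model relies on standard git plumbing. A sketch of what one-branch-per-worker setup looks like (illustrative — the real tool manages branch naming, scheduling, and cleanup itself):

```python
import os
import subprocess
import tempfile

def add_worktrees(repo, n):
    """Create n isolated worktrees, one branch per worker (illustrative).

    Each worker gets its own checkout, so parallel agents never touch
    each other's files and the main working tree stays untouched.
    """
    paths = []
    for i in range(1, n + 1):
        path = os.path.join(tempfile.mkdtemp(), f"worker-{i}")
        # `git worktree add -b <branch> <path>` branches from HEAD
        subprocess.run(
            ["git", "-C", repo, "worktree", "add",
             "-b", f"architect/parallel-{i}", path],
            check=True, capture_output=True,
        )
        paths.append(path)
    return paths
```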

Why Architect

architect doesn't compete with Claude Code or Cursor on their turf. It competes where they don't reach: unsupervised execution, CI/CD, automation, scripts.

| | Claude Code | Cursor | Aider | architect |
|---|---|---|---|---|
| Primary mode | Interactive terminal | IDE (VS Code) | Interactive terminal | Headless / CI |
| Multi-model | Claude only | Multi (with config) | Multi | Multi (LiteLLM, 100+) |
| Unsupervised | Partial | No | Partial | Native |
| Parallel runs | Manual (worktrees) | No | No | Native |
| Ralph Loop | External plugin | No | No | Native |
| YAML Pipelines | No | No | No | Yes |
| Guardrails | Hooks (manual) | Limited | No | Declarative (YAML) |
| Quality Gates | No | No | No | Yes |
| CI/CD-first | Adaptable | No | Partial | Designed for it |
| CI exit codes | No | No | No | Semantic |
| Reports | No | No | No | JSON / MD / GitHub |
| Native MCP | Yes | No | No | Yes |
| Post-edit hooks | Manual | Partial | No | Auto & configurable |
| Self-eval | No | No | No | Basic + Full |
| Skills ecosystem | Yes | Yes | No | Yes (Vercel-compatible) |
| Procedural memory | No | No | No | Auto-generated |
| Session resume | Partial | No | No | Complete |
| Checkpoints | Interactive | No | Git auto-commits | Programmatic |
| OpenTelemetry | No | No | No | Native |
| Cost tracking | Limited | No | Partial | Complete + budget |
| Custom agents | No | No | No | Yes (YAML) |
| Open source | No | No | Yes | Yes |
| Cost | $20/mo (Pro) | $20/mo | API costs | API costs (free) |

vs Claude Code: Claude Code is the best interactive terminal agent. architect is the best agent for automation. Claude Code is your copilot; architect is your CI team and autopilot.

vs Cursor: Cursor lives inside the IDE. architect lives where code is built and deployed: in terminals, pipelines, CI, scripts and cron jobs.

vs Aider: Aider pioneered CLI agents. architect takes the idea further: parallel runs, declarative pipelines, guardrails, quality gates, self-evaluation, MCP, and an architecture designed to run unsupervised for hours.

Use Cases

For Developers

Set up a Ralph Loop with your spec and tests. Close the laptop. Next morning you have a PR with code that compiles and passes all tests.

Coding overnight
$ architect loop --spec tasks/payment-module.md \
    --check "pytest tests/ -q" \
    --check "mypy src/" \
    --max-iterations 30

Competitive coding
$ architect parallel "Optimize SQL queries" \
    --models gpt-4.1,claude-sonnet-4,deepseek-chat

Safe refactoring
$ architect run "Migrate from SQLAlchemy sync to async" \
    --dry-run
# Preview first, then execute with checkpoints:
$ architect run "Migrate from SQLAlchemy sync to async" \
    --checkpoint-every 5

For Teams

Automatic review on every PR, shared standards encoded in YAML, and objective evaluation when switching models.

Automatic PR review
$ architect run "Review this PR" \
    --agent review \
    --context-git-diff origin/main \
    --report github > review.md

Shared standards
# .architect.md + guardrails + skills
# Team conventions encoded,
# versioned in git, verified on every run

Model evaluation
$ architect eval \
    --models claude-sonnet-4,gpt-4.1 \
    --tasks eval/tasks.yaml
GitHub Actions — Review
- name: AI Code Review
  run: |
    architect run "Review the changes in this PR" \
      --agent review \
      --context-git-diff origin/main \
      --report github \
      --budget 0.10

For CI/CD & DevOps

Integrate architect in your pipelines. Automatic lint fix, changelog generation, updated docs — all headless, all auditable.

Automatic lint fix
$ architect loop "Fix lint errors" \
    --check "eslint src/ --max-warnings 0" \
    --max-iterations 5 --budget 0.50

Automatic changelog
$ architect run "Generate changelog from v1.2.0" \
    --report markdown > CHANGELOG.md

Automatic docs
$ architect pipeline pipelines/update-docs.yaml

Get Started in 30 Seconds

1. Install
Requires Python 3.12+
$ pip install architect-ai-cli

2. Configure
$ architect init --preset python
Or --preset node-react, --preset ci, or manual with export OPENAI_API_KEY=sk-...

3. Your first task
$ architect run "Add a GET /health endpoint that returns {status: ok}"
More examples
# Preview without executing (like terraform plan)
$ architect run "Refactor the auth module" --dry-run
# Ralph Loop: iterate until tests pass
$ architect loop "Fix all lint errors" \
--check "ruff check src/" --max-iterations 10
# Parallel: 3 models, same task, compare
$ architect parallel "Optimize the SQL queries" \
--models gpt-4.1,claude-sonnet-4,deepseek-chat
# In CI/CD with budget and report
$ architect run "Review this PR" --agent review \
--budget 0.15 --report github
# Semantic exit codes
# 0 = success, 1 = failed, 2 = partial
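
In CI, those exit codes can drive pipeline decisions directly. An illustrative GitHub Actions step (the step layout is ours; only the 0/1/2 semantics come from architect) that treats a partial result as a warning rather than a hard failure:

```yaml
- name: Fix lint with architect
  shell: bash
  run: |
    code=0
    architect loop "Fix all lint errors" \
      --check "ruff check src/" --max-iterations 10 || code=$?
    if [ "$code" -eq 2 ]; then
      echo "::warning::architect finished with a partial result"
    elif [ "$code" -ne 0 ]; then
      exit "$code"
    fi
```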

Open source. No subscriptions. You only pay the API costs of the LLM you choose.