Plan and execute tasks with Agentic Precision

The official documentation for Architect, an open-source CLI tool that orchestrates AI agents to write, review, and fix code automatically. Headless-first: designed for CI/CD, pipelines, and automation. Multi-model, with built-in security guardrails.

TERMINAL — HUMAN LOG OUTPUT
$ architect run "Refactor the project auth module"
🔄 Step 1 → LLM call (5 messages)
LLM responded with 2 tool calls
🔧 read_file → src/main.py
OK
🔧 edit_file → src/main.py (3→5 lines)
OK
🔍 Hook python-lint:
🔄 Step 2 → LLM call (9 messages)
LLM responded with final text
Agent completed (2 steps)
Reason: LLM decided it was done
Cost: $0.0042

What is Architect

A command-line tool that turns any LLM into an autonomous code agent. Give it a task, and architect reads your code, plans changes, implements them, runs tests, and verifies the result — all without human intervention.

Unlike code assistants that live inside an IDE, architect is designed to run where code is actually built: in terminals, scripts, Makefiles, and CI/CD pipelines. It's the missing piece between "I have an AI that generates code" and "I have an AI that delivers verified code".

Works with any LLM: OpenAI, Anthropic, Google, DeepSeek, Mistral, local models with Ollama — over 100 providers supported. You choose the model, architect does the work.

Headless-first

Not a chat with superpowers. It's an automation tool that talks to LLMs.

Determinism over probabilism

Hooks and guardrails are rules, not suggestions. The LLM decides what to do; quality gates verify the result is correct.

Full transparency

Every agent action is logged, both as human-readable output and as structured JSON. No black boxes.

Open source, no surprises

No subscriptions or locked features. You only pay the API costs of the LLM you choose.

Multi-model · Ralph Loop · Parallel runs · Guardrails · YAML Pipelines · Hooks · CI/CD-first · Auto memory · OpenTelemetry · Reports · Dry run · Vercel Skills

Key Features

Multi-Model, Zero Lock-in

Use any LLM: OpenAI, Anthropic, Google, DeepSeek, Mistral, Ollama — or any compatible API. Over 100 providers via LiteLLM. Switch models with a flag.

Ralph Loop — Autonomous Iteration

Run, verify with your tests, and if they fail, retry with a clean context. Iterate until your checks actually pass — not until the LLM thinks it's done.

Parallel Runs with Worktrees

Multiple agents in parallel, each in its own isolated git worktree. Same task with N models to compare, or different tasks to multiply speed.

Guardrails & Quality Gates

Protected files, blocked commands, change limits, code rules. Mandatory quality gates before completion. Declarative in YAML, deterministic.

Declarative Pipelines

Multi-step YAML workflows. Plan, Build, Test, Review, Fix. Variables between steps, conditions, checkpoints. Composable and reusable.
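
A multi-step workflow of this shape might look like the sketch below. The schema is illustrative — step names, the `{{ … }}` variable syntax, and the `on_fail` field are assumptions, not the tool's documented format:

```yaml
# Hypothetical pipeline sketch — field names are illustrative.
pipeline:
  name: build-and-review
  steps:
    - name: plan
      task: "Write an implementation plan for {{ goal }}"
    - name: build
      task: "Implement the plan: {{ steps.plan.output }}"
    - name: test
      run: "pytest tests/ -q"
      on_fail: build   # loop back to the build step until tests pass
```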

Extensible with Hooks

10 agent lifecycle events. Auto-format code, block dangerous commands, notify Slack when done. Pre- and post-hooks on every action.

Built for CI/CD

Semantic exit codes, parseable JSON output, Markdown reports for PR comments. Budget and timeout as hard limits. No confirmations, no interactive prompts.

Learns With Use

Auto-generated procedural memory. Detects corrections and persists them. Combined with .architect.md and Vercel-compatible skills: three layers of accumulated knowledge.
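
As an illustration of the project-knowledge layer, a conventions file like .architect.md might contain entries such as the following — the structure is a guess for illustration, not the documented format:

```markdown
<!-- .architect.md — project conventions loaded on every run (illustrative). -->
# Conventions
- Format with `ruff`; line length 100.
- Every new endpoint needs a test under tests/api/.
- Never modify files under migrations/.
```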

Ralph Loop — The Star Feature

The most productive pattern in agentic coding, built in as a native feature. Instead of trusting the AI to decide when it's done, architect runs your tests and linters after each iteration. If they fail, the agent retries with clean context.

Each iteration starts fresh — no accumulating garbage from previous attempts. It only receives: the original spec, the accumulated diff, and errors from the last iteration. The result: code that compiles, passes tests, and is clean.

$ architect loop "Implement the payments module" \
--check "pytest tests/ -q" \
--check "ruff check src/" \
--max-iterations 25
Ralph Loop — Iteration 3/25
🔄 Agent working (clean context + errors from iter. 2)...
🔧 edit_file → src/payments/stripe.py
🔧 edit_file → tests/test_payments.py
Agent completed
🧪 External verification:
pytest tests/ -q → 18/18 passed
ruff check src/ → no errors
Loop completed in 3 iterations ($0.089)
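
The retry logic described above can be sketched in a few lines of Python. This is a minimal illustration of the pattern, not Architect's actual implementation; `run_agent` stands in for whatever produces edits:

```python
import subprocess

def ralph_loop(spec, checks, run_agent, max_iterations=25):
    """Sketch of the Ralph Loop control flow (illustrative, not the real tool).

    Each iteration starts from a clean context: the agent receives only the
    spec, the accumulated diff, and the errors from the previous iteration.
    """
    diff, errors = "", ""
    for iteration in range(1, max_iterations + 1):
        diff = run_agent(spec=spec, diff=diff, errors=errors)
        failures = []
        for cmd in checks:  # external verification: your tests and linters
            result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
            if result.returncode != 0:
                failures.append(f"{cmd} -> exit {result.returncode}\n"
                                f"{result.stdout}{result.stderr}")
        if not failures:
            return iteration  # done only when every external check passes
        errors = "\n".join(failures)  # carried into the next clean context
    raise RuntimeError(f"checks still failing after {max_iterations} iterations")
```

The key property is that success is decided by the exit codes of your checks, never by the model's own judgment.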

Guardrails & Quality Gates

Architect's guardrails don't depend on the LLM. They are deterministic rules evaluated before and after every action. The agent can't skip them because it doesn't control them — they're outside its context.

If the agent tries to write to .env → blocked. If the code contains eval() → blocked. If it says "I'm done" but pytest fails → keeps working. Quality gates pass or they don't. No negotiation.

guardrails.yaml
guardrails:
  protected_files: [".env", "*.pem", "migrations/*"]
  blocked_commands: ['rm -rf /', 'git push --force']
  max_files_modified: 30
  quality_gates:
    - name: lint
      command: "ruff check src/"
      required: true
    - name: tests
      command: "pytest tests/ -q"
      required: true
  code_rules:
    - pattern: 'eval\('
      severity: block
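
Because these rules are plain pattern matches evaluated outside the LLM's context, they reduce to deterministic checks. A simplified sketch of a pre-write gate (illustrative, not Architect's internals):

```python
import fnmatch
import re

PROTECTED = [".env", "*.pem", "migrations/*"]
CODE_RULES = [{"pattern": r"eval\(", "severity": "block"}]

def check_write(path, new_content):
    """Deterministic pre-write gate: the agent never sees or controls it."""
    for pat in PROTECTED:
        if fnmatch.fnmatch(path, pat):  # glob match on the target path
            return f"blocked: {path} matches protected pattern {pat!r}"
    for rule in CODE_RULES:
        if rule["severity"] == "block" and re.search(rule["pattern"], new_content):
            return f"blocked: content matches rule {rule['pattern']!r}"
    return None  # write allowed
```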

Parallel Runs with Git Worktrees

Launch multiple agents in parallel, each in its own isolated git worktree. Same task with different models to compare real results on your codebase. Or different tasks in parallel to multiply speed.

Native git worktrees: no copies, no conflicts, no Docker. Each worker operates on an isolated repo snapshot. In the end, objective data — not opinions.

$ architect parallel "Refactor the auth module" \
--models gpt-4.1,claude-sonnet-4,deepseek-chat
Results — Competitive Eval
Worker 1 (gpt-4.1)
→ architect/parallel-1 8 steps $0.034
Worker 2 (claude-sonnet-4)
→ architect/parallel-2 5 steps $0.028
Worker 3 (deepseek-chat)
→ architect/parallel-3 20 steps $0.006
$ architect diff parallel-1 parallel-2
$ architect merge parallel-2
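
The isolation model relies on standard git plumbing. A sketch of what one-branch-per-worker setup looks like (illustrative — the real tool manages branch naming, scheduling, and cleanup itself):

```python
import os
import subprocess
import tempfile

def add_worktrees(repo, n):
    """Create n isolated worktrees, one branch per worker (illustrative).

    Each worker gets its own checkout, so parallel agents never touch
    each other's files and the main working tree stays untouched.
    """
    paths = []
    for i in range(1, n + 1):
        path = os.path.join(tempfile.mkdtemp(), f"worker-{i}")
        # `git worktree add -b <branch> <path>` branches from HEAD
        subprocess.run(
            ["git", "-C", repo, "worktree", "add",
             "-b", f"architect/parallel-{i}", path],
            check=True, capture_output=True,
        )
        paths.append(path)
    return paths
```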

Why Architect

architect doesn't compete with Claude Code or Cursor on their turf. It competes where they don't reach: unsupervised execution, CI/CD, automation, scripts.

| | Claude Code | Cursor | Aider | architect |
|---|---|---|---|---|
| Primary mode | Interactive terminal | IDE (VS Code) | Interactive terminal | Headless / CI |
| Multi-model | Claude only | Multi (with config) | Multi | Multi (LiteLLM, 100+) |
| Unsupervised | Partial | No | Partial | Native |
| Parallel runs | Manual (worktrees) | No | No | Native |
| Ralph Loop | External plugin | No | No | Native |
| YAML Pipelines | No | No | No | Yes |
| Guardrails | Hooks (manual) | Limited | No | Declarative (YAML) |
| Quality Gates | No | No | No | Yes |
| CI/CD-first | Adaptable | No | Partial | Designed for it |
| CI exit codes | No | No | No | Semantic |
| Reports | No | No | No | JSON / MD / GitHub |
| Native MCP | Yes | No | No | Yes |
| Post-edit hooks | Manual | Partial | No | Auto & configurable |
| Self-eval | No | No | No | Basic + Full |
| Skills ecosystem | Yes | Yes | No | Yes (Vercel-compatible) |
| Procedural memory | No | No | No | Auto-generated |
| Session resume | Partial | No | No | Complete |
| Checkpoints | Interactive | No | Git auto-commits | Programmatic |
| OpenTelemetry | No | No | No | Native |
| Cost tracking | Limited | No | Partial | Complete + budget |
| Custom agents | No | No | No | Yes (YAML) |
| Open source | No | No | Yes | Yes |
| Cost | $20/mo (Pro) | $20/mo | API costs | API costs (free) |

vs Claude Code: Claude Code is the best interactive terminal agent. architect is the best agent for automation. Claude Code is your copilot; architect is your CI team and autopilot.

vs Cursor: Cursor lives inside the IDE. architect lives where code is built and deployed: in terminals, pipelines, CI, scripts and cron jobs.

vs Aider: Aider pioneered CLI agents. architect takes the idea further: parallel runs, declarative pipelines, guardrails, quality gates, self-evaluation, MCP, and an architecture designed to run unsupervised for hours.

Use Cases

For Developers

Set up a Ralph Loop with your spec and tests. Close the laptop. Next morning you have a PR with code that compiles and passes all tests.

Coding overnight
$ architect loop --spec tasks/payment-module.md \
    --check "pytest tests/ -q" \
    --check "mypy src/" \
    --max-iterations 30

Competitive coding
$ architect parallel "Optimize SQL queries" \
    --models gpt-4.1,claude-sonnet-4,deepseek-chat

Safe refactoring
$ architect run "Migrate from SQLAlchemy sync to async" \
    --dry-run
# Preview first, then execute with checkpoints:
$ architect run "Migrate from SQLAlchemy sync to async" \
    --checkpoint-every 5

For Teams

Automatic review on every PR, shared standards encoded in YAML, and objective evaluation when switching models.

Automatic PR review
$ architect run "Review this PR" \
    --agent review \
    --context-git-diff origin/main \
    --report github > review.md

Shared standards
# .architect.md + guardrails + skills
# Team conventions encoded,
# versioned in git, verified on every run

Model evaluation
$ architect eval \
    --models claude-sonnet-4,gpt-4.1 \
    --tasks eval/tasks.yaml
GitHub Actions — Review
- name: AI Code Review
  run: |
    architect run "Review the changes in this PR" \
      --agent review \
      --context-git-diff origin/main \
      --report github \
      --budget 0.10

For CI/CD & DevOps

Integrate architect in your pipelines. Automatic lint fix, changelog generation, updated docs — all headless, all auditable.

Automatic lint fix
$ architect loop "Fix lint errors" \
    --check "eslint src/ --max-warnings 0" \
    --max-iterations 5 --budget 0.50

Automatic changelog
$ architect run "Generate changelog from v1.2.0" \
    --report markdown > CHANGELOG.md

Automatic docs
$ architect pipeline pipelines/update-docs.yaml

Get Started in 30 Seconds

1. Install
Requires Python 3.12+
$ pip install architect-ai-cli

2. Configure
$ architect init --preset python
Or --preset node-react, --preset ci, or manual with export OPENAI_API_KEY=sk-...

3. Your first task
$ architect run "Add a GET /health endpoint that returns {status: ok}"
More examples
# Preview without executing (like terraform plan)
$ architect run "Refactor the auth module" --dry-run
# Ralph Loop: iterate until tests pass
$ architect loop "Fix all lint errors" \
--check "ruff check src/" --max-iterations 10
# Parallel: 3 models, same task, compare
$ architect parallel "Optimize the SQL queries" \
--models gpt-4.1,claude-sonnet-4,deepseek-chat
# In CI/CD with budget and report
$ architect run "Review this PR" --agent review \
--budget 0.15 --report github
# Semantic exit codes
# 0 = success, 1 = failed, 2 = partial
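
In CI, those exit codes can drive pipeline decisions directly. An illustrative GitHub Actions step (the step layout is ours; only the 0/1/2 semantics come from architect) that treats a partial result as a warning rather than a hard failure:

```yaml
- name: Fix lint with architect
  shell: bash
  run: |
    code=0
    architect loop "Fix all lint errors" \
      --check "ruff check src/" --max-iterations 10 || code=$?
    if [ "$code" -eq 2 ]; then
      echo "::warning::architect finished with a partial result"
    elif [ "$code" -ne 0 ]; then
      exit "$code"
    fi
```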

Open source. No subscriptions. You only pay the API costs of the LLM you choose.