Troubleshooting and Diagnostics

Problem-solving guide for architect-cli v1.0.0. Organized by symptoms: identify the problem, diagnose the cause, and apply the specific solution.

Diagnostic approach

Architect has three main sources of information for diagnosing problems:

HUMAN output (stderr) — the visual log with icons showing what the agent does step by step. Always active except with --quiet or --json.
JSON log (file) — captures ALL events in JSON Lines format. Activated with --log-file. This is the most powerful diagnostic tool.
Technical console (stderr) — technical logs controlled by -v/-vv/-vvv.

Recommended pattern: for any problem, reproduce with --log-file and use jq to filter:

architect run "task" --log-file debug.jsonl -vv
cat debug.jsonl | jq 'select(.event == "agent.tool_call.execute")'

1. Connection and LLM errors

1.1 Authentication error (exit code 4)

Symptom: the agent terminates immediately with exit code 4 and the message Authentication failed or Invalid API key.

Cause: the API key is not configured, is invalid, or has expired.

Solution:

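# Verify that the environment variable is defined (without printing the secret)
[ -n "$LITELLM_API_KEY" ] && echo "LITELLM_API_KEY is set" || echo "LITELLM_API_KEY is NOT set"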

# Or use the OpenAI/Anthropic key directly
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

# Pass via CLI (single execution)
architect run "task" --api-key "sk-..."

# Verify in the YAML config that api_key_env points to the correct variable
# .architect.yaml
llm:
  api_key_env: "OPENAI_API_KEY"  # name of the env var

If you use a proxy or local server, also verify --api-base.
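
A quick pre-flight check can catch exit code 4 before a long or unattended run starts. The sketch below assumes the key lives in the variable named by llm.api_key_env (OPENAI_API_KEY in the example above). It only tests that the variable is non-empty and never prints its value, so it cannot tell whether the key is valid or expired. The function name require_env is hypothetical.

```shell
# Fail fast if the named environment variable is unset or empty.
# Prints only the variable *name*, never its value, so it is safe in CI logs.
require_env() {
  local name="$1"
  # ${!name} is bash indirect expansion: the value of the variable called $name
  if [ -z "${!name:-}" ]; then
    echo "missing required environment variable: $name" >&2
    return 1
  fi
}

# Example: check the variable referenced by llm.api_key_env in .architect.yaml
# require_env OPENAI_API_KEY || exit 1
```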

1.2 LLM call timeout

Symptom: HUMAN output shows Error del LLM: timeout (icon ❌), or the JSON log has event: "agent.llm_error" with an error containing “timeout” or “timed out”.

Cause: the default LLM timeout is 60 seconds (llm.timeout: 60). Large models, very long prompts, or a slow connection to the provider can exceed it.

Solution:

# .architect.yaml
llm:
  timeout: 120   # increase to 120 seconds
  retries: 3     # increase retries (default: 2)

# Diagnose with detailed logging
architect run "task" --log-file debug.jsonl -vvv
cat debug.jsonl | jq 'select(.event | startswith("agent.llm"))'

1.3 Model not found

Symptom: error Model not found or Invalid model at startup. Exit code 3 (config error).

Cause: the model name does not exist for the configured provider, or the provider does not support that model.

Solution:

# Verify that the model is valid for the provider
# OpenAI: gpt-4o, gpt-4o-mini, gpt-4.1, etc.
# Anthropic: claude-sonnet-4-6, claude-opus-4-6, etc.
# For models via LiteLLM proxy, use prefix: openai/gpt-4o, anthropic/claude-sonnet-4-6

architect run "task" --model gpt-4o
architect run "task" --model anthropic/claude-sonnet-4-6

# .architect.yaml
llm:
  model: "gpt-4o"         # exact model name
  api_base: null           # null to use the direct provider

1.4 Rate limiting (429)

Symptom: JSON log shows repeated HTTP 429 errors. The agent may recover automatically thanks to retries, but if the issue persists, it stops with LLM_ERROR.

Cause: too many requests to the provider in a short time. Common in parallel executions or with low-quota models.

Solution:

# .architect.yaml
llm:
  retries: 3           # increase retries with backoff
  timeout: 120         # allow more time for backoff to work

# In parallel executions, reduce workers
architect parallel --workers 2 --task "..."

# Check quota in the provider dashboard
# OpenAI: platform.openai.com/usage
# Anthropic: console.anthropic.com

1.5 Incorrect API base

Symptom: error Connection refused or Could not resolve host. The agent cannot connect to the LLM.

Cause: api_base points to a nonexistent or unreachable server, or uses the wrong protocol (for example http instead of https).

Solution:

# Verify that the server responds
curl https://your-server.com/v1/models

# Fix in the configuration
architect run "task" --api-base "https://your-server.com/v1"

# .architect.yaml
llm:
  api_base: "https://your-server.com/v1"
  mode: "proxy"    # use "proxy" for a LiteLLM or other OpenAI-compatible server
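
The curl check is more useful when it reports the HTTP status, because that separates "server unreachable" from "server up but rejecting the request". The following sketch assumes the server exposes an OpenAI-style GET /models endpoint under api_base, as OpenAI-compatible servers such as a LiteLLM proxy do. An optional bearer token can be passed as the second argument. The function name check_api_base is hypothetical.

```shell
# Probe <api_base>/models and print the HTTP status code.
# Returns 0 only for a 2xx response. curl reports 000 when it cannot connect
# (connection refused, DNS failure, TLS error).
check_api_base() {
  local base="${1%/}" token="${2:-}" code   # strip a trailing slash from api_base
  if [ -n "$token" ]; then
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
      -H "Authorization: Bearer $token" "$base/models") || true
  else
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$base/models") || true
  fi
  echo "HTTP ${code:-000} from $base/models"
  case "$code" in
    2??) return 0 ;;
    *)   return 1 ;;
  esac
}
```

Usage: check_api_base "https://your-server.com/v1" "$LITELLM_API_KEY". A 000 status matches the Connection refused / Could not resolve host symptom. A 401 or 403 means the server is reachable and the key is the problem (see 1.1).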

2. The agent does not finish / infinite loops

2.1 max_steps too high or not configured

Symptom: the agent executes dozens or hundreds of steps without finishing. The HUMAN output shows Step 50, Step 51… endlessly.

Cause: max_steps defaults to 50 for the build agent (20 for plan and review, 15 for resume). If the task is ambiguous, the LLM may not find a stopping point.

Solution:

# .architect.yaml -- limit steps
agents:
  build:
    max_steps: 30    # reasonable cap

# Also use budget and timeout as complementary safety nets
costs:
  budget_usd: 2.00   # max $2 per execution

# From CLI
architect run "task" --max-steps 25 --budget 1.50 --timeout 300

2.2 No safety nets configured

Symptom: the agent runs indefinitely consuming tokens and money. There are no safety.* messages in the logs.

Cause: no budget limits, timeout, or adequate max_steps were configured.

Solution: always configure all three safety nets:

# .architect.yaml -- defensive configuration
agents:
  build:
    max_steps: 30

costs:
  budget_usd: 5.00
  warn_at_usd: 3.00

# The global timeout has no YAML key; pass it as a CLI flag

architect run "task" --max-steps 30 --budget 5.00 --timeout 600

2.3 Hooks failing repeatedly cause loops

Symptom: the agent repeats the same step over and over. The HUMAN output shows Hook name: (warning) repeatedly. The agent tries to correct, the hook fails again, and so on.

Cause: a post_tool_use hook or a quality gate fails consistently. The LLM receives the error as feedback and tries to correct it, but the correction also fails the hook, so the cycle repeats.

Solution:

# Diagnose: see which hooks are failing
cat debug.jsonl | jq 'select(.event == "agent.hook.complete" and .success == false)'

# Verify the hook manually
echo '{}' | ARCHITECT_EVENT=post_tool_use ARCHITECT_TOOL_NAME=edit_file bash -c 'your-hook-command'
echo $?  # should be 0 (ALLOW) or 2 (BLOCK)

# Temporarily disable the problematic hook
hooks:
  post_tool_use:
    - name: "my-hook"
      command: "..."
      enabled: false    # <-- disable

Hooks NEVER break the loop (errors return ALLOW), but if a required quality gate fails repeatedly, the agent keeps trying. Verify that the quality gates are achievable:

guardrails:
  quality_gates:
    - name: "tests"
      command: "pytest tests/ -x"
      required: true     # change to false if it blocks
      timeout: 60

2.4 Context window filling up

Symptom: HUMAN output shows Compressing context -- N exchanges and Context window: removed N messages. The agent becomes slow. May terminate with StopReason: CONTEXT_FULL.

Cause: the task is very long, tool responses are too large, or the context management configuration is insufficient.

Solution:

# .architect.yaml -- aggressive context management
context:
  max_tool_result_tokens: 1500     # truncate large results
  summarize_after_steps: 6         # compress sooner
  keep_recent_steps: 3             # keep fewer steps
  max_context_tokens: 60000        # hard limit

# Use a model with a larger context
llm:
  model: "gpt-4o"  # 128k context

2.5 Quality gates contradict the task

Symptom: the agent completes the task but quality gates fail, so the agent tries to “fix” the code and breaks what it had done. It repeats in a loop.

Cause: a quality gate (lint, tests, typecheck) fails for reasons unrelated to the current task, but the agent receives the error and tries to correct it.

Solution:

guardrails:
  quality_gates:
    - name: "lint"
      command: "ruff check src/ --select E,W"  # be specific about which rules
      required: false   # do not block the agent
      timeout: 30

    - name: "tests-related"
      command: "pytest tests/test_specific.py -x"  # only relevant tests
      required: true
      timeout: 120
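
To confirm that a gate fails for reasons unrelated to the task, run the gate command against the last commit, before any agent changes. The sketch below does this in a throwaway detached git worktree so that uncommitted changes cannot influence the result. It assumes the project is a git repository with at least one commit. The function name gate_passes_on_head is hypothetical.

```shell
# Run a quality-gate command against HEAD in a temporary detached worktree.
# Returns the gate command's exit status, or 125 if the worktree cannot be created.
gate_passes_on_head() {
  local gate_cmd="$1" base dir rc=0
  base=$(mktemp -d)
  dir="$base/head"
  if ! git worktree add --detach "$dir" HEAD >/dev/null 2>&1; then
    rm -rf "$base"
    return 125
  fi
  # Run the gate from the clean checkout; capture its status without aborting
  (cd "$dir" && bash -c "$gate_cmd") || rc=$?
  git worktree remove --force "$dir" >/dev/null 2>&1
  rm -rf "$base"
  return "$rc"
}
```

Usage: gate_passes_on_head "pytest tests/ -x" || echo "gate already fails on HEAD". If it fails there, narrow the command or set required: false, as in the example above.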

3. The agent produces incorrect results

3.1 Prompt too vague or ambiguous

Symptom: the agent completes (exit code 0, StopReason: LLM_DONE) but the result is not what was expected. It makes changes to incorrect files or generates irrelevant code.

Cause: the prompt is not specific enough. The agent infers the intent incorrectly.

Solution:

# Be explicit about what to do, where, and how
architect run "In src/auth/login.py, refactor the validate_token() function \
  to use pyjwt instead of jose. Keep the same public interface. \
  Update the tests in tests/test_auth.py"

# For complex tasks, write the spec to a file and pass it via command substitution
architect run "$(cat spec.md)"

3.2 Incorrect agent selected

Symptom: the agent plans instead of building, or builds without planning a complex task.

Cause: the default agent is build. The task may require plan (for large tasks) or review (for code review).

Solution:

# Use an explicit agent
architect run "..." --agent plan      # planning
architect run "..." --agent build     # building (default)
architect run "..." --agent review    # code review
architect run "..." --agent resume    # resume an interrupted task

3.3 Missing .architect.md in the project

Symptom: the agent does not follow the project conventions. It uses tabs instead of spaces, imports disallowed libraries, does not follow the architecture pattern.

Cause: there is no .architect.md file in the project root to tell the agent about the conventions. The agent uses its own defaults.

Solution: create .architect.md in the workspace root with the conventions:

# Project Conventions

- Python 3.12+, use strict typing
- Format: black (100 chars), ruff for linting
- Tests with pytest, minimum 80% coverage
- Do not use print(), always use structlog
- Absolute imports, never relative

3.4 Model too weak for the task

Symptom: the agent completes but the code has obvious bugs, does not compile, or ignores clear instructions from the prompt.

Cause: small models (gpt-4o-mini, claude-haiku) may not be sufficient for complex refactoring or architecture tasks.

Solution:

# Use a more capable model
architect run "complex task" --model gpt-4o
architect run "complex task" --model anthropic/claude-sonnet-4-6

3.5 Context too large causes hallucinations

Symptom: the agent mixes content from different files, invents functions that do not exist, or references code that was removed by context compression.

Cause: when the context approaches the limit, models can lose precision. Context compression may remove relevant information.

Solution:

# Be more aggressive with truncation to maintain precision
context:
  max_tool_result_tokens: 1000   # less content per tool result
  keep_recent_steps: 5           # keep more recent steps intact
  summarize_after_steps: 5       # compress sooner

# Split the task into smaller sub-tasks and run them in sequence with a pipeline
architect pipeline workflow.yaml

4. Tool errors

4.1 Path traversal blocked

Symptom: HUMAN output shows ERROR: Path validation failed or Path outside workspace. The tool result contains an error about path traversal.

Cause: the agent attempts to access a file outside the workspace_root. All filesystem operations validate that the path is within the workspace.

Solution:

# Verify that the workspace is correct
architect run "task" --workspace /path/to/project

# If you need to access files outside the workspace, adjust the workspace root
architect run "task" --workspace /parent/path

# .architect.yaml
workspace:
  root: "."   # relative to the execution directory

4.2 Tool not available for the agent

Symptom: JSON log shows tool_not_found or Tool 'X' not found in registry. The agent tries to use a tool that is not assigned to it.

Cause: each agent has an allowed_tools list. If the tool is not in the list, it cannot use it. The review agent only has read-only tools.

Solution:

# .architect.yaml -- assign tools to the agent
agents:
  build:
    allowed_tools:
      - read_file
      - write_file
      - edit_file
      - apply_patch
      - search_code
      - grep
      - find_files
      - run_command
      - dispatch_subagent

# -v logs which tools were registered at startup;
# the JSON log shows which tools the agent actually called
architect run "task" -v --log-file debug.jsonl
jq -r 'select(.event | contains("tool")) | .tool // empty' debug.jsonl | sort -u

4.3 edit_file: old_str is not unique

Symptom: tool result contains error old_str not found or old_str matches multiple locations. The edit fails.

Cause: edit_file uses exact string replacement. If old_str appears more than once or does not exist exactly as passed, it fails.

Solution: the agent itself resolves this, but if it occurs repeatedly:

# Verify the exact file content
cat -A file.py  # GNU cat: tabs appear as ^I, line ends as $ (on macOS use cat -et)

# The agent should use a longer and unique old_str
# If it persists, instruct the agent to use apply_patch instead of edit_file
architect run "Use apply_patch instead of edit_file for changes in file.py"
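
edit_file succeeds only when old_str occurs exactly once. For a single-line old_str, counting its literal occurrences shows which error you are hitting: 0 means not found, and more than 1 means not unique. The sketch below is minimal. grep -F matches literally, but the approach does not handle multi-line strings. The function name count_occurrences is hypothetical.

```shell
# Count literal (non-regex) occurrences of a single-line string in a file.
count_occurrences() {
  local file="$1" needle="$2"
  # -o prints each match on its own line, -F disables regex interpretation,
  # -- stops option parsing so a needle starting with "-" is safe
  grep -oF -- "$needle" "$file" | wc -l | tr -d ' '
}
```

Usage: count_occurrences src/auth/login.py "def validate_token(".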

4.4 apply_patch: context does not match

Symptom: tool result contains patch failed or context mismatch. The patch cannot be applied.

Cause: the context lines of the unified diff do not match the current file content. Usually the file changed after the agent last read it, so the patch was generated against stale content.

Solution: the agent normally retries by reading the file again. If it persists:

# Diagnose with the log
cat debug.jsonl | jq 'select(.tool == "apply_patch") | {args: .args, error: .error}'

The agent should use read_file before apply_patch to get the updated content.
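
To reproduce the mismatch outside the agent, save the failing diff to a file (for example, the args shown by the jq query above) and dry-run it. The sketch below uses git apply --check, which performs the same context matching without modifying any file. It assumes the diff uses git-style a/ and b/ path prefixes; for diffs without them, pass -p0. The function name patch_applies is hypothetical.

```shell
# Dry-run a unified diff against the working tree. Returns 0 if it would apply.
# Extra arguments (e.g. -p0 for diffs without a/ b/ prefixes) go to git apply.
patch_applies() {
  local patch_file="$1"; shift
  git apply --check "$@" "$patch_file"
}
```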

4.5 run_command blocked or timeout

Symptom: tool result contains Command blocked (command in the blocked list) or Command timed out after Ns.

Cause: the command matches a blocked pattern (built-in or custom) or exceeds the timeout.

Solution:

# .architect.yaml
commands:
  enabled: true
  default_timeout: 60       # increase timeout (default: 30)
  max_output_lines: 500     # increase output (default: 200)

  # Add safe commands
  safe_commands:
    - "npm test"
    - "cargo build"

  # Add additional blocked patterns
  blocked_patterns:
    - "docker rm"

  # Only allow safe/dev commands (restrictive mode)
  allowed_only: false   # true = safe+dev only

4.6 delete_file not allowed

Symptom: tool result contains Delete not allowed or File deletion disabled.

Cause: by default, allow_delete is disabled in the workspace configuration.

Solution:

# .architect.yaml
workspace:
  allow_delete: true   # allow file deletion

5. Hook and guardrail problems

5.1 Hook timeout

Symptom: log shows hook.timeout with the hook name. The hook is ignored (returns ALLOW by default).

Cause: the hook takes longer than its configured timeout (default: 10 seconds).

Solution:

hooks:
  post_tool_use:
    - name: "my-linter"
      command: "ruff check --fix $ARCHITECT_FILE_PATH"
      timeout: 30   # increase (default: 10, max: 300)

# Verify how long the hook takes manually
time ruff check --fix src/main.py
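
time only reports the duration. To check whether the hook would actually exceed its limit, run it under the coreutils timeout command with the same number of seconds. The sketch below assumes GNU coreutils timeout is installed; on macOS it is available as gtimeout via Homebrew coreutils. The function name would_time_out is hypothetical.

```shell
# Run a hook command with a time limit that mirrors the hook's timeout setting.
# Returns 0 if the command would time out (timeout exits with 124), 1 otherwise.
would_time_out() {
  local seconds="$1"; shift
  local rc=0
  timeout "$seconds" "$@" >/dev/null 2>&1 || rc=$?
  [ "$rc" -eq 124 ]
}
```

Usage: would_time_out 10 bash -c 'your-hook-command' && echo "exceeds the 10s default".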

5.2 Hook blocks unexpectedly

Symptom: HUMAN output shows Hook name: (warning). The agent receives a block message from a hook that should have allowed the action, and the tool call is not executed.

Cause: a pre-hook returns exit code 2 (BLOCK) when it should not. The hook’s stderr contains the block reason.

Solution:

# Run the hook manually to see what happens
export ARCHITECT_EVENT=pre_tool_use
export ARCHITECT_TOOL_NAME=edit_file
export ARCHITECT_WORKSPACE=$(pwd)
echo '{"path": "src/main.py"}' | bash -c 'your-hook-command'
echo "Exit code: $?"  # 0=ALLOW, 2=BLOCK

# Check in the JSON log
cat debug.jsonl | jq 'select(.event == "hook.error" or .event == "agent.hook.complete")'

Hook exit code protocol:

Exit 0 = ALLOW (permit the action)
Exit 2 = BLOCK (block, stderr = reason)
Other = Hook error (logged as WARNING, does not block)
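
A pre_tool_use hook that follows this protocol might look like the sketch below. It assumes the target path is available in ARCHITECT_FILE_PATH, the variable used in the 5.1 example. Verify that Architect also sets this variable for pre_tool_use events. The function name and the protected patterns are hypothetical examples.

```shell
# Example pre_tool_use hook logic following the exit-code protocol:
#   return 0 -> ALLOW, return 2 -> BLOCK (reason on stderr).
# ARCHITECT_FILE_PATH is assumed to hold the path the tool is about to touch.
protect_paths_hook() {
  local path="${ARCHITECT_FILE_PATH:-}"
  # No path (e.g. a tool that does not touch files): allow
  [ -z "$path" ] && return 0
  case "$path" in
    .env|*/.env|*.pem|*.key|migrations/*)
      echo "blocked by hook: $path is protected" >&2
      return 2
      ;;
  esac
  return 0
}
```

To use it as a hook, save it in a script whose last line is protect_paths_hook; exit $? and point the hook's command at that script. Then test it manually with the export/echo sequence above.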

5.3 Guardrail blocks file access

Symptom: tool result contains File protected by guardrail: X (pattern: Y).

Cause: the file matches a pattern in guardrails.protected_files.

Solution:

guardrails:
  enabled: true
  protected_files:
    - ".env"
    - "*.pem"
    - "*.key"
    - "secrets.*"
    # Verify that there are no overly broad patterns
    # For example "*.json" would block ALL JSON files

# See which file accesses were blocked
cat debug.jsonl | jq 'select(.event == "guardrail.file_blocked")'
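
Before adding a glob to protected_files, preview how many workspace files it would catch. The sketch below uses find -name and assumes that patterns are matched against the file name (basename). If Architect matches against the full relative path instead, the results will differ. The function name files_matching is hypothetical.

```shell
# List files under a directory whose basename matches a glob, skipping .git,
# to preview how broad a protected_files pattern is.
files_matching() {
  local pattern="$1" root="${2:-.}"
  find "$root" -path '*/.git' -prune -o -type f -name "$pattern" -print
}
```

Usage: files_matching '*.json' | wc -l. A large count is a sign that the pattern is too broad.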

5.4 Code rules block edits

Symptom: the agent writes code but receives a warning or block with the message from a code rule. The log shows guardrail.code_rule_violation.

Cause: the content written by the agent matches a regex pattern of a code rule with severity block.

Solution:

guardrails:
  code_rules:
    - pattern: "os\\.system\\("
      message: "Use subprocess instead of os.system"
      severity: "warn"     # "warn" attaches a warning, "block" prevents write

    - pattern: "TODO|FIXME|HACK"
      message: "Do not leave TODOs in the code"
      severity: "warn"     # change from "block" to "warn" if too strict
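
To see which existing lines a code rule would flag, run its pattern over the files with grep. This is only a rough preview. Remember that YAML double-quoted strings unescape \\ to \, and the sketch assumes that grep -E (POSIX ERE) behaves closely enough to Architect's regex engine for simple patterns such as TODO|FIXME|HACK. The function name preview_code_rule is hypothetical.

```shell
# Print file:line:text for every line matching a code-rule regex (POSIX ERE).
# Returns 0 if any line matches, 1 if none.
preview_code_rule() {
  local regex="$1"; shift
  grep -HnE -- "$regex" "$@"
}
```

Usage: preview_code_rule 'TODO|FIXME|HACK' src/*.py.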

5.5 Modified files or lines limit reached

Symptom: tool result contains Modified files limit reached or Changed lines limit reached.

Cause: the guardrail max_files_modified or max_lines_changed has been reached during the session.

Solution:

guardrails:
  max_files_modified: 20    # increase or set to null for no limit
  max_lines_changed: 2000   # increase or set to null
  max_commands_executed: 50  # increase or set to null
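
To estimate how close a session is to these limits, git can count changed files and lines. The sketch below assumes that the limits count added plus deleted lines in tracked files, which may not match Architect's exact accounting. New untracked files are not included. The function name changed_stats is hypothetical.

```shell
# Print "<files changed> <lines added+deleted>" for tracked files vs HEAD.
changed_stats() {
  # --numstat prints "added<TAB>deleted<TAB>path"; binary files show "-",
  # which awk treats as 0 in arithmetic
  git diff HEAD --numstat |
    awk '{ files++; lines += $1 + $2 } END { print files + 0, lines + 0 }'
}
```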

6. Advanced feature problems

6.1 Sessions: cannot resume

Symptom: architect resume <id> shows session not found or loads a corrupted session.

Cause: the session does not exist in .architect/sessions/, the JSON file is corrupted, or the session was automatically cleaned up.

Solution:

# List available sessions
architect sessions

# Verify that the directory exists
ls -la .architect/sessions/

# If the session was cleaned up, check the cleanup configuration

# .architect.yaml -- keep sessions longer
sessions:
  auto_save: true
  cleanup_after_days: 30   # default: 7 days

Note: if a session has more than 50 messages, it is truncated to the 30 most recent when saved. This may affect resume if important context was lost.
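
If resume fails because a session file is corrupted, check which session files no longer parse. The sketch below assumes that each session is stored as a *.json file under .architect/sessions/ and that python3 is available for its built-in json.tool validator. The function name find_corrupt_sessions is hypothetical.

```shell
# Print session files that are not valid JSON.
find_corrupt_sessions() {
  local dir="${1:-.architect/sessions}" f
  for f in "$dir"/*.json; do
    [ -e "$f" ] || continue   # the glob matched nothing
    if ! python3 -m json.tool "$f" >/dev/null 2>&1; then
      echo "$f"
    fi
  done
}
```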

6.2 Ralph Loop: never converges

Symptom: the Ralph Loop executes all iterations without the checks passing. The .architect/ralph-progress.md file shows FAIL on all iterations.

Cause: the checks are too strict, the task is too complex for a single iteration, or the agent does not receive enough context from previous errors.

Solution:

# Review the progress
cat .architect/ralph-progress.md

# Verify that the checks work with the current code
pytest tests/ -x          # run the check manually
ruff check src/           # run the check manually

# Use more conservative options
architect loop "task" \
  --check "pytest tests/test_specific.py -x" \
  --max-iterations 10 \
  --max-cost 5.00 \
  --model gpt-4o

Common causes of non-convergence:

The check fails for reasons unrelated to the task (pre-existing broken tests).
The agent does not include the COMPLETE tag in its response (required to converge).
The task requires changes across multiple files that the agent cannot resolve in a single iteration.
The check timeout (120s) is insufficient for large test suites.

6.3 Parallel: worktree conflicts

Symptom: Error creating worktree when starting a parallel execution, or orphaned worktrees left behind after an interrupted execution.

Cause: worktrees from previous executions were not cleaned up. Git does not allow creating a worktree if the branch already exists or the directory is occupied.

Solution:

# Clean up worktrees and branches from previous executions
architect parallel-cleanup

# Clean up manually if the command fails
git worktree list                            # view active worktrees
git worktree remove .architect-parallel-1 --force
git worktree remove .architect-parallel-2 --force
git worktree prune                           # clean up orphans
git branch -D architect/parallel-1           # delete branches
git branch -D architect/parallel-2
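
With many workers, a loop saves typing each path. The sketch below assumes the naming shown above: directories named .architect-parallel-N and branches named architect/parallel-N. Run git worktree list first, because the function force-removes every matching worktree and discards any uncommitted changes in it. The function name cleanup_parallel_worktrees is hypothetical.

```shell
# Remove parallel-run worktrees and branches that match the naming shown above.
# WARNING: --force discards uncommitted changes inside those worktrees.
cleanup_parallel_worktrees() {
  local path branch
  # "git worktree list --porcelain" prints one "worktree <abs-path>" line per worktree
  git worktree list --porcelain | sed -n 's/^worktree //p' |
    grep -E '/\.architect-parallel-[0-9]+$' |
    while IFS= read -r path; do
      git worktree remove --force "$path" || echo "could not remove $path" >&2
    done
  # Drop administrative entries for worktrees whose directories are already gone
  git worktree prune
  # Delete the matching branches (no longer checked out anywhere)
  git for-each-ref --format='%(refname:short)' 'refs/heads/architect/parallel-*' |
    while IFS= read -r branch; do
      git branch -D "$branch" >/dev/null
    done
}
```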

6.4 Pipeline: variables are not resolved

Symptom: the pipeline prompt literally contains {{variable}} instead of the expected value. The agent receives the unresolved template.

Cause: the variable is not defined in the pipeline YAML or in the --var CLI flags. Undefined variables are left as-is (not resolved).

Solution:

# pipeline.yaml
name: my-pipeline
variables:
  target_dir: "src/"           # define default value
  test_command: "pytest"

steps:
  - name: build
    prompt: "Build in {{target_dir}}"  # resolves to "src/"

# Pass variables from CLI (override YAML values)
architect pipeline pipeline.yaml --var target_dir=lib/ --var test_command="npm test"

# Verify resolution with dry-run
architect pipeline pipeline.yaml --dry-run
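
To compare the placeholders a pipeline uses against the variables you have defined, list every {{name}} in the file. Any name that is missing from variables: and from your --var flags will stay unresolved. The sketch below assumes placeholders are plain {{name}} tokens, optionally with spaces inside the braces. The function name pipeline_placeholders is hypothetical.

```shell
# Print the unique variable names referenced as {{name}} in a file.
pipeline_placeholders() {
  grep -oE '\{\{[[:blank:]]*[A-Za-z_][A-Za-z0-9_]*[[:blank:]]*\}\}' "$1" |
    tr -d '{}[:blank:]' | sort -u
}
```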

6.5 Checkpoints: not being created

Symptom: architect history does not show checkpoints. The log has no checkpoint.created events.

Cause: checkpoints are not enabled in config, there are no changes to commit (clean git status), or git is not initialized.

Solution:

# .architect.yaml
checkpoints:
  enabled: true
  every_n_steps: 5   # create checkpoint every 5 steps

# Verify that there is a git repository
git status

# Verify that there are changes to commit
git status --porcelain

# Search for existing checkpoints manually
git log --oneline --grep="architect:checkpoint"

Note: checkpoints are git commits whose message starts with architect:checkpoint. Changes are staged with git add -A. If that stages nothing new, no commit is created and the checkpoint is silently skipped.
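
The skip condition can be reproduced with plain git. The sketch below models the behavior the note describes; it is not Architect's implementation. The function name and the text after the architect:checkpoint prefix are hypothetical.

```shell
# Stage everything and commit a checkpoint only if something was staged.
checkpoint_if_changed() {
  local step="$1"
  git add -A
  # --quiet implies --exit-code: exit 0 when the index matches HEAD (nothing staged)
  if git diff --cached --quiet; then
    echo "step $step: nothing staged, checkpoint skipped" >&2
    return 0
  fi
  git commit -q -m "architect:checkpoint step $step"
}
```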

6.6 Auto-review: does not detect issues

Symptom: the auto-review always reports “No issues found” even though there are obvious problems.

Cause: the diff is too large (truncated to 8000 characters), the reviewer does not have enough context, or the reviewer model is too weak.

Solution:

# .architect.yaml
auto_review:
  enabled: true
  review_model: "gpt-4o"       # use a capable model for review
  max_fix_passes: 2             # try to fix up to 2 times

7. CI/CD problems

7.1 No TTY for confirmation mode

Symptom: error NoTTYError or Cannot confirm: no TTY available. Exit code 1.

Cause: CI/CD has no interactive terminal, and the confirm-all and confirm-sensitive confirmation modes require user input.

Solution:

# Use yolo mode (no confirmation) in CI
architect run "task" --confirm-mode yolo

# Or the short alias
architect run "task" -m yolo

# .architect.yaml for CI
agents:
  build:
    confirm_mode: "yolo"

7.2 Exit codes in CI pipelines

Symptom: the CI pipeline fails or passes when it should not. The Architect exit codes are not interpreted correctly.

Cause: Architect returns several distinct exit codes (see section 9), but CI only distinguishes zero from non-zero unless the script handles each code explicitly.

Solution: handle exit codes explicitly:

# In GitHub Actions / shell script
# (Actions runs bash with -e, so capture the code with || instead of reading $? afterwards)
EXIT_CODE=0
architect run "task" --confirm-mode yolo --json --budget 5.00 || EXIT_CODE=$?

case $EXIT_CODE in
  0) echo "Full success" ;;
  1) echo "Failed" ; exit 1 ;;
  2) echo "Partial: review output" ;;
  3) echo "Configuration error" ; exit 1 ;;
  4) echo "Authentication error" ; exit 1 ;;
  5) echo "Timeout" ; exit 1 ;;
  130) echo "Interrupted" ; exit 1 ;;
  *) echo "Unexpected exit code: $EXIT_CODE" ; exit 1 ;;
esac

# Use --exit-code-on-partial to treat partial as error
architect run "task" --confirm-mode yolo --exit-code-on-partial
# Now exit code 2 (partial) becomes exit code 1 (failed)

7.3 JSON output: incorrect parsing

Symptom: CI tries to parse the JSON output but fails. The JSON is mixed with logs or is incomplete.

Cause: the JSON result goes to stdout while logs go to stderr. If CI merges both streams (for example with 2>&1), they get mixed. Alternatively, the agent was interrupted before it wrote the complete JSON.

Solution:

# Ensure clean JSON output
architect run "task" --json --quiet 2>/dev/null > result.json

# --json: JSON output to stdout
# --quiet: suppress HUMAN logs on stderr
# 2>/dev/null: suppress all stderr

# Parse with jq
cat result.json | jq '.status'
cat result.json | jq '.costs.total_cost_usd'

7.4 Budget exhausted in CI

Symptom: the agent terminates with StopReason: BUDGET_EXCEEDED, exit code 2 (partial). The task remains incomplete.

Cause: the configured budget is insufficient for the task complexity. Larger models consume more tokens.

Solution:

# Increase budget
architect run "task" --budget 10.00 --confirm-mode yolo

# Use prompt caching to reduce costs

# .architect.yaml
costs:
  budget_usd: 10.00
  warn_at_usd: 7.00

llm:
  prompt_caching: true   # cached prompt tokens are billed at a 50-90% discount, depending on the provider

# Monitor costs in CI
architect run "task" --json --confirm-mode yolo > result.json
COST=$(jq -r '.costs.total_cost_usd // 0' result.json)
echo "Execution cost: \$${COST}"   # \$ prints a literal dollar sign; $$ would expand to the shell PID
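
To turn the extracted cost into a CI check, compare it against a threshold. Shell arithmetic is integer-only, so the sketch below uses awk for the decimal comparison. The threshold is your own choice, not an Architect setting, and the function name over_budget is hypothetical.

```shell
# Return 0 if COST (USD, decimal) is strictly greater than LIMIT, 1 otherwise.
over_budget() {
  local cost="$1" limit="$2"
  # awk exits 0 when the comparison is true, 1 when it is false
  awk -v c="$cost" -v l="$limit" 'BEGIN { exit !(c + 0 > l + 0) }'
}
```

Usage: over_budget "$COST" 3.00 && echo "::warning::Architect run cost \$${COST}" (the ::warning:: prefix is GitHub Actions' workflow command for annotations).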

7.5 MCP server not accessible

Symptom: log shows MCP connection errors. The MCP tools are not registered. The agent works but without the remote tools.

Cause: the MCP server is not accessible from the CI environment, the token has expired, or the URL is incorrect.

Solution:

# .architect.yaml
mcp:
  servers:
    - name: "docs"
      url: "https://mcp-server.example.com"
      token_env: "MCP_DOCS_TOKEN"   # env var with the token

# Verify connectivity
curl -v https://mcp-server.example.com

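# Verify that the token is configured (without echoing it into CI logs)
[ -n "$MCP_DOCS_TOKEN" ] && echo "MCP_DOCS_TOKEN is set" || echo "MCP_DOCS_TOKEN is NOT set"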

# In CI, configure as a secret
# GitHub Actions:
# env:
#   MCP_DOCS_TOKEN: ${{ secrets.MCP_DOCS_TOKEN }}

8. Diagnostics with logging

8.1 Capture the complete log

# Capture EVERYTHING (JSON debug + verbose console)
architect run "task" --log-file session.jsonl -vvv

The session.jsonl file contains each event as a JSON line. This includes LLM calls, tool calls, results, hooks, safety nets, and more.

8.2 Useful queries with jq

# View all executed tool calls
cat session.jsonl | jq 'select(.event == "agent.tool_call.execute") | {tool: .tool, args: .args}'

# View only tool errors
cat session.jsonl | jq 'select(.event == "agent.tool_call.complete" and .success == false) | {tool: .tool, error: .error}'

# View LLM calls and message count
cat session.jsonl | jq 'select(.event == "agent.llm.call") | {step: .step, messages: .messages_count}'

# View all safety net triggers
cat session.jsonl | jq 'select(.event | startswith("safety."))'

# View costs per step
cat session.jsonl | jq 'select(.event == "cost_tracker.record") | {step: .step, model: .model, cost: .cost_usd, tokens_in: .input_tokens, tokens_out: .output_tokens}'

# View hook events
cat session.jsonl | jq 'select(.event | startswith("hook."))'

# View guardrail events
cat session.jsonl | jq 'select(.event | startswith("guardrail."))'

# View context compression
cat session.jsonl | jq 'select(.event | startswith("context."))'

# Extract the final stop_reason
cat session.jsonl | jq 'select(.event == "agent.loop.complete") | {status: .status, stop_reason: .stop_reason, steps: .total_steps}'

# View LLM errors
cat session.jsonl | jq 'select(.event == "agent.llm_error") | .error'

# Quick summary: count of each event type
cat session.jsonl | jq -r '.event' | sort | uniq -c | sort -rn

8.3 Reading the HUMAN output (icons)

The HUMAN output uses icons to indicate the event type:

Icon	Meaning
🔄	Agent step N: LLM call / closing
✓	Successful LLM response or tool OK
🔧	Local tool execution
🌐	MCP tool execution (remote)
🔍	Hook result
✅	Agent completed successfully
⚡	Agent stopped (partial or failure)
⚠️	Safety net activated or warning
❌	LLM error
📦	Context compression/management

8.4 Verbose levels (-v/-vv/-vvv)

Flag	Console level	What it shows
(none)	WARNING	Only HUMAN output (agent steps) + severe errors
`-v`	INFO	HUMAN + system operations: config loaded, tools registered, indexer
`-vv`	DEBUG	HUMAN + technical detail: complete args, LLM responses, timing
`-vvv`	DEBUG	HUMAN + EVERYTHING: HTTP requests, complete payloads

HUMAN logs are shown always (except --quiet/--json), regardless of -v.

# For development/debug, use -vv
architect run "task" -vv --log-file debug.jsonl

# For CI, use --quiet or --json
architect run "task" --json --quiet --confirm-mode yolo

9. Quick exit code table

Exit Code	Name	Description	Typical StopReason
0	SUCCESS	Task completed successfully	`LLM_DONE`
1	FAILED	Task failed (unrecoverable error)	`LLM_ERROR`
2	PARTIAL	Task partially completed	`MAX_STEPS`, `BUDGET_EXCEEDED`, `CONTEXT_FULL`, `TIMEOUT`
3	CONFIG_ERROR	Error in the YAML configuration or flags	—
4	AUTH_ERROR	Authentication failure with the LLM	—
5	TIMEOUT	Global execution timeout	`TIMEOUT`
130	INTERRUPTED	Ctrl+C or SIGTERM	`USER_INTERRUPT`

StopReason table

StopReason	Type	Description	Recommended action
`LLM_DONE`	Natural	The LLM decided it was done (did not request more tools)	Verify that the result is correct
`MAX_STEPS`	Safety net	The step limit was reached	Increase `max_steps` or simplify the task
`BUDGET_EXCEEDED`	Safety net	The USD budget was exceeded	Increase `budget_usd` or use a cheaper model
`CONTEXT_FULL`	Safety net	The context window was filled	Adjust `context` config or split the task
`TIMEOUT`	Safety net	The time limit was exceeded	Increase `--timeout` or simplify the task
`USER_INTERRUPT`	Manual	The user pressed Ctrl+C / sent SIGTERM	The agent attempts a graceful shutdown; resume the task later if needed
`LLM_ERROR`	Error	Unrecoverable LLM error (after retries)	Verify API key, model, connectivity

10. Quick diagnostic checklist

For any problem, follow this order:

Check exit code: echo $? after execution.
Read HUMAN output: look for the last warning/error icon.
Review with verbose: repeat with -vv.
Capture JSON log: repeat with --log-file debug.jsonl.
Filter with jq: use the queries from section 8.2.
Verify config: architect run --dry-run "test" -v to see which config is loaded.
Test hooks manually: run the hook commands outside of Architect.
Review .architect.yaml: validate with python -c "from architect.config.loader import load_config; load_config('.')".