System Architecture

Component Map

+-------------------------------------------------------------------------+
|  CLI (cli.py)                                                           |
|                                                                         |
|  architect run PROMPT                                                   |
|     |                                                                   |
|     +- 1. GracefulShutdown()          installs SIGINT + SIGTERM         |
|     +- 2. load_config()               YAML -> env -> CLI flags          |
|     +- 3. configure_logging()         logging/setup.py                  |
|     |       +- logging/levels.py      custom HUMAN level (25)           |
|     |       +- logging/human.py       HumanLogHandler + HumanLog        |
|     +- 4. ToolRegistry                                                  |
|     |       +- register_all_tools()   filesystem + editing + search     |
|     |       +- MCPDiscovery()         (optional, --disable-mcp)         |
|     +- 5. RepoIndexer                 workspace tree (F10)              |
|     |       +- IndexCache             disk cache (TTL 5 min)            |
|     +- 6. LLMAdapter(config.llm)      LiteLLM + selective retries       |
|     +- 7. ContextManager(config.ctx)  3-level pruning (F11)             |
|     +- 8. ContextBuilder(repo_index, context_manager)                   |
|     +- 8b. PostEditHooks(config)      core/hooks.py - auto-verification |
|     +- 8c. SessionManager(workspace)  features/sessions.py (v4-B1)      |
|     +- 8d. DryRunTracker()            features/dryrun.py (v4-B4)        |
|     |                                                                   |
|     +- 9a. AgentLoop (default mode: build, or -a flag)                  |
|     |       +- ExecutionEngine(registry, config, confirm_mode,          |
|     |       |                  hooks: PostEditHooks)                     |
|     |       +- while True + safety nets (_check_safety_nets)            |
|     |       +- HumanLog(log) - traceability to stderr                   |
|     |       +- step_timeout (per step) + timeout (total execution)      |
|     |       +- cost_tracker (CostTracker, optional)                     |
|     +- 9b. MixedModeRunner (mixed mode, no longer default)              |
|             +- shared engine (plan + build)                              |
|             +- shared cost_tracker                                       |
|             +- shared ContextManager between phases                      |
|                                                                         |
|    10. SelfEvaluator (optional, --self-eval basic|full, F12)            |
|         +- evaluate_basic() | evaluate_full(run_fn)                     |
|                                                                         |
|    11. ReportGenerator (optional, --report json|markdown|github, B2)    |
|         +- to_json() | to_markdown() | to_github_pr_comment()          |
|                                                                         |
|  == Advanced orchestration modes ==                                     |
|                                                                         |
|    12. RalphLoop (architect loop)                                       |
|         +- agent_factory() -> fresh AgentLoop per iteration             |
|         +- _run_checks() -> subprocess shell commands                   |
|         +- _build_iteration_prompt() -> spec + diff + errors + progress |
|         +- worktree support -> .architect-ralph-worktree                |
|                                                                         |
|    13. PipelineRunner (architect pipeline)                               |
|         +- from_yaml() -> load pipeline from YAML                       |
|         +- agent_factory() -> fresh AgentLoop per step                  |
|         +- _resolve_vars() -> {{variable}} substitution                 |
|         +- _eval_condition() -> skip steps conditionally                |
|         +- _create_checkpoint() -> git commit per step                  |
|                                                                         |
|    14. ParallelRunner (architect parallel)                               |
|         +- ProcessPoolExecutor(max_workers)                             |
|         +- _run_worker_process() -> subprocess architect run in worktree|
|         +- cleanup() -> remove worktrees and branches                   |
|                                                                         |
|    15. AutoReviewer                                                      |
|         +- review_changes(task, diff) -> ReviewResult                   |
|         +- build_fix_prompt() -> correction prompt                      |
|         +- get_recent_diff() -> git diff HEAD                           |
|                                                                         |
|    16. CheckpointManager                                                 |
|         +- create(step) -> git commit with prefix                       |
|         +- list_checkpoints() -> parse git log                          |
|         +- rollback(step|commit) -> git reset --hard                    |
|                                                                         |
|  == Advanced extensions ==                                               |
|                                                                         |
|    17. CompetitiveEval (architect eval)                                  |
|         +- ParallelRunner -> same task with multiple models             |
|         +- _run_checks_in_worktree() -> per-worktree validation         |
|         +- _rank_results() -> composite score (100 pts)                 |
|                                                                         |
|    18. DispatchSubagentTool (tool dispatch_subagent)                    |
|         +- agent_factory() -> fresh AgentLoop for sub-task              |
|         +- types: explore (RO), test (RO+cmd), review (RO)             |
|         +- SUBAGENT_MAX_STEPS=15, truncated summary 1000 chars          |
|                                                                         |
|    19. CodeHealthAnalyzer (--health)                                     |
|         +- take_before_snapshot() -> pre-execution metrics              |
|         +- take_after_snapshot() -> post-execution metrics              |
|         +- compute_delta() -> HealthDelta with markdown report          |
|                                                                         |
|    20. ArchitectTracer (telemetry)                                       |
|         +- start_session() -> full session span                         |
|         +- trace_llm_call() -> span per LLM call                       |
|         +- trace_tool() -> span per tool execution                     |
|         +- NoopTracer if OTel not installed                             |
|                                                                         |
|    21. PresetManager (architect init)                                    |
|         +- apply(preset) -> generates .architect.md + config.yaml       |
|         +- 5 presets: python, node-react, ci, paranoid, yolo            |
+-------------------------------------------------------------------------+
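
The custom HUMAN level (25) from step 3 sits between the standard INFO (20) and WARNING (30) levels. The following is a minimal sketch of how such a level and a HUMAN-only handler could be wired with the standard library. It is not the project's actual `logging/levels.py` or `logging/human.py`; `HumanOnlyFilter`, `make_human_handler`, and the logger name are hypothetical.

```python
import io
import logging

HUMAN = 25  # between INFO (20) and WARNING (30), as stated in the component map
logging.addLevelName(HUMAN, "HUMAN")


class HumanOnlyFilter(logging.Filter):
    """Hypothetical filter: lets through records at exactly the HUMAN level."""

    def filter(self, record):
        return record.levelno == HUMAN


def make_human_handler(stream):
    """Build a handler that writes only HUMAN events to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setLevel(HUMAN)
    handler.addFilter(HumanOnlyFilter())
    handler.setFormatter(logging.Formatter("%(message)s"))
    return handler


# Demo: a StringIO stands in for stderr so the result can be inspected.
buffer = io.StringIO()
logger = logging.getLogger("architect.sketch")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(make_human_handler(buffer))

logger.info("technical detail")        # dropped: below HUMAN
logger.log(HUMAN, "Step 1 -> LLM")     # kept
logger.warning("technical warning")    # dropped: not a HUMAN event
human_output = buffer.getvalue()
```

In this sketch the technical console and the JSON file would be separate handlers on the same logger, each with its own level.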

Module Diagram and Dependencies

cli.py
 +-- config/loader.py ---- config/schema.py
 +-- logging/levels.py                          custom HUMAN level (25)
 +-- logging/human.py ---- logging/levels.py    HumanLogHandler + HumanLog
 +-- logging/setup.py ---- logging/levels.py
 |                          logging/human.py (HumanLogHandler)
 +-- tools/setup.py ------ tools/registry.py
 |                          tools/filesystem.py -- tools/base.py
 |                          tools/patch.py         tools/schemas.py
 |                          tools/search.py
 |                          execution/validators.py
 +-- mcp/discovery.py ---- mcp/client.py
 |                          mcp/adapter.py -------- tools/base.py
 +-- indexer/tree.py
 +-- indexer/cache.py
 +-- llm/adapter.py
 +-- core/hooks.py -------- config/schema.py (HookConfig)
 +-- core/context.py ----- indexer/tree.py (RepoIndex)
 |                          llm/adapter.py (LLMAdapter - for maybe_compress)
 +-- core/loop.py -------- core/state.py (AgentState, StopReason)
 |                          core/shutdown.py
 |                          core/timeout.py
 |                          core/context.py (ContextManager)
 |                          core/hooks.py (PostEditHooks - via ExecutionEngine)
 |                          costs/tracker.py (CostTracker, BudgetExceededError)
 |                          logging/human.py (HumanLog)
 +-- core/mixed_mode.py -- core/loop.py
 |                          core/context.py (ContextManager)
 |                          costs/tracker.py (CostTracker)
 +-- core/evaluator.py --- llm/adapter.py (LLMAdapter)
 |                          core/state.py (AgentState) - TYPE_CHECKING only
 +-- features/sessions.py -- core/state.py (StopReason)
 |                            config/schema.py (SessionsConfig)
 +-- features/report.py ---- core/state.py (AgentState)
 |                            costs/tracker.py (CostTracker)
 +-- features/dryrun.py ---- (standalone, minimal deps)
 +-- features/ralph.py ----- core/state.py (AgentState)       # v4-C1
 |                            costs/tracker.py (CostTracker)
 +-- features/pipelines.py -- core/state.py (AgentState)      # v4-C3
 |                             costs/tracker.py (CostTracker)
 +-- features/parallel.py -- (subprocess, standalone)
 +-- features/checkpoints.py - (subprocess git, standalone)
 +-- features/competitive.py -- features/parallel.py (ParallelRunner)
 +-- agents/reviewer.py ---- core/state.py (AgentState)
 +-- tools/dispatch.py ------ tools/base.py (BaseTool)
 |                             core/loop.py (AgentLoop - via factory)
 +-- core/health.py ---------- (AST stdlib + optional radon)
 +-- telemetry/otel.py ------- (optional opentelemetry)
 +-- config/presets.py -------- (standalone, templates)
 +-- agents/registry.py ---- agents/prompts.py
                            config/schema.py (AgentConfig)

Full Execution Flow

Single-agent mode: the default (`architect run PROMPT`)

GracefulShutdown()
     |
load_config(yaml, env, cli_flags)
     |
configure_logging()              logging/setup.py
  +- HumanLogHandler (stderr)    HUMAN events only (25)
  +- Technical console (stderr)  controlled by -v / -vv
  +- JSON file (optional)        captures everything (DEBUG+)
     |
ToolRegistry
  +- register_all_tools()    read_file, write_file, delete_file, list_files,
  |                          edit_file, apply_patch, search_code, grep, find_files
  +- MCPDiscovery()          mcp_{server}_{tool} (if MCP servers configured)
     |
RepoIndexer.build_index()    traverses workspace -> RepoIndex
  (or IndexCache.get())       uses cache if < 5 min
     |
LLMAdapter(config.llm)
     |
ContextManager(config.context)
     |
ContextBuilder(repo_index=index, context_manager=ctx_mgr)
     |
PostEditHooks(config.hooks.post_edit, workspace_root)
     |
get_agent("build", yaml_agents, cli_overrides)
  -> AgentConfig{system_prompt, allowed_tools, confirm_mode, max_steps=50}
     |
ExecutionEngine(registry, config, confirm_mode, hooks=post_edit_hooks)
     |
AgentLoop(llm, engine, agent_config, ctx, shutdown, step_timeout,
          context_manager, cost_tracker, timeout)
     |
AgentLoop.run(prompt, stream=True, on_stream_chunk=stderr_write)
     |
     -- while True: --------------------------------------------------------
     |
     |  [1] _check_safety_nets(state, step)
     |        +- USER_INTERRUPT?  -> return immediately (no LLM)
     |        +- MAX_STEPS?       -> _graceful_close() -> asks LLM for summary
     |        +- TIMEOUT?         -> _graceful_close() -> asks LLM for summary
     |        +- BUDGET_EXCEEDED? -> _graceful_close() -> asks LLM for summary
     |        +- CONTEXT_FULL?    -> _graceful_close() -> asks LLM for summary
     |
     |  [2] ContextManager.manage(messages, llm)
     |        +- compresses if > 75% of context window used
     |
     |  [3] hlog.llm_call(step, messages_count)
     |      with StepTimeout(step_timeout):
     |        llm.completion_stream(messages, tools_schema)
     |          -> StreamChunk("def foo...") --> stderr via callback
     |          -> LLMResponse(tool_calls=[ToolCall("edit_file", {...})])
     |
     |  [4] cost_tracker.record(step, model, usage, source="agent")
     |        +- if BudgetExceededError -> _graceful_close(BUDGET_EXCEEDED)
     |
     |  [5] If no tool_calls:
     |        hlog.agent_done(step)
     |        state.status = "success"
     |        state.stop_reason = StopReason.LLM_DONE
     |        break
     |
     |  [6] _execute_tool_calls_batch([tc1, tc2, ...])
     |        if parallel -> ThreadPoolExecutor(max_workers=4)
     |          -> hlog.tool_call("edit_file", {path:...})
     |          -> engine.execute_tool_call("edit_file", {path:..., old_str:..., new_str:...})
     |              1. registry.get("edit_file")
     |              2. tool.validate_args(args)         -> EditFileArgs
     |              3. policy.should_confirm()           -> True: prompt y/n/a
     |              4. if dry_run: return [DRY-RUN]
     |              5. EditFileTool.execute()
     |                   +- validate_path() - workspace confinement
     |                   +- assert old_str is unique
     |                   +- file.write_text(new_content)
     |                   +- return ToolResult(success=True, output="[diff...]")
     |          -> engine.run_post_edit_hooks(tool_name, args)
     |              +- PostEditHooks.run_for_tool() -> hook output appended to result
     |          -> hlog.tool_result("edit_file", success=True)
     |
     |  [7] ctx.append_tool_results(messages, tool_calls, results)
     |        +- ContextManager.truncate_tool_result(content)  <- Level 1
     |      state.steps.append(StepResult(...))
     |
     -- (back to [1]) ------------------------------------------------------
     |
hlog.loop_complete(status, stop_reason, total_steps, total_tool_calls)
state.status = "success" | "partial"  (depending on StopReason)

[Optional] SelfEvaluator (if --self-eval != "off")
     |
     +-- basic: evaluate_basic(prompt, state) -> EvalResult
     |     -> if not passed: state.status = "partial"
     |
     +-- full: evaluate_full(prompt, state, run_fn)
           -> loop up to max_retries: evaluate_basic() + run_fn(correction_prompt)
           -> returns the best AgentState

if --json: stdout <- json.dumps(state.to_output_dict())
if normal: stdout <- state.final_output

[v4-B1] SessionManager.save(session_state)   <- save final session
[v4-B2] if --report: ReportGenerator(report).to_{format}()
        if --report-file: write to file; otherwise, stdout

sys.exit(EXIT_CODE)  <- StopReason -> exit code mapping (0/1/2/3/4/5/130)
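
The control structure of the loop above (safety nets checked first, then the LLM call, then termination when no tool calls come back) can be sketched as follows. This is a simplified illustration, not the real `AgentLoop`: only the MAX_STEPS watchdog is modeled, `llm_step` is a hypothetical stand-in for the LLM call, and mapping MAX_STEPS to the "partial" status is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class StopReason(Enum):
    LLM_DONE = "llm_done"
    MAX_STEPS = "max_steps"


@dataclass
class State:
    status: str = "running"
    stop_reason: Optional[StopReason] = None
    steps: list = field(default_factory=list)


def run_loop(llm_step, max_steps):
    """llm_step(step) returns a list of tool calls; an empty list means done."""
    state = State()
    step = 0
    while True:
        # [1] Safety net: checked before every LLM call.
        if step >= max_steps:
            state.status = "partial"  # assumption: watchdog stops are partial
            state.stop_reason = StopReason.MAX_STEPS
            break
        # [3] LLM call (faked here).
        tool_calls = llm_step(step)
        # [5] No tool calls: the LLM decided it is finished.
        if not tool_calls:
            state.status = "success"
            state.stop_reason = StopReason.LLM_DONE
            break
        # [6]-[7] Tool execution is omitted; only the step is recorded.
        state.steps.append(tool_calls)
        step += 1
    return state


finished = run_loop(lambda step: ["edit_file"] if step < 2 else [], max_steps=50)
capped = run_loop(lambda step: ["edit_file"], max_steps=3)
```

The first run ends because the fake LLM stops requesting tools; the second ends only because the watchdog fires.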

Mixed mode (legacy, no longer the default)

[configuration same as single-agent]

MixedModeRunner(llm, engine, plan_config, build_config, ctx,
                shutdown, step_timeout, context_manager, cost_tracker)
     |
     Note: a single shared engine (plan and build). The cost_tracker and
     ContextManager are also shared between phases.
     |
MixedModeRunner.run(prompt, stream=True, on_stream_chunk=...)
     |
     +-- PHASE 1: plan (no streaming)
     |     plan_loop = AgentLoop(llm, engine, plan_config, ctx,
     |                           context_manager=ctx_mgr,
     |                           cost_tracker=cost_tracker)
     |     plan_state = plan_loop.run(prompt, stream=False)
     |     if plan_state.status == "failed": return plan_state
     |     if shutdown.should_stop: return plan_state
     |
     +-- PHASE 2: build (with streaming)
     |     enriched_prompt = f"""
     |       The user asked: {prompt}
     |       The planning agent generated this plan:
     |       ---
     |       {plan_state.final_output}
     |       ---
     |       Your job is to execute this plan step by step...
     |     """
     |     build_loop = AgentLoop(llm, engine, build_config, ctx,
     |                            context_manager=ctx_mgr,
     |                            cost_tracker=cost_tracker)
     |     build_state = build_loop.run(enriched_prompt, stream=True, ...)
     |
     +-- return build_state

[SelfEvaluator is applied to build_state if --self-eval != "off"]
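
The hand-off between the two phases can be sketched as follows. This is a reduced illustration under stated assumptions, not the real `MixedModeRunner`: `PhaseState`, `build_enriched_prompt`, and `run_mixed` are hypothetical names, the phases are plain callables, and the shutdown check is left out.

```python
from dataclasses import dataclass


@dataclass
class PhaseState:
    """Hypothetical stand-in for the state returned by each phase."""

    status: str
    final_output: str


def build_enriched_prompt(prompt, plan):
    """Wrap the original prompt and the plan, following the template above."""
    return (
        f"The user asked: {prompt}\n"
        "The planning agent generated this plan:\n"
        "---\n"
        f"{plan}\n"
        "---\n"
        "Your job is to execute this plan step by step..."
    )


def run_mixed(prompt, run_plan, run_build):
    """Run plan, then build; a failed plan is returned without building."""
    plan_state = run_plan(prompt)
    if plan_state.status == "failed":
        return plan_state
    return run_build(build_enriched_prompt(prompt, plan_state.final_output))


seen_prompts = []


def fake_build(enriched):
    # Records what the build phase receives so it can be inspected.
    seen_prompts.append(enriched)
    return PhaseState("success", "built")


ok = run_mixed("add tests", lambda p: PhaseState("success", "1. write test"), fake_build)
failed = run_mixed("add tests", lambda p: PhaseState("failed", ""), fake_build)
```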

stdout / stderr Separation

This separation is critical for Unix pipe compatibility.

+-----------------------------+------------------------------------------+
| Destination                 | Content                                  |
+-----------------------------+------------------------------------------+
| stderr                      | Real-time LLM streaming chunks           |
| stderr                      | Structured logs (structlog)              |
| stderr                      | Execution header (model, workspace)      |
| stderr                      | MCP and indexer statistics                |
| stderr                      | Confirmation prompts                     |
| stderr                      | Shutdown notices (Ctrl+C)                |
| stderr                      | SelfEvaluator output                     |
| stderr                      | Human log: agent traceability            |
|                             | (Step 1 -> LLM, tool calls, results)    |
+-----------------------------+------------------------------------------+
| stdout                      | Agent's final response                   |
| stdout                      | JSON output (--json)                     |
+-----------------------------+------------------------------------------+

# Example of correct pipe usage:
architect run "analyze the project" -a resume --quiet --json | jq .status
architect run "generate README" --mode yolo > README.md
architect run "..." -v 2>logs.txt    # logs to file, result to stdout
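
Inside a Python CLI, this separation comes down to choosing the stream for every write. A minimal sketch follows; `emit` and the JSON keys are hypothetical, and the real payload comes from `state.to_output_dict()`, whose fields are not shown in this document.

```python
import io
import json
import sys
from contextlib import redirect_stderr, redirect_stdout


def emit(final_output, status, as_json):
    """Send diagnostics to stderr and only the result to stdout."""
    print(f"[log] finished with status={status}", file=sys.stderr)
    if as_json:
        # Hypothetical payload shape, for illustration only.
        print(json.dumps({"status": status, "output": final_output}))
    else:
        print(final_output)


# Demo: capture both streams to confirm that nothing leaks across.
out, err = io.StringIO(), io.StringIO()
with redirect_stdout(out), redirect_stderr(err):
    emit("done", "success", as_json=True)
captured_stdout = out.getvalue()
captured_stderr = err.getvalue()
```

Because stdout holds nothing but the JSON document, a downstream `jq` can parse it without any filtering.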

Exit Codes

Code	Constant	Meaning
0	`EXIT_SUCCESS`	Success: the agent finished cleanly
1	`EXIT_FAILED`	Agent failure: unrecoverable LLM or tool error
2	`EXIT_PARTIAL`	Partial: some of the work was done but the task was not completed (includes SelfEvaluator failure)
3	`EXIT_CONFIG_ERROR`	Configuration error or YAML file not found
4	`EXIT_AUTH_ERROR`	LLM authentication error (invalid API key)
5	`EXIT_TIMEOUT`	LLM call timeout
130	`EXIT_INTERRUPTED`	Interrupted by Ctrl+C (POSIX: 128 + SIGINT=2)

Authentication errors (exit 4) and timeouts (exit 5) are detected by matching keywords in the LiteLLM error message, since LiteLLM can raise different exception types for the same conceptual error.

The SelfEvaluator can change a "success" to "partial" (exit 2) if it detects that the task was not completed correctly.
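
Keyword-based classification could be sketched as follows. The constants mirror the table above, but the keyword lists and `classify_error` are assumptions for illustration; the strings the project actually matches are not given here.

```python
import signal

EXIT_SUCCESS = 0
EXIT_FAILED = 1
EXIT_PARTIAL = 2
EXIT_CONFIG_ERROR = 3
EXIT_AUTH_ERROR = 4
EXIT_TIMEOUT = 5
EXIT_INTERRUPTED = 128 + int(signal.SIGINT)  # POSIX convention: 128 + 2

# Hypothetical keyword lists; the real ones may differ.
AUTH_KEYWORDS = ("authentication", "api key", "unauthorized")
TIMEOUT_KEYWORDS = ("timeout", "timed out")


def classify_error(exc):
    """Map an exception to an exit code by inspecting its message."""
    message = str(exc).lower()
    if any(keyword in message for keyword in AUTH_KEYWORDS):
        return EXIT_AUTH_ERROR
    if any(keyword in message for keyword in TIMEOUT_KEYWORDS):
        return EXIT_TIMEOUT
    return EXIT_FAILED


auth_code = classify_error(RuntimeError("Invalid API key provided"))
timeout_code = classify_error(RuntimeError("Request timed out"))
other_code = classify_error(ValueError("boom"))
```

Matching on the message rather than the exception class trades precision for robustness: it keeps working when the same failure surfaces under a different exception type.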

Design Decisions

Decision	Justification
Sync-first (no asyncio)	Predictable and easy to debug; LLM calls are the only significant source of latency
No LangChain/LangGraph	The loop is simple (~300 lines); adding abstraction would obscure the flow
Pydantic v2 as source of truth	Validation, serialization, and documentation in one place
Tools never raise exceptions	The agent loop stays stable against any tool failure
Clean stdout	Unix pipes: `architect run ... | jq .` works without filtering
MCP tools = BaseTool	Unified registry; the agent doesn’t distinguish between local and remote tools
Selective retries	Only transient errors (rate limit, connection); auth errors fail fast
SIGALRM for timeouts	Applied per step, not globally; a step that times out does not end the run, so the loop can continue with the next step
`run_fn` in SelfEvaluator	Avoids circular coupling with AgentLoop; simplifies the evaluator API
Parallel tools with `{future:idx}`	Guarantees correct result order regardless of completion order
ContextManager levels 1->2->3	Progressive: level 1 always active; levels 2 and 3 are more aggressive defenses
`RepoIndexer` with `os.walk()`	Efficient; ignored directories are pruned in place, so they are never traversed
`while True` + safety nets	The LLM decides when to stop; the watchdogs are safety, not drivers
`HUMAN` log level (25)	Agent traceability separated from technical noise
`HumanFormatter` with icons	Visual format shows at a glance what the agent is doing
`PostEditHooks`	Post-edit auto-verification without breaking the loop; results go back to the LLM
Graceful close	Watchdogs ask the LLM for a summary instead of cutting the run off abruptly (except USER_INTERRUPT)
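
The `{future:idx}` decision in the table can be illustrated with the standard library: each future is mapped to the position of its call, so results land in submission order even when they finish out of order. This is a sketch of the technique, not the project's `_execute_tool_calls_batch`; `run_in_order` and `slow_echo` are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_in_order(calls, execute, max_workers=4):
    """Run calls in parallel and return results in the original call order."""
    results = [None] * len(calls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future to the index of the call it belongs to.
        future_to_idx = {
            pool.submit(execute, call): idx for idx, call in enumerate(calls)
        }
        # Futures complete in arbitrary order; the index restores the order.
        for future in as_completed(future_to_idx):
            results[future_to_idx[future]] = future.result()
    return results


def slow_echo(delay):
    """Fake tool: sleeps for the given delay, then reports it."""
    time.sleep(delay)
    return f"slept {delay}"


# The first call is the slowest, yet its result still comes first.
ordered = run_in_order([0.05, 0.0, 0.02], slow_echo)
```

This matters because tool results are appended to the message history alongside their tool calls, and a mismatched order would pair results with the wrong calls.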