System Architecture

Component Map

+-------------------------------------------------------------------------+
|  CLI (cli.py)                                                           |
|                                                                         |
|  architect run PROMPT                                                   |
|     |                                                                   |
|     +- 1. GracefulShutdown()          installs SIGINT + SIGTERM         |
|     +- 2. load_config()               YAML -> env -> CLI flags          |
|     +- 3. configure_logging()         logging/setup.py                  |
|     |       +- logging/levels.py      custom HUMAN level (25)           |
|     |       +- logging/human.py       HumanLogHandler + HumanLog        |
|     +- 4. ToolRegistry                                                  |
|     |       +- register_all_tools()   filesystem + editing + search     |
|     |       +- MCPDiscovery()         (optional, --disable-mcp)         |
|     +- 5. RepoIndexer                 workspace tree (F10)              |
|     |       +- IndexCache             disk cache (TTL 5 min)            |
|     +- 6. LLMAdapter(config.llm)      LiteLLM + selective retries       |
|     +- 7. ContextManager(config.ctx)  3-level pruning (F11)             |
|     +- 8. ContextBuilder(repo_index, context_manager)                   |
|     +- 8b. PostEditHooks(config)      core/hooks.py - auto-verification |
|     +- 8c. SessionManager(workspace)  features/sessions.py (v4-B1)      |
|     +- 8d. DryRunTracker()            features/dryrun.py (v4-B4)        |
|     |                                                                   |
|     +- 9a. AgentLoop (default mode: build, or -a flag)                  |
|     |       +- ExecutionEngine(registry, config, confirm_mode,          |
|     |       |                  hooks: PostEditHooks)                     |
|     |       +- while True + safety nets (_check_safety_nets)            |
|     |       +- HumanLog(log) - traceability to stderr                   |
|     |       +- step_timeout (per step) + timeout (total execution)      |
|     |       +- cost_tracker (CostTracker, optional)                     |
|     +- 9b. MixedModeRunner (mixed mode, no longer default)              |
|             +- shared engine (plan + build)                              |
|             +- shared cost_tracker                                       |
|             +- shared ContextManager between phases                      |
|                                                                         |
|    10. SelfEvaluator (optional, --self-eval basic|full, F12)            |
|         +- evaluate_basic() | evaluate_full(run_fn)                     |
|                                                                         |
|    11. ReportGenerator (optional, --report json|markdown|github, B2)    |
|         +- to_json() | to_markdown() | to_github_pr_comment()          |
|                                                                         |
|  == Advanced orchestration modes ==                                     |
|                                                                         |
|    12. RalphLoop (architect loop)                                       |
|         +- agent_factory() -> fresh AgentLoop per iteration             |
|         +- _run_checks() -> subprocess shell commands                   |
|         +- _build_iteration_prompt() -> spec + diff + errors + progress |
|         +- worktree support -> .architect-ralph-worktree                |
|                                                                         |
|    13. PipelineRunner (architect pipeline)                               |
|         +- from_yaml() -> load pipeline from YAML                       |
|         +- agent_factory() -> fresh AgentLoop per step                  |
|         +- _resolve_vars() -> {{variable}} substitution                 |
|         +- _eval_condition() -> skip steps conditionally                |
|         +- _create_checkpoint() -> git commit per step                  |
|                                                                         |
|    14. ParallelRunner (architect parallel)                               |
|         +- ProcessPoolExecutor(max_workers)                             |
|         +- _run_worker_process() -> subprocess architect run in worktree|
|         +- cleanup() -> remove worktrees and branches                   |
|                                                                         |
|    15. AutoReviewer                                                      |
|         +- review_changes(task, diff) -> ReviewResult                   |
|         +- build_fix_prompt() -> correction prompt                      |
|         +- get_recent_diff() -> git diff HEAD                           |
|                                                                         |
|    16. CheckpointManager                                                 |
|         +- create(step) -> git commit with prefix                       |
|         +- list_checkpoints() -> parse git log                          |
|         +- rollback(step|commit) -> git reset --hard                    |
|                                                                         |
|  == Advanced extensions ==                                               |
|                                                                         |
|    17. CompetitiveEval (architect eval)                                  |
|         +- ParallelRunner -> same task with multiple models             |
|         +- _run_checks_in_worktree() -> per-worktree validation         |
|         +- _rank_results() -> composite score (100 pts)                 |
|                                                                         |
|    18. DispatchSubagentTool (tool dispatch_subagent)                    |
|         +- agent_factory() -> fresh AgentLoop for sub-task              |
|         +- types: explore (RO), test (RO+cmd), review (RO)             |
|         +- SUBAGENT_MAX_STEPS=15, truncated summary 1000 chars          |
|                                                                         |
|    19. CodeHealthAnalyzer (--health)                                     |
|         +- take_before_snapshot() -> pre-execution metrics              |
|         +- take_after_snapshot() -> post-execution metrics              |
|         +- compute_delta() -> HealthDelta with markdown report          |
|                                                                         |
|    20. ArchitectTracer (telemetry)                                       |
|         +- start_session() -> full session span                         |
|         +- trace_llm_call() -> span per LLM call                       |
|         +- trace_tool() -> span per tool execution                     |
|         +- NoopTracer if OTel not installed                             |
|                                                                         |
|    21. PresetManager (architect init)                                    |
|         +- apply(preset) -> generates .architect.md + config.yaml       |
|         +- 5 presets: python, node-react, ci, paranoid, yolo            |
+-------------------------------------------------------------------------+

Module Diagram and Dependencies

cli.py
 +-- config/loader.py ---- config/schema.py
 +-- logging/levels.py                          custom HUMAN level (25)
 +-- logging/human.py ---- logging/levels.py    HumanLogHandler + HumanLog
 +-- logging/setup.py ---- logging/levels.py
 |                          logging/human.py (HumanLogHandler)
 +-- tools/setup.py ------ tools/registry.py
 |                          tools/filesystem.py -- tools/base.py
 |                          tools/patch.py         tools/schemas.py
 |                          tools/search.py
 |                          execution/validators.py
 +-- mcp/discovery.py ---- mcp/client.py
 |                          mcp/adapter.py -------- tools/base.py
 +-- indexer/tree.py
 +-- indexer/cache.py
 +-- llm/adapter.py
 +-- core/hooks.py -------- config/schema.py (HookConfig)
 +-- core/context.py ----- indexer/tree.py (RepoIndex)
 |                          llm/adapter.py (LLMAdapter - for maybe_compress)
 +-- core/loop.py -------- core/state.py (AgentState, StopReason)
 |                          core/shutdown.py
 |                          core/timeout.py
 |                          core/context.py (ContextManager)
 |                          core/hooks.py (PostEditHooks - via ExecutionEngine)
 |                          costs/tracker.py (CostTracker, BudgetExceededError)
 |                          logging/human.py (HumanLog)
 +-- core/mixed_mode.py -- core/loop.py
 |                          core/context.py (ContextManager)
 |                          costs/tracker.py (CostTracker)
 +-- core/evaluator.py --- llm/adapter.py (LLMAdapter)
 |                          core/state.py (AgentState) - TYPE_CHECKING only
 +-- features/sessions.py -- core/state.py (StopReason)
 |                            config/schema.py (SessionsConfig)
 +-- features/report.py ---- core/state.py (AgentState)
 |                            costs/tracker.py (CostTracker)
 +-- features/dryrun.py ---- (standalone, minimal deps)
 +-- features/ralph.py ----- core/state.py (AgentState)       # v4-C1
 |                            costs/tracker.py (CostTracker)
 +-- features/pipelines.py -- core/state.py (AgentState)      # v4-C3
 |                             costs/tracker.py (CostTracker)
 +-- features/parallel.py -- (subprocess, standalone)
 +-- features/checkpoints.py - (subprocess git, standalone)
 +-- features/competitive.py -- features/parallel.py (ParallelRunner)
 +-- agents/reviewer.py ---- core/state.py (AgentState)
 +-- tools/dispatch.py ------ tools/base.py (BaseTool)
 |                             core/loop.py (AgentLoop - via factory)
 +-- core/health.py ---------- (AST stdlib + optional radon)
 +-- telemetry/otel.py ------- (optional opentelemetry)
 +-- config/presets.py -------- (standalone, templates)
 +-- agents/registry.py ---- agents/prompts.py
                            config/schema.py (AgentConfig)

Full Execution Flow

Single-agent mode — the default mode (architect run PROMPT)

GracefulShutdown()
     |
load_config(yaml, env, cli_flags)
     |
configure_logging()              logging/setup.py
  +- HumanLogHandler (stderr)    HUMAN events only (25)
  +- Technical console (stderr)  controlled by -v / -vv
  +- JSON file (optional)        captures everything (DEBUG+)
     |
ToolRegistry
  +- register_all_tools()    read_file, write_file, delete_file, list_files,
  |                          edit_file, apply_patch, search_code, grep, find_files
  +- MCPDiscovery()          mcp_{server}_{tool} (if MCP servers configured)
     |
RepoIndexer.build_index()    traverses workspace -> RepoIndex
  (or IndexCache.get())       uses cache if < 5 min
     |
LLMAdapter(config.llm)
     |
ContextManager(config.context)
     |
ContextBuilder(repo_index=index, context_manager=ctx_mgr)
     |
PostEditHooks(config.hooks.post_edit, workspace_root)
     |
get_agent("build", yaml_agents, cli_overrides)
  -> AgentConfig{system_prompt, allowed_tools, confirm_mode, max_steps=50}
     |
ExecutionEngine(registry, config, confirm_mode, hooks=post_edit_hooks)
     |
AgentLoop(llm, engine, agent_config, ctx, shutdown, step_timeout,
          context_manager, cost_tracker, timeout)
     |
AgentLoop.run(prompt, stream=True, on_stream_chunk=stderr_write)
     |
     -- while True: --------------------------------------------------------
     |
     |  [1] _check_safety_nets(state, step)
     |        +- USER_INTERRUPT?  -> return immediately (no LLM)
     |        +- MAX_STEPS?       -> _graceful_close() -> asks LLM for summary
     |        +- TIMEOUT?         -> _graceful_close() -> asks LLM for summary
     |        +- BUDGET_EXCEEDED? -> _graceful_close() -> asks LLM for summary
     |        +- CONTEXT_FULL?    -> _graceful_close() -> asks LLM for summary
     |
     |  [2] ContextManager.manage(messages, llm)
     |        +- compresses if > 75% of context window used
     |
     |  [3] hlog.llm_call(step, messages_count)
     |      with StepTimeout(step_timeout):
     |        llm.completion_stream(messages, tools_schema)
     |          -> StreamChunk("def foo...") --> stderr via callback
     |          -> LLMResponse(tool_calls=[ToolCall("edit_file", {...})])
     |
     |  [4] cost_tracker.record(step, model, usage, source="agent")
     |        +- if BudgetExceededError -> _graceful_close(BUDGET_EXCEEDED)
     |
     |  [5] If no tool_calls:
     |        hlog.agent_done(step)
     |        state.status = "success"
     |        state.stop_reason = StopReason.LLM_DONE
     |        break
     |
     |  [6] _execute_tool_calls_batch([tc1, tc2, ...])
     |        if parallel -> ThreadPoolExecutor(max_workers=4)
     |          -> hlog.tool_call("edit_file", {path:...})
     |          -> engine.execute_tool_call("edit_file", {path:..., old_str:..., new_str:...})
     |              1. registry.get("edit_file")
     |              2. tool.validate_args(args)         -> EditFileArgs
     |              3. policy.should_confirm()           -> True: prompt y/n/a
     |              4. if dry_run: return [DRY-RUN]
     |              5. EditFileTool.execute()
     |                   +- validate_path() - workspace confinement
     |                   +- assert old_str is unique
     |                   +- file.write_text(new_content)
     |                   +- return ToolResult(success=True, output="[diff...]")
     |          -> engine.run_post_edit_hooks(tool_name, args)
     |              +- PostEditHooks.run_for_tool() -> hook output appended to result
     |          -> hlog.tool_result("edit_file", success=True)
     |
     |  [7] ctx.append_tool_results(messages, tool_calls, results)
     |        +- ContextManager.truncate_tool_result(content)  <- Level 1
     |      state.steps.append(StepResult(...))
     |
     -- (back to [1]) ------------------------------------------------------
     |
hlog.loop_complete(status, stop_reason, total_steps, total_tool_calls)
state.status = "success" | "partial"  (depending on StopReason)

[Optional] SelfEvaluator (if --self-eval != "off")
     |
     +-- basic: evaluate_basic(prompt, state) -> EvalResult
     |     -> if not passed: state.status = "partial"
     |
     +-- full: evaluate_full(prompt, state, run_fn)
           -> loop up to max_retries: evaluate_basic() + run_fn(correction_prompt)
           -> returns the best AgentState

if --json: stdout <- json.dumps(state.to_output_dict())
if normal: stdout <- state.final_output

[v4-B1] SessionManager.save(session_state)   <- save final session
[v4-B2] if --report: ReportGenerator(report).to_{format}()
        if --report-file: write to file; otherwise, stdout

sys.exit(EXIT_CODE)  <- StopReason -> exit code mapping (0/1/2/3/4/5/130)

Mixed mode (legacy, no longer the default)

[configuration same as single-agent]

MixedModeRunner(llm, engine, plan_config, build_config, ctx,
                shutdown, step_timeout, context_manager, cost_tracker)
     |
     Note: a single shared engine (plan and build). The cost_tracker and
     ContextManager are also shared between phases.
     |
MixedModeRunner.run(prompt, stream=True, on_stream_chunk=...)
     |
     +-- PHASE 1: plan (no streaming)
     |     plan_loop = AgentLoop(llm, engine, plan_config, ctx,
     |                           context_manager=ctx_mgr,
     |                           cost_tracker=cost_tracker)
     |     plan_state = plan_loop.run(prompt, stream=False)
     |     if plan_state.status == "failed": return plan_state
     |     if shutdown.should_stop: return plan_state
     |
     +-- PHASE 2: build (with streaming)
     |     enriched_prompt = f"""
     |       The user asked: {prompt}
     |       The planning agent generated this plan:
     |       ---
     |       {plan_state.final_output}
     |       ---
     |       Your job is to execute this plan step by step...
     |     """
     |     build_loop = AgentLoop(llm, engine, build_config, ctx,
     |                            context_manager=ctx_mgr,
     |                            cost_tracker=cost_tracker)
     |     build_state = build_loop.run(enriched_prompt, stream=True, ...)
     |
     +-- return build_state

[SelfEvaluator is applied to build_state if --self-eval != "off"]

stdout / stderr Separation

This separation is critical for Unix pipe compatibility.

+-----------------------------+------------------------------------------+
| Destination                 | Content                                  |
+-----------------------------+------------------------------------------+
| stderr                      | Real-time LLM streaming chunks           |
| stderr                      | Structured logs (structlog)              |
| stderr                      | Execution header (model, workspace)      |
| stderr                      | MCP and indexer statistics                |
| stderr                      | Confirmation prompts                     |
| stderr                      | Shutdown notices (Ctrl+C)                |
| stderr                      | SelfEvaluator output                     |
| stderr                      | Human log: agent traceability            |
|                             | (Step 1 -> LLM, tool calls, results)    |
+-----------------------------+------------------------------------------+
| stdout                      | Agent's final response                   |
| stdout                      | JSON output (--json)                     |
+-----------------------------+------------------------------------------+

# Example of correct pipe usage:
architect run "analyze the project" -a resume --quiet --json | jq .status
architect run "generate README" --mode yolo > README.md
architect run "..." -v 2>logs.txt    # logs to file, result to stdout

Exit Codes

CodeConstantMeaning
0EXIT_SUCCESSSuccess — agent finished cleanly
1EXIT_FAILEDAgent failure — unrecoverable LLM or tool error
2EXIT_PARTIALPartial — did part of the work, didn’t complete (including SelfEvaluator failure)
3EXIT_CONFIG_ERRORConfiguration error or YAML file not found
4EXIT_AUTH_ERRORLLM authentication error (invalid API key)
5EXIT_TIMEOUTLLM call timeout
130EXIT_INTERRUPTEDInterrupted by Ctrl+C (POSIX: 128 + SIGINT=2)

Authentication errors (exit 4) and timeouts (exit 5) are detected by keywords in the LiteLLM error message, since LiteLLM can throw various exception types for the same conceptual error.

The SelfEvaluator can change a "success" to "partial" (exit 2) if it detects that the task was not completed correctly.


Design Decisions

DecisionJustification
Sync-first (no asyncio)Predictable, debuggable; LLM calls are the only latency
No LangChain/LangGraphThe loop is simple (~300 lines); adding abstraction would obscure the flow
Pydantic v2 as source of truthValidation, serialization, and documentation in one place
Tools never throw exceptionsThe agent loop stays stable against any tool failure
Clean stdoutUnix pipes: architect run ... | jq . works without filtering
MCP tools = BaseToolUnified registry; the agent doesn’t distinguish between local and remote
Selective retriesOnly transient errors (rate limit, connection); auth errors fail fast
SIGALRM for timeoutsPer-step, not global; allows resuming on the next step if there’s a timeout
run_fn in SelfEvaluatorAvoids circular coupling with AgentLoop; simplifies the evaluator API
Parallel tools with {future:idx}Guarantees correct result order regardless of completion order
ContextManager levels 1->2->3Progressive: level 1 always active; levels 2 and 3 are more aggressive defenses
RepoIndexer with os.walk()Efficient; prunes directories in-place (doesn’t visit them)
while True + safety netsThe LLM decides when to stop; the watchdogs are safety, not drivers
HUMAN log level (25)Agent traceability separated from technical noise
HumanFormatter with iconsVisual format allows understanding at a glance what the agent is doing
PostEditHooksPost-edit auto-verification without breaking the loop; results go back to the LLM
Graceful closeWatchdogs ask the LLM for a summary instead of cutting (except USER_INTERRUPT)