Pipeline

intake processes requirements through a 5-phase pipeline. Each phase transforms the data and passes it to the next one.

Sources             Phase 1       Phase 2       Phase 3        Phase 4         Phase 5
(files/URLs) -----> INGEST -----> ANALYZE ----> GENERATE ----> VERIFY -------> EXPORT
                    (parsers)     (LLM)        (templates)    (checks)        (output)
                       |             |              |              |               |
                 ParsedContent  AnalysisResult  Spec files   VerifyReport   Agent output
                                     |              |
                              Complexity      Adaptive
                             Assessment      Generation

Phase 1: Ingest

Module: ingest/ (11 parsers) · Requires LLM: No (except ImageParser)

What it does

Takes requirements files in any format and converts them into a normalized structure (ParsedContent). Supports local files, URLs, and stdin.

Flow

Source --> parse_source() --> Registry --> Detect format --> Select parser --> ParsedContent
  1. Source resolution: parse_source() determines the source type:
    • Local files -> passed to the registry
    • HTTP/HTTPS URLs -> processed with UrlParser
    • Scheme URIs (jira://, confluence://, github://) -> reserved for future connectors
    • Stdin (-) -> read as plaintext
    • Free text -> treated as plaintext
  2. The Registry receives the file path
  3. Auto-detects the format by extension and content:
    • Direct extension: .md -> markdown, .pdf -> pdf, .docx -> docx
    • JSON subtypes: Jira > GitHub Issues > Slack > generic YAML
    • HTML subtypes: if it contains “confluence” or “atlassian” -> confluence
    • Fallback: plaintext
  4. Selects the parser registered for that format (via plugin discovery or manual registration)
  5. The parser produces a normalized ParsedContent
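The detection heuristics in step 3 can be sketched in plain Python. This is an illustrative reading of the rules above, not the project's actual API; the function name and extension map are assumptions.

```python
# Sketch of format auto-detection by extension and content.
# detect_format and EXTENSION_MAP are illustrative names, not the real API.
from pathlib import Path

EXTENSION_MAP = {".md": "markdown", ".pdf": "pdf", ".docx": "docx"}

def detect_format(path: str, content: str = "") -> str:
    ext = Path(path).suffix.lower()
    if ext in EXTENSION_MAP:
        return EXTENSION_MAP[ext]          # direct extension match
    if ext == ".html":
        # HTML subtype: Confluence exports mention "confluence"/"atlassian"
        lowered = content.lower()
        if "confluence" in lowered or "atlassian" in lowered:
            return "confluence"
        return "html"
    return "plaintext"                     # fallback
```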

ParsedContent

Each parsed source produces:

| Field | Type | Description |
|-------|------|-------------|
| text | string | Clean extracted text |
| format | string | Format identifier (e.g. "jira", "markdown") |
| source | string | Path to the original file |
| metadata | dict | Key-value pairs (author, date, priority, etc.) |
| sections | list[dict] | Structured sections (title, level, content) |
| relations | list[dict] | Relationships between items (blocks, depends on, relates to) |
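The fields above can be sketched as a dataclass. The real model may differ (it could, for instance, be a pydantic model with validation), so treat this as illustrative:

```python
# Illustrative sketch of the normalized ParsedContent structure.
from dataclasses import dataclass, field

@dataclass
class ParsedContent:
    text: str                                     # clean extracted text
    format: str                                   # e.g. "jira", "markdown"
    source: str                                   # path to the original file
    metadata: dict = field(default_factory=dict)  # author, date, priority, ...
    sections: list = field(default_factory=list)  # {title, level, content}
    relations: list = field(default_factory=list) # blocks / depends on / relates to
```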

Validations

Before parsing, each file goes through centralized validations:

  • The file must exist and be a regular file (not a directory)
  • Maximum size: 50 MB (MAX_FILE_SIZE_BYTES)
  • If the file is empty or contains only whitespace, an EmptySourceError is raised
  • Encoding: UTF-8 is tried first, with latin-1 as a fallback
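A minimal sketch of these validations, assuming the error and constant are named as in the text (EmptySourceError, MAX_FILE_SIZE_BYTES); the function signature is an assumption:

```python
# Sketch of the centralized pre-parse validations; illustrative signature.
from pathlib import Path

MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # 50 MB

class EmptySourceError(ValueError):
    pass

def validate_and_read(path: Path) -> str:
    if not path.is_file():
        raise FileNotFoundError(f"{path} is not a regular file")
    if path.stat().st_size > MAX_FILE_SIZE_BYTES:
        raise ValueError(f"{path} exceeds {MAX_FILE_SIZE_BYTES} bytes")
    try:
        text = path.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        text = path.read_text(encoding="latin-1")  # encoding fallback
    if not text.strip():
        raise EmptySourceError(f"{path} is empty or whitespace-only")
    return text
```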

See Input Formats for details on each parser.


Phase 2: Analyze

Module: analyze/ · Requires LLM: Yes (async via litellm.acompletion)

What it does

Takes the ParsedContent from all sources and uses the LLM to extract structured requirements, detect conflicts, assess risks, and produce a technical design.

Sub-phases

ParsedContent[] --> Combine --> Extraction --> Dedup --> Validate --> Risk --> Design --> AnalysisResult

1. Combine sources

Concatenates the text from all sources with separators:

=== SOURCE 1: path/to/file.md (format: markdown) ===
[content]

---

=== SOURCE 2: path/to/jira.json (format: jira) ===
[content]
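Producing that separator format is straightforward; a sketch, assuming each source exposes the source, format, and text fields of ParsedContent:

```python
# Sketch of source combination with the separator format shown above.
def combine_sources(sources) -> str:
    blocks = [
        f"=== SOURCE {i}: {s.source} (format: {s.format}) ===\n{s.text}"
        for i, s in enumerate(sources, start=1)
    ]
    return "\n\n---\n\n".join(blocks)
```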

2. Extraction (LLM call)

Sends the combined text to the LLM with EXTRACTION_PROMPT. The LLM returns JSON with:

  • Functional requirements (FR-01, FR-02, …)
  • Non-functional requirements (NFR-01, NFR-02, …)
  • Conflicts between sources (CONFLICT-01, …)
  • Open questions (Q-01, Q-02, …)

The prompt is configured with: number of sources, language, requirements format (ears, user-stories, etc.).

3. Deduplication

Compares requirement titles using Jaccard similarity (word intersection / word union):

  • Threshold: 0.75 (a Jaccard similarity of 0.75 or higher counts as a duplicate)
  • Normalizes: lowercase, strip, collapse whitespace
  • Deduplicates functional and non-functional requirements separately
  • Keeps the first occurrence
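The heuristic can be sketched in a few lines of plain Python; normalization and the 0.75 threshold follow the description above, while the requirement shape (a dict with a "title" key) is an assumption:

```python
# Sketch of title deduplication via Jaccard similarity (intersection / union).
import re

def _normalize(title: str) -> set[str]:
    # lowercase, strip, collapse whitespace, then split into a word set
    return set(re.sub(r"\s+", " ", title.lower().strip()).split())

def jaccard(a: str, b: str) -> float:
    wa, wb = _normalize(a), _normalize(b)
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)

def dedupe(requirements, threshold: float = 0.75):
    kept = []
    for req in requirements:  # keeps the first occurrence
        if all(jaccard(req["title"], k["title"]) < threshold for k in kept):
            kept.append(req)
    return kept
```

Functional and non-functional requirements would each go through dedupe separately, as the text notes.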

4. Validation

  • Conflicts: those without a description, sources, or recommendation are filtered out
  • Open questions: those without question text or context are filtered out

5. Risk assessment (optional)

If config.spec.risk_assessment = true, it makes another LLM call with RISK_ASSESSMENT_PROMPT. It produces a list of risks (RISK-01, …) with:

  • Associated requirement IDs
  • Probability and impact (low/medium/high)
  • Category (technical, scope, integration, security, performance)
  • Suggested mitigation

6. Design (LLM call)

Third LLM call with DESIGN_PROMPT. It produces:

  • Architecture components
  • Files to create and modify (path + description + action)
  • Technical decisions (decision, rationale, associated requirement)
  • Tasks with dependencies (DAG), estimation in minutes, files, checks
  • Acceptance checks (command, files_exist, pattern_present, pattern_absent)
  • External project dependencies
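Since tasks form a DAG, a consumer of the design output can derive an execution order with a topological sort. A sketch using the standard library, assuming each task carries an id and a list of dependency ids (the field names here are illustrative):

```python
# Sketch: ordering the task DAG for execution with graphlib (Python 3.9+).
from graphlib import TopologicalSorter

def execution_order(tasks) -> list[str]:
    graph = {t["id"]: set(t.get("depends_on", [])) for t in tasks}
    # static_order() raises CycleError if the dependencies are not a DAG
    return list(TopologicalSorter(graph).static_order())
```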

AnalysisResult

The complete result contains:

| Field | Type | Description |
|-------|------|-------------|
| functional_requirements | list[Requirement] | Functional requirements (FR-XX) |
| non_functional_requirements | list[Requirement] | Non-functional requirements (NFR-XX) |
| conflicts | list[Conflict] | Conflicts between sources |
| open_questions | list[OpenQuestion] | Unanswered questions |
| risks | list[RiskItem] | Risk assessment |
| design | DesignResult | Technical design with tasks and checks |
| duplicates_removed | int | Number of duplicates removed |
| total_cost | float | Total analysis cost in USD |
| model_used | string | LLM model used |

Cost control

The LLMAdapter tracks the cost of each call:

  • Accumulates total_cost, total_input_tokens, total_output_tokens
  • After each call, compares against max_cost_per_spec
  • If the budget is exceeded, it throws CostLimitError and the analysis stops
  • Cost is calculated via litellm.completion_cost()
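A minimal sketch of the budget guard; CostLimitError and max_cost_per_spec are named as in the text, while the tracker class itself is illustrative (in the real adapter, the per-call cost comes from litellm.completion_cost()):

```python
# Sketch of per-call cost accumulation with a hard budget.
class CostLimitError(RuntimeError):
    pass

class CostTracker:
    def __init__(self, max_cost_per_spec: float):
        self.max_cost_per_spec = max_cost_per_spec
        self.total_cost = 0.0
        self.total_input_tokens = 0
        self.total_output_tokens = 0

    def record(self, cost: float, input_tokens: int, output_tokens: int) -> None:
        self.total_cost += cost
        self.total_input_tokens += input_tokens
        self.total_output_tokens += output_tokens
        # After each call, compare against the budget and stop if exceeded
        if self.total_cost > self.max_cost_per_spec:
            raise CostLimitError(
                f"budget exceeded: ${self.total_cost:.4f} > "
                f"${self.max_cost_per_spec:.4f}"
            )
```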

Phase 2.5: Complexity classification

Module: analyze/complexity.py · Requires LLM: No

What it does

Before generating, the complexity of the sources is classified to select the optimal generation mode. The classification is heuristic-based and does not use the LLM.

Criteria

| Mode | Conditions | Confidence |
|------|------------|------------|
| quick | <500 words AND 1 source AND no structured content | High |
| enterprise | 4+ sources OR >5000 words | High |
| standard | Everything that is not quick or enterprise | Medium |

Structured content includes formats such as jira, confluence, yaml, github_issues, slack.

The classification can be overridden with --mode in the CLI.
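The criteria above can be sketched as a small function. The input shape (a list of word-count/format pairs) is an assumption; the real signature may differ:

```python
# Sketch of the heuristic complexity classifier (no LLM involved).
STRUCTURED_FORMATS = {"jira", "confluence", "yaml", "github_issues", "slack"}

def classify(sources) -> str:
    """sources: list of (word_count, format) pairs, one per parsed source."""
    total_words = sum(words for words, _ in sources)
    has_structured = any(fmt in STRUCTURED_FORMATS for _, fmt in sources)
    if len(sources) >= 4 or total_words > 5000:
        return "enterprise"
    if len(sources) == 1 and total_words < 500 and not has_structured:
        return "quick"
    return "standard"
```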


Phase 3: Generate

Module: generate/ · Requires LLM: No

What it does

Takes the AnalysisResult and renders Markdown/YAML files using Jinja2 templates, plus a spec.lock.yaml for reproducibility. The number of generated files depends on the mode.

Adaptive generation

The AdaptiveSpecBuilder wraps the standard SpecBuilder and filters files according to the mode:

| Mode | Generated files |
|------|-----------------|
| quick | context.md, tasks.md |
| standard | All 6 files |
| enterprise | All 6 files + detailed risks |
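The mode filter can be sketched as a simple mapping; names here are illustrative, and the mode-to-files relationship follows the table above:

```python
# Sketch of how the adaptive builder selects output files per mode.
ALL_FILES = ["requirements.md", "design.md", "tasks.md",
             "acceptance.yaml", "context.md", "sources.md"]

MODE_FILES = {
    "quick": ["context.md", "tasks.md"],
    "standard": ALL_FILES,
    "enterprise": ALL_FILES,  # detailed risks are added inside the templates
}

def files_for_mode(mode: str) -> list[str]:
    return MODE_FILES[mode]
```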

Templates

| Generated file | Template | Main content |
|----------------|----------|--------------|
| requirements.md | requirements.md.j2 | FR, NFR, conflicts, open questions |
| design.md | design.md.j2 | Components, files, decisions, dependencies |
| tasks.md | tasks.md.j2 | Summary table + detail per task |
| acceptance.yaml | acceptance.yaml.j2 | Executable checks by type |
| context.md | context.md.j2 | Project info, stack, risks |
| sources.md | sources.md.j2 | Sources, traceability, conflicts |

spec.lock.yaml

Reproducibility file with:

| Field | Description |
|-------|-------------|
| version | Lock format version (currently "1") |
| created_at | ISO creation timestamp |
| model | LLM model used |
| config_hash | Hash of the configuration used |
| source_hashes | Map of file -> SHA-256 (first 16 hex chars) |
| spec_hashes | Map of spec file -> SHA-256 |
| total_cost | Total analysis cost in USD |
| requirement_count | Number of requirements |
| task_count | Number of tasks |

It is used to detect whether sources have changed since the last generation (is_stale()).
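A sketch of that staleness check, using the short SHA-256 source hashes described in the table; the function and field names follow the text, but the exact signature is an assumption:

```python
# Sketch of staleness detection via the lock file's source hashes.
import hashlib
from pathlib import Path

def short_sha256(path: Path) -> str:
    # first 16 hex chars of the file's SHA-256, as recorded in the lock
    return hashlib.sha256(path.read_bytes()).hexdigest()[:16]

def is_stale(lock: dict, source_dir: Path = Path(".")) -> bool:
    """True if any recorded source changed (or disappeared) since generation."""
    for rel_path, recorded in lock["source_hashes"].items():
        candidate = source_dir / rel_path
        if not candidate.exists() or short_sha256(candidate) != recorded:
            return True
    return False
```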


Phase 4: Verify

Module: verify/ · Requires LLM: No

What it does

Runs the checks defined in acceptance.yaml against the project directory. Produces a report with results.

Check types

| Type | What it verifies | Fields used |
|------|------------------|-------------|
| command | Runs a shell command and verifies exit code == 0 | command |
| files_exist | Verifies that all listed paths exist | paths |
| pattern_present | Verifies that regex patterns exist in files matching the glob | glob, patterns |
| pattern_absent | Verifies that regex patterns do NOT exist in files matching the glob | glob, patterns |
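Three of these check types can be sketched with the standard library. This is a plain reading of the table above, not the project's actual runner; function names are illustrative:

```python
# Sketch of command, files_exist, and pattern_present checks.
import re
import subprocess
from pathlib import Path

def run_command_check(command: str, cwd: Path) -> bool:
    # passes when the shell command exits with code 0
    return subprocess.run(command, shell=True, cwd=cwd).returncode == 0

def run_files_exist_check(paths: list[str], cwd: Path) -> bool:
    return all((cwd / p).exists() for p in paths)

def run_pattern_present_check(glob: str, patterns: list[str], cwd: Path) -> bool:
    files = list(cwd.glob(glob))
    text = "\n".join(f.read_text() for f in files)
    return bool(files) and all(re.search(p, text) for p in patterns)
```

A pattern_absent check would be the negation of the pattern search in run_pattern_present_check.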

Report formats

| Format | Class | Usage |
|--------|-------|-------|
| terminal | TerminalReporter | Rich table with colors in the terminal |
| json | JsonReporter | Machine-readable JSON |
| junit | JunitReporter | JUnit XML for CI (GitHub Actions, Jenkins) |

See Verification for full details.


Phase 5: Export

Module: export/ · Requires LLM: No

What it does

Takes the generated spec files and transforms them into a format ready for a specific AI agent.

Available formats

| Format | What it generates | Best for |
|--------|-------------------|----------|
| architect | pipeline.yaml + spec copy | Architect-based agents |
| generic | SPEC.md + verify.sh + spec copy | Any agent / manual use |

See Export for full details.


Complete data flow

.md / .json / .pdf / .docx / .html / .yaml / .txt / .png / URLs
                           |
                   [ SOURCE RESOLUTION ]
                   (parse_source -> file, url, stdin, text)
                           |
                      [ INGEST ]
                      (11 parsers via plugin discovery)
                           |
                   list[ParsedContent]
                           |
                  [ COMPLEXITY CLASSIFICATION ]
                  (quick / standard / enterprise)
                           |
                      [ ANALYZE ]
                      (3 LLM calls)
                           |
                     AnalysisResult
                           |
                  [ ADAPTIVE GENERATE ]
                  (2-6 templates depending on mode)
                           |
              specs/my-feature/
              +-- requirements.md    (standard, enterprise)
              +-- design.md          (standard, enterprise)
              +-- tasks.md           (always)
              +-- acceptance.yaml    (standard, enterprise)
              +-- context.md         (always)
              +-- sources.md         (standard, enterprise)
              +-- spec.lock.yaml
                           |
                      [ VERIFY ]         [ EXPORT ]
                           |                  |
                  VerificationReport     output/
                  (pass/fail/skip)       +-- pipeline.yaml  (architect)
                                         +-- SPEC.md        (generic)
                                         +-- verify.sh      (generic)
                                         +-- spec/          (copy)