# Pipeline

`intake` processes requirements through a five-phase pipeline. Each phase transforms the data and passes it to the next one.
```
Sources      Phase 1     Phase 2      Phase 3      Phase 4       Phase 5
(files) ---> INGEST ---> ANALYZE ---> GENERATE --> VERIFY -----> EXPORT
            (parsers)     (LLM)     (templates)   (checks)      (output)
                |           |            |            |             |
         ParsedContent AnalysisResult Spec files  VerifyReport Agent output
```
## Phase 1: Ingest

**Module:** `ingest/`
**Requires LLM:** No (except `ImageParser`)
### What it does

Takes requirements files in any format and converts them into a normalized structure (`ParsedContent`).
### Flow

```
File --> Registry --> Detect format --> Select parser --> ParsedContent
```
- The Registry receives the file path
- Auto-detects the format by extension and content:
  - Direct extension: `.md` -> markdown, `.pdf` -> pdf, `.docx` -> docx
  - JSON subtypes: if it has the key `"issues"` -> jira, otherwise -> yaml
  - HTML subtypes: if it contains "confluence" or "atlassian" -> confluence
  - Fallback: plaintext
- Selects the parser registered for that format
- The parser produces a normalized `ParsedContent`
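The detection rules above can be sketched as a small function. This is a minimal, hypothetical sketch: `detect_format` and `EXTENSION_MAP` are illustrative names, not the Registry's actual API.

```python
import json
from pathlib import Path

# Illustrative extension table; the real registry may cover more formats.
EXTENSION_MAP = {".md": "markdown", ".pdf": "pdf", ".docx": "docx"}

def detect_format(path: Path, content: str) -> str:
    """Detect a source format by extension first, then by content."""
    suffix = path.suffix.lower()
    if suffix in EXTENSION_MAP:
        return EXTENSION_MAP[suffix]
    if suffix == ".json":
        try:
            data = json.loads(content)
        except json.JSONDecodeError:
            return "plaintext"
        # JSON subtype rule: a top-level "issues" key means a Jira export.
        return "jira" if isinstance(data, dict) and "issues" in data else "yaml"
    if suffix in (".html", ".htm"):
        lowered = content.lower()
        if "confluence" in lowered or "atlassian" in lowered:
            return "confluence"
    # Everything else falls back to plaintext.
    return "plaintext"
```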
### ParsedContent

Each parsed source produces:
| Field | Type | Description |
|---|---|---|
| `text` | string | Clean extracted text |
| `format` | string | Format identifier (e.g. "jira", "markdown") |
| `source` | string | Path to the original file |
| `metadata` | dict | Key-value pairs (author, date, priority, etc.) |
| `sections` | list[dict] | Structured sections (title, level, content) |
| `relations` | list[dict] | Relationships between items (blocks, depends on, relates to) |
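The table maps naturally onto a dataclass. This is an illustrative shape only; the real class may differ in naming, typing, and defaults.

```python
from dataclasses import dataclass, field

@dataclass
class ParsedContent:
    """Illustrative shape of a normalized parsed source."""
    text: str                                       # clean extracted text
    format: str                                     # e.g. "jira", "markdown"
    source: str                                     # path to the original file
    metadata: dict = field(default_factory=dict)    # author, date, priority, ...
    sections: list = field(default_factory=list)    # {title, level, content}
    relations: list = field(default_factory=list)   # blocks / depends on / relates to
```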
### Validations

Before parsing, each file goes through centralized validations:

- The file must exist and be a regular file (not a directory)
- Maximum size: 50 MB (`MAX_FILE_SIZE_BYTES`)
- If the file is empty or contains only whitespace: `EmptySourceError`
- Encoding: UTF-8 is tried first, with a fallback to latin-1
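Put together, the validations amount to something like the sketch below. `read_validated` is a hypothetical helper name; only `MAX_FILE_SIZE_BYTES` and `EmptySourceError` come from the source.

```python
from pathlib import Path

MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # 50 MB

class EmptySourceError(ValueError):
    """Raised when a source file is empty or whitespace-only."""

def read_validated(path: Path) -> str:
    """Apply the centralized pre-parse validations and return the text."""
    if not path.is_file():
        raise FileNotFoundError(path)
    if path.stat().st_size > MAX_FILE_SIZE_BYTES:
        raise ValueError(f"{path} exceeds {MAX_FILE_SIZE_BYTES} bytes")
    raw = path.read_bytes()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode("latin-1")  # fallback encoding
    if not text.strip():
        raise EmptySourceError(path)
    return text
```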
See Input Formats for details on each parser.
## Phase 2: Analyze

**Module:** `analyze/`
**Requires LLM:** Yes (async via `litellm.acompletion`)

### What it does

Takes the `ParsedContent` from all sources and uses the LLM to extract structured requirements, detect conflicts, assess risks, and produce a technical design.
### Sub-phases

```
ParsedContent[] --> Combine --> Extraction --> Dedup --> Validate --> Risk --> Design --> AnalysisResult
```
#### 1. Combine sources

Concatenates the text from all sources with separators:

```
=== SOURCE 1: path/to/file.md (format: markdown) ===
[content]
---
=== SOURCE 2: path/to/jira.json (format: jira) ===
[content]
```
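The combine step can be sketched in a few lines. `combine_sources` is a hypothetical name; it assumes objects with `source`, `format`, and `text` attributes (the `ParsedContent` fields).

```python
def combine_sources(sources) -> str:
    """Join all parsed sources with the numbered separator format shown above."""
    blocks = [
        f"=== SOURCE {i}: {s.source} (format: {s.format}) ===\n{s.text}"
        for i, s in enumerate(sources, start=1)
    ]
    return "\n---\n".join(blocks)
```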
#### 2. Extraction (LLM call)

Sends the combined text to the LLM with `EXTRACTION_PROMPT`. The LLM returns JSON with:
- Functional requirements (FR-01, FR-02, …)
- Non-functional requirements (NFR-01, NFR-02, …)
- Conflicts between sources (CONFLICT-01, …)
- Open questions (Q-01, Q-02, …)
The prompt is parameterized with the number of sources, the language, and the requirements format (`ears`, `user-stories`, etc.).
#### 3. Deduplication

Compares requirement titles using Jaccard similarity (word intersection / word union):
- Threshold: 0.75 (75% of words in common = duplicate)
- Normalizes: lowercase, strip, collapse whitespace
- Deduplicates functional and non-functional requirements separately
- Keeps the first occurrence
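The dedup rules above can be sketched as follows; `jaccard` and `dedup` are illustrative names, and the 0.75 threshold comes from the source.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity; lowercasing and split() handle the
    normalization (lowercase, strip, collapse whitespace)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)

def dedup(requirements: list, threshold: float = 0.75) -> list:
    """Keep the first occurrence; drop later titles at or above the threshold."""
    kept = []
    for req in requirements:
        if all(jaccard(req["title"], k["title"]) < threshold for k in kept):
            kept.append(req)
    return kept
```

Functional and non-functional requirements would each be passed through `dedup` separately, per the rules above.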
#### 4. Validation
- Conflicts: those without a description, sources, or recommendation are filtered out
- Open questions: those without question text or context are filtered out
#### 5. Risk assessment (optional)

If `config.spec.risk_assessment = true`, it makes another LLM call with `RISK_ASSESSMENT_PROMPT`. It produces a list of risks (RISK-01, …) with:
- Associated requirement IDs
- Probability and impact (low/medium/high)
- Category (technical, scope, integration, security, performance)
- Suggested mitigation
#### 6. Design (LLM call)

A third LLM call with `DESIGN_PROMPT`. It produces:
- Architecture components
- Files to create and modify (path + description + action)
- Technical decisions (decision, rationale, associated requirement)
- Tasks with dependencies (DAG), estimation in minutes, files, checks
- Acceptance checks (command, files_exist, pattern_present, pattern_absent)
- External project dependencies
### AnalysisResult

The complete result contains:
| Field | Type | Description |
|---|---|---|
| `functional_requirements` | list[Requirement] | Functional requirements (FR-XX) |
| `non_functional_requirements` | list[Requirement] | Non-functional requirements (NFR-XX) |
| `conflicts` | list[Conflict] | Conflicts between sources |
| `open_questions` | list[OpenQuestion] | Unanswered questions |
| `risks` | list[RiskItem] | Risk assessment |
| `design` | DesignResult | Technical design with tasks and checks |
| `duplicates_removed` | int | Number of duplicates removed |
| `total_cost` | float | Total analysis cost in USD |
| `model_used` | string | LLM model used |
### Cost control

The LLMAdapter tracks the cost of each call:

- Accumulates `total_cost`, `total_input_tokens`, and `total_output_tokens`
- After each call, compares the running total against `max_cost_per_spec`
- If the budget is exceeded, it raises `CostLimitError` and the analysis stops
- Cost is calculated via `litellm.completion_cost()`
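The budget check amounts to something like the sketch below. `CostTracker` and `record` are hypothetical names; in the real adapter the per-call cost would come from `litellm.completion_cost()`.

```python
class CostLimitError(RuntimeError):
    """Raised when the accumulated cost exceeds the per-spec budget."""

class CostTracker:
    def __init__(self, max_cost_per_spec: float):
        self.max_cost_per_spec = max_cost_per_spec
        self.total_cost = 0.0
        self.total_input_tokens = 0
        self.total_output_tokens = 0

    def record(self, cost: float, input_tokens: int, output_tokens: int) -> None:
        """Accumulate one call's cost/tokens, then enforce the budget."""
        self.total_cost += cost
        self.total_input_tokens += input_tokens
        self.total_output_tokens += output_tokens
        if self.total_cost > self.max_cost_per_spec:
            raise CostLimitError(f"budget exceeded: ${self.total_cost:.4f}")
```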
## Phase 3: Generate

**Module:** `generate/`
**Requires LLM:** No

### What it does

Takes the `AnalysisResult` and renders six Markdown/YAML files using Jinja2 templates, plus a `spec.lock.yaml` for reproducibility.
### Templates

| Generated file | Template | Main content |
|---|---|---|
| `requirements.md` | `requirements.md.j2` | FR, NFR, conflicts, open questions |
| `design.md` | `design.md.j2` | Components, files, decisions, dependencies |
| `tasks.md` | `tasks.md.j2` | Summary table + detail per task |
| `acceptance.yaml` | `acceptance.yaml.j2` | Executable checks by type |
| `context.md` | `context.md.j2` | Project info, stack, risks |
| `sources.md` | `sources.md.j2` | Sources, traceability, conflicts |
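As a toy illustration of the template step: the real pipeline loads the `*.j2` files above through a Jinja2 environment, but the mechanics are the same as rendering an inline `Template`. The template string and context here are invented for the example.

```python
from jinja2 import Template

# Inline stand-in for something like requirements.md.j2.
tmpl = Template(
    "# Requirements\n"
    "{% for req in functional %}- {{ req.id }}: {{ req.title }}\n{% endfor %}"
)
out = tmpl.render(functional=[{"id": "FR-01", "title": "Export as PDF"}])
```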
### spec.lock.yaml

Reproducibility file with:
| Field | Description |
|---|---|
| `version` | Lock format version (currently "1") |
| `created_at` | ISO creation timestamp |
| `model` | LLM model used |
| `config_hash` | Hash of the configuration used |
| `source_hashes` | Map of file -> SHA-256 (first 16 hex chars) |
| `spec_hashes` | Map of spec file -> SHA-256 |
| `total_cost` | Total analysis cost in USD |
| `requirement_count` | Number of requirements |
| `task_count` | Number of tasks |
It is used to detect whether sources have changed since the last generation (`is_stale()`).
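Staleness detection can be sketched as follows, given a `source_hashes` map of path -> truncated SHA-256 as described above. `short_hash` is a hypothetical helper; only `is_stale` and the 16-hex-char truncation come from the source.

```python
import hashlib
from pathlib import Path

def short_hash(path: Path) -> str:
    """SHA-256 of the file contents, truncated to 16 hex chars."""
    return hashlib.sha256(path.read_bytes()).hexdigest()[:16]

def is_stale(source_hashes: dict) -> bool:
    """A spec is stale if any locked source is missing or has changed."""
    return any(
        not Path(p).is_file() or short_hash(Path(p)) != h
        for p, h in source_hashes.items()
    )
```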
## Phase 4: Verify

**Module:** `verify/`
**Requires LLM:** No

### What it does

Runs the checks defined in `acceptance.yaml` against the project directory and produces a report with the results.
### Check types

| Type | What it verifies | Fields used |
|---|---|---|
| `command` | Runs a shell command and verifies exit code == 0 | `command` |
| `files_exist` | Verifies that all listed paths exist | `paths` |
| `pattern_present` | Verifies that regex patterns exist in files matching the glob | `glob`, `patterns` |
| `pattern_absent` | Verifies that regex patterns do NOT exist in files matching the glob | `glob`, `patterns` |
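Two of the check types can be sketched as below. This is an assumption-laden illustration: `run_check` is a hypothetical name, the exact matching semantics (e.g. whether each pattern must appear in at least one matched file) are guesses, and `command` / `pattern_absent` are omitted for brevity.

```python
import re
from pathlib import Path

def run_check(check: dict, project_dir: Path) -> bool:
    """Evaluate a files_exist or pattern_present check against a directory."""
    if check["type"] == "files_exist":
        # Every listed path must exist under the project directory.
        return all((project_dir / p).exists() for p in check["paths"])
    if check["type"] == "pattern_present":
        files = list(project_dir.glob(check["glob"]))
        # Assumed semantics: each pattern must match in at least one file.
        return bool(files) and all(
            any(re.search(pat, f.read_text()) for f in files)
            for pat in check["patterns"]
        )
    raise ValueError(f"unknown check type: {check['type']}")
```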
### Report formats

| Format | Class | Usage |
|---|---|---|
| `terminal` | TerminalReporter | Rich table with colors in the terminal |
| `json` | JsonReporter | Machine-readable JSON |
| `junit` | JunitReporter | JUnit XML for CI (GitHub Actions, Jenkins) |
See Verification for full details.
## Phase 5: Export

**Module:** `export/`
**Requires LLM:** No

### What it does

Takes the generated spec files and transforms them into a format ready for a specific AI agent.
### Available formats

| Format | What it generates | Best for |
|---|---|---|
| `architect` | `pipeline.yaml` + spec copy | Architect-based agents |
| `generic` | `SPEC.md` + `verify.sh` + spec copy | Any agent / manual use |
See Export for full details.
## Complete data flow
```
.md / .json / .pdf / .docx / .html / .yaml / .txt / .png
                         |
                    [ INGEST ]
                         |
                list[ParsedContent]
                         |
                    [ ANALYZE ]
                  (3 LLM calls)
                         |
                  AnalysisResult
                         |
                   [ GENERATE ]
                  (6 templates)
                         |
                specs/my-feature/
                ├── requirements.md
                ├── design.md
                ├── tasks.md
                ├── acceptance.yaml
                ├── context.md
                ├── sources.md
                └── spec.lock.yaml
                         |
            +------------+------------+
            |                         |
       [ VERIFY ]                [ EXPORT ]
            |                         |
  VerificationReport             output/
   (pass/fail/skip)              ├── pipeline.yaml  (architect)
                                 ├── SPEC.md        (generic)
                                 ├── verify.sh      (generic)
                                 └── spec/          (copy)
```