# Pipeline

`intake` processes requirements through a five-phase pipeline. Each phase transforms the data and passes it to the next one.
```
Sources      Phase 1     Phase 2      Phase 3      Phase 4       Phase 5
(files) ---> INGEST ---> ANALYZE ---> GENERATE --> VERIFY -----> EXPORT
            (parsers)     (LLM)     (templates)   (checks)      (output)
                |           |            |            |             |
         ParsedContent AnalysisResult Spec files  VerifyReport Agent output
```
## Phase 1: Ingest

**Module:** `ingest/`
**Requires LLM:** No (except `ImageParser`)
### What it does

Takes requirements files in any format and converts them into a normalized structure (`ParsedContent`).
### Flow

```
File --> Registry --> Detect format --> Select parser --> ParsedContent
```
- The Registry receives the file path
- Auto-detects the format by extension and content:
  - Direct extension: `.md` -> markdown, `.pdf` -> pdf, `.docx` -> docx
  - JSON subtypes: if it has the key `"issues"` -> jira, otherwise -> yaml
  - HTML subtypes: if it contains "confluence" or "atlassian" -> confluence
  - Fallback: plaintext
- Selects the parser registered for that format
- The parser produces a normalized `ParsedContent`
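The detection rules above can be sketched as a small function. This is a minimal, hypothetical sketch: `detect_format` and `EXTENSION_MAP` are illustrative names, not the Registry's actual API.

```python
import json
from pathlib import Path

# Illustrative extension table; the real registry may cover more formats.
EXTENSION_MAP = {".md": "markdown", ".pdf": "pdf", ".docx": "docx"}

def detect_format(path: Path, content: str) -> str:
    """Detect a source format by extension first, then by content."""
    suffix = path.suffix.lower()
    if suffix in EXTENSION_MAP:
        return EXTENSION_MAP[suffix]
    if suffix == ".json":
        try:
            data = json.loads(content)
        except json.JSONDecodeError:
            return "plaintext"
        # JSON subtype rule: a top-level "issues" key means a Jira export.
        return "jira" if isinstance(data, dict) and "issues" in data else "yaml"
    if suffix in (".html", ".htm"):
        lowered = content.lower()
        if "confluence" in lowered or "atlassian" in lowered:
            return "confluence"
    # Everything else falls back to plaintext.
    return "plaintext"
```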
### ParsedContent

Each parsed source produces:
| Field | Type | Description |
|---|---|---|
| `text` | string | Clean extracted text |
| `format` | string | Format identifier (e.g. "jira", "markdown") |
| `source` | string | Path to the original file |
| `metadata` | dict | Key-value pairs (author, date, priority, etc.) |
| `sections` | list[dict] | Structured sections (title, level, content) |
| `relations` | list[dict] | Relationships between items (blocks, depends on, relates to) |
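The table maps naturally onto a dataclass. This is an illustrative shape only; the real class may differ in naming, typing, and defaults.

```python
from dataclasses import dataclass, field

@dataclass
class ParsedContent:
    """Illustrative shape of a normalized parsed source."""
    text: str                                       # clean extracted text
    format: str                                     # e.g. "jira", "markdown"
    source: str                                     # path to the original file
    metadata: dict = field(default_factory=dict)    # author, date, priority, ...
    sections: list = field(default_factory=list)    # {title, level, content}
    relations: list = field(default_factory=list)   # blocks / depends on / relates to
```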
### Validations

Before parsing, each file goes through centralized validations:

- The file must exist and be a regular file (not a directory)
- Maximum size: 50 MB (`MAX_FILE_SIZE_BYTES`)
- If the file is empty or contains only whitespace: `EmptySourceError`
- Encoding: UTF-8 is tried first, with a fallback to latin-1
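Put together, the validations amount to something like the sketch below. `read_validated` is a hypothetical helper name; only `MAX_FILE_SIZE_BYTES` and `EmptySourceError` come from the source.

```python
from pathlib import Path

MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # 50 MB

class EmptySourceError(ValueError):
    """Raised when a source file is empty or whitespace-only."""

def read_validated(path: Path) -> str:
    """Apply the centralized pre-parse validations and return the text."""
    if not path.is_file():
        raise FileNotFoundError(path)
    if path.stat().st_size > MAX_FILE_SIZE_BYTES:
        raise ValueError(f"{path} exceeds {MAX_FILE_SIZE_BYTES} bytes")
    raw = path.read_bytes()
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        text = raw.decode("latin-1")  # fallback encoding
    if not text.strip():
        raise EmptySourceError(path)
    return text
```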
See Input Formats for details on each parser.
## Phase 2: Analyze

**Module:** `analyze/`
**Requires LLM:** Yes (async via `litellm.acompletion`)

### What it does

Takes the `ParsedContent` from all sources and uses the LLM to extract structured requirements, detect conflicts, assess risks, and produce a technical design.
### Sub-phases

```
ParsedContent[] --> Combine --> Extraction --> Dedup --> Validate --> Risk --> Design --> AnalysisResult
```
#### 1. Combine sources

Concatenates the text from all sources with separators:

```
=== SOURCE 1: path/to/file.md (format: markdown) ===
[content]
---
=== SOURCE 2: path/to/jira.json (format: jira) ===
[content]
```
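The combine step can be sketched in a few lines. `combine_sources` is a hypothetical name; it assumes objects with `source`, `format`, and `text` attributes (the `ParsedContent` fields).

```python
def combine_sources(sources) -> str:
    """Join all parsed sources with the numbered separator format shown above."""
    blocks = [
        f"=== SOURCE {i}: {s.source} (format: {s.format}) ===\n{s.text}"
        for i, s in enumerate(sources, start=1)
    ]
    return "\n---\n".join(blocks)
```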
#### 2. Extraction (LLM call)

Sends the combined text to the LLM with `EXTRACTION_PROMPT`. The LLM returns JSON with:
- Functional requirements (FR-01, FR-02, …)
- Non-functional requirements (NFR-01, NFR-02, …)
- Conflicts between sources (CONFLICT-01, …)
- Open questions (Q-01, Q-02, …)
The prompt is parameterized with the number of sources, the language, and the requirements format (`ears`, `user-stories`, etc.).
#### 3. Deduplication

Compares requirement titles using Jaccard similarity (word intersection / word union):
- Threshold: 0.75 (75% of words in common = duplicate)
- Normalizes: lowercase, strip, collapse whitespace
- Deduplicates functional and non-functional requirements separately
- Keeps the first occurrence
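The dedup rules above can be sketched as follows; `jaccard` and `dedup` are illustrative names, and the 0.75 threshold comes from the source.

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity; lowercasing and split() handle the
    normalization (lowercase, strip, collapse whitespace)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)

def dedup(requirements: list, threshold: float = 0.75) -> list:
    """Keep the first occurrence; drop later titles at or above the threshold."""
    kept = []
    for req in requirements:
        if all(jaccard(req["title"], k["title"]) < threshold for k in kept):
            kept.append(req)
    return kept
```

Functional and non-functional requirements would each be passed through `dedup` separately, per the rules above.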
#### 4. Validation
- Conflicts: those without a description, sources, or recommendation are filtered out
- Open questions: those without question text or context are filtered out
#### 5. Risk assessment (optional)

If `config.spec.risk_assessment = true`, it makes another LLM call with `RISK_ASSESSMENT_PROMPT`. It produces a list of risks (RISK-01, …) with:
- Associated requirement IDs
- Probability and impact (low/medium/high)
- Category (technical, scope, integration, security, performance)
- Suggested mitigation
#### 6. Design (LLM call)

A third LLM call with `DESIGN_PROMPT`. It produces:
- Architecture components
- Files to create and modify (path + description + action)
- Technical decisions (decision, rationale, associated requirement)
- Tasks with dependencies (DAG), estimation in minutes, files, checks
- Acceptance checks (command, files_exist, pattern_present, pattern_absent)
- External project dependencies
### AnalysisResult

The complete result contains:
| Field | Type | Description |
|---|---|---|
| `functional_requirements` | list[Requirement] | Functional requirements (FR-XX) |
| `non_functional_requirements` | list[Requirement] | Non-functional requirements (NFR-XX) |
| `conflicts` | list[Conflict] | Conflicts between sources |
| `open_questions` | list[OpenQuestion] | Unanswered questions |
| `risks` | list[RiskItem] | Risk assessment |
| `design` | DesignResult | Technical design with tasks and checks |
| `duplicates_removed` | int | Number of duplicates removed |
| `total_cost` | float | Total analysis cost in USD |
| `model_used` | string | LLM model used |
### Cost control

The LLMAdapter tracks the cost of each call:

- Accumulates `total_cost`, `total_input_tokens`, and `total_output_tokens`
- After each call, compares the running total against `max_cost_per_spec`
- If the budget is exceeded, it raises `CostLimitError` and the analysis stops
- Cost is calculated via `litellm.completion_cost()`
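The budget check amounts to something like the sketch below. `CostTracker` and `record` are hypothetical names; in the real adapter the per-call cost would come from `litellm.completion_cost()`.

```python
class CostLimitError(RuntimeError):
    """Raised when the accumulated cost exceeds the per-spec budget."""

class CostTracker:
    def __init__(self, max_cost_per_spec: float):
        self.max_cost_per_spec = max_cost_per_spec
        self.total_cost = 0.0
        self.total_input_tokens = 0
        self.total_output_tokens = 0

    def record(self, cost: float, input_tokens: int, output_tokens: int) -> None:
        """Accumulate one call's cost/tokens, then enforce the budget."""
        self.total_cost += cost
        self.total_input_tokens += input_tokens
        self.total_output_tokens += output_tokens
        if self.total_cost > self.max_cost_per_spec:
            raise CostLimitError(f"budget exceeded: ${self.total_cost:.4f}")
```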
## Phase 3: Generate

**Module:** `generate/`
**Requires LLM:** No

### What it does

Takes the `AnalysisResult` and renders six Markdown/YAML files using Jinja2 templates, plus a `spec.lock.yaml` for reproducibility.
### Templates

| Generated file | Template | Main content |
|---|---|---|
| `requirements.md` | `requirements.md.j2` | FR, NFR, conflicts, open questions |
| `design.md` | `design.md.j2` | Components, files, decisions, dependencies |
| `tasks.md` | `tasks.md.j2` | Summary table + detail per task |
| `acceptance.yaml` | `acceptance.yaml.j2` | Executable checks by type |
| `context.md` | `context.md.j2` | Project info, stack, risks |
| `sources.md` | `sources.md.j2` | Sources, traceability, conflicts |
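As a toy illustration of the template step: the real pipeline loads the `*.j2` files above through a Jinja2 environment, but the mechanics are the same as rendering an inline `Template`. The template string and context here are invented for the example.

```python
from jinja2 import Template

# Inline stand-in for something like requirements.md.j2.
tmpl = Template(
    "# Requirements\n"
    "{% for req in functional %}- {{ req.id }}: {{ req.title }}\n{% endfor %}"
)
out = tmpl.render(functional=[{"id": "FR-01", "title": "Export as PDF"}])
```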
### spec.lock.yaml

Reproducibility file with:
| Field | Description |
|---|---|
| `version` | Lock format version (currently "1") |
| `created_at` | ISO creation timestamp |
| `model` | LLM model used |
| `config_hash` | Hash of the configuration used |
| `source_hashes` | Map of file -> SHA-256 (first 16 hex chars) |
| `spec_hashes` | Map of spec file -> SHA-256 |
| `total_cost` | Total analysis cost in USD |
| `requirement_count` | Number of requirements |
| `task_count` | Number of tasks |
It is used to detect whether sources have changed since the last generation (`is_stale()`).
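Staleness detection can be sketched as follows, given a `source_hashes` map of path -> truncated SHA-256 as described above. `short_hash` is a hypothetical helper; only `is_stale` and the 16-hex-char truncation come from the source.

```python
import hashlib
from pathlib import Path

def short_hash(path: Path) -> str:
    """SHA-256 of the file contents, truncated to 16 hex chars."""
    return hashlib.sha256(path.read_bytes()).hexdigest()[:16]

def is_stale(source_hashes: dict) -> bool:
    """A spec is stale if any locked source is missing or has changed."""
    return any(
        not Path(p).is_file() or short_hash(Path(p)) != h
        for p, h in source_hashes.items()
    )
```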
## Phase 4: Verify

**Module:** `verify/`
**Requires LLM:** No

### What it does

Runs the checks defined in `acceptance.yaml` against the project directory and produces a report with the results.
### Check types

| Type | What it verifies | Fields used |
|---|---|---|
| `command` | Runs a shell command and verifies exit code == 0 | `command` |
| `files_exist` | Verifies that all listed paths exist | `paths` |
| `pattern_present` | Verifies that regex patterns exist in files matching the glob | `glob`, `patterns` |
| `pattern_absent` | Verifies that regex patterns do NOT exist in files matching the glob | `glob`, `patterns` |
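Two of the check types can be sketched as below. This is an assumption-laden illustration: `run_check` is a hypothetical name, the exact matching semantics (e.g. whether each pattern must appear in at least one matched file) are guesses, and `command` / `pattern_absent` are omitted for brevity.

```python
import re
from pathlib import Path

def run_check(check: dict, project_dir: Path) -> bool:
    """Evaluate a files_exist or pattern_present check against a directory."""
    if check["type"] == "files_exist":
        # Every listed path must exist under the project directory.
        return all((project_dir / p).exists() for p in check["paths"])
    if check["type"] == "pattern_present":
        files = list(project_dir.glob(check["glob"]))
        # Assumed semantics: each pattern must match in at least one file.
        return bool(files) and all(
            any(re.search(pat, f.read_text()) for f in files)
            for pat in check["patterns"]
        )
    raise ValueError(f"unknown check type: {check['type']}")
```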
### Report formats

| Format | Class | Usage |
|---|---|---|
| `terminal` | TerminalReporter | Rich table with colors in the terminal |
| `json` | JsonReporter | Machine-readable JSON |
| `junit` | JunitReporter | JUnit XML for CI (GitHub Actions, Jenkins) |
See Verification for full details.
## Phase 5: Export

**Module:** `export/`
**Requires LLM:** No

### What it does

Takes the generated spec files and transforms them into a format ready for a specific AI agent.
### Available formats

| Format | What it generates | Best for |
|---|---|---|
| `architect` | `pipeline.yaml` + spec copy | Architect-based agents |
| `generic` | `SPEC.md` + `verify.sh` + spec copy | Any agent / manual use |
See Export for full details.
## Complete data flow
```
.md / .json / .pdf / .docx / .html / .yaml / .txt / .png
                         |
                    [ INGEST ]
                         |
                list[ParsedContent]
                         |
                    [ ANALYZE ]
                  (3 LLM calls)
                         |
                  AnalysisResult
                         |
                   [ GENERATE ]
                  (6 templates)
                         |
                specs/my-feature/
                ├── requirements.md
                ├── design.md
                ├── tasks.md
                ├── acceptance.yaml
                ├── context.md
                ├── sources.md
                └── spec.lock.yaml
                         |
            +------------+------------+
            |                         |
       [ VERIFY ]                [ EXPORT ]
            |                         |
  VerificationReport             output/
   (pass/fail/skip)              ├── pipeline.yaml  (architect)
                                 ├── SPEC.md        (generic)
                                 ├── verify.sh      (generic)
                                 └── spec/          (copy)
```