Pipeline

intake processes requirements through a 5-phase pipeline. Each phase transforms the data and passes it to the next one.

Sources             Phase 1       Phase 2       Phase 3        Phase 4         Phase 5
(files/URLs) ----> INGEST -----> ANALYZE ----> GENERATE ----> VERIFY -------> EXPORT
                   (parsers)     (LLM)        (templates)    (checks)        (output)
                      |             |              |              |               |
                ParsedContent  AnalysisResult  Spec files   VerifyReport   Agent output
                                    |              |
                             Complexity      Adaptive
                            Assessment      Generation

Phase 1: Ingest

Module: ingest/ (11 parsers)
Requires LLM: No (except ImageParser)

What It Does

Takes requirement files in any format and converts them into a normalized structure (ParsedContent). Supports local files, URLs, and stdin.

Flow

Source --> parse_source() --> Registry --> Detects format --> Selects parser --> ParsedContent
  1. Source resolution: parse_source() determines the source type:
    • Local files -> passed to the registry
    • HTTP/HTTPS URLs -> processed with UrlParser
    • Scheme URIs (jira://, confluence://, github://) -> resolved via API connectors (downloaded to temporary files)
    • Stdin (-) -> read as plaintext
    • Free text -> treated as plaintext
  2. The Registry receives the file path
  3. Auto-detects the format by extension and content:
    • Direct extension: .md -> markdown, .pdf -> pdf, .docx -> docx
    • JSON subtypes: Jira > GitHub Issues > Slack > generic YAML
    • HTML subtypes: if it contains "confluence" or "atlassian" -> confluence
    • Fallback: plaintext
  4. Selects the parser registered for that format (via plugin discovery or manual registration)
  5. The parser produces a normalized ParsedContent
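The detection order above can be sketched as a small heuristic. This is a simplified illustration: the function name and the JSON/HTML shape checks (e.g. the keys used to recognize Jira or GitHub exports) are assumptions, not the actual registry code.

```python
import json
from pathlib import Path

# Direct extension mappings, per the list above.
EXTENSION_MAP = {".md": "markdown", ".pdf": "pdf", ".docx": "docx"}

def detect_format(path: Path, text: str) -> str:
    """Detection order: direct extension, then JSON/HTML content sniffing."""
    ext = path.suffix.lower()
    if ext in EXTENSION_MAP:
        return EXTENSION_MAP[ext]
    if ext == ".json":
        try:
            data = json.loads(text)
        except json.JSONDecodeError:
            return "plaintext"
        if not isinstance(data, dict):
            return "plaintext"
        if "fields" in data and "key" in data:    # Jira export shape (assumed keys)
            return "jira"
        if "title" in data and "number" in data:  # GitHub issue shape (assumed keys)
            return "github_issues"
        if "messages" in data:                    # Slack export shape (assumed key)
            return "slack"
        return "yaml"  # generic structured fallback, per the precedence above
    if ext in {".html", ".htm"}:
        lowered = text.lower()
        if "confluence" in lowered or "atlassian" in lowered:
            return "confluence"
        return "html"
    return "plaintext"
```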

ParsedContent

Each parsed source produces:

| Field | Type | Description |
|---|---|---|
| text | string | Clean extracted text |
| format | string | Format identifier (e.g.: "jira", "markdown") |
| source | string | Path to the original file |
| metadata | dict | Key-value pairs (author, date, priority, etc.) |
| sections | list[dict] | Structured sections (title, level, content) |
| relations | list[dict] | Relationships between items (blocks, depends on, relates to) |
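As a rough Python sketch, the structure described above might look like the following dataclass (field names come from the table; the defaults and exact types are assumptions):

```python
from dataclasses import dataclass, field

@dataclass
class ParsedContent:
    """Normalized output produced by every parser (sketch, not the real class)."""
    text: str                                   # clean extracted text
    format: str                                 # e.g. "jira", "markdown"
    source: str                                 # path to the original file
    metadata: dict = field(default_factory=dict)       # author, date, priority, ...
    sections: list[dict] = field(default_factory=list)  # title, level, content
    relations: list[dict] = field(default_factory=list)  # blocks, depends on, ...
```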

Validations

Before parsing, each file goes through centralized validations:

  • The file must exist and be a regular file (not a directory)
  • Maximum size: 50 MB (MAX_FILE_SIZE_BYTES)
  • If the file is empty or contains only whitespace: an EmptySourceError is raised
  • Encoding: tries UTF-8 first, fallback to latin-1
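A minimal sketch of these validations, assuming a hypothetical helper name (`validate_and_read` is not necessarily the real API):

```python
from pathlib import Path

MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # 50 MB, per the limit above

class EmptySourceError(Exception):
    pass

def validate_and_read(path: Path) -> str:
    """Apply the centralized pre-parse validations and return the file text."""
    if not path.is_file():                       # must exist and be a regular file
        raise FileNotFoundError(path)
    if path.stat().st_size > MAX_FILE_SIZE_BYTES:
        raise ValueError(f"{path} exceeds {MAX_FILE_SIZE_BYTES} bytes")
    raw = path.read_bytes()
    try:
        text = raw.decode("utf-8")               # UTF-8 first
    except UnicodeDecodeError:
        text = raw.decode("latin-1")             # fallback encoding
    if not text.strip():
        raise EmptySourceError(path)
    return text
```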

See Input Formats for details on each parser.


Phase 2: Analyze

Module: analyze/
Requires LLM: Yes (async via litellm.acompletion)

What It Does

Takes the ParsedContent from all sources and uses the LLM to extract structured requirements, detect conflicts, assess risks and produce a technical design.

Sub-phases

ParsedContent[] --> Combine --> Extraction --> Dedup --> Validate --> Risk --> Design --> AnalysisResult

1. Combine Sources

Concatenates text from all sources with separators:

=== SOURCE 1: path/to/file.md (format: markdown) ===
[content]

---

=== SOURCE 2: path/to/jira.json (format: jira) ===
[content]

2. Extraction (LLM call)

Sends the combined text to the LLM with EXTRACTION_PROMPT. The LLM returns JSON with:

  • Functional requirements (FR-01, FR-02, …)
  • Non-functional requirements (NFR-01, NFR-02, …)
  • Conflicts between sources (CONFLICT-01, …)
  • Open questions (Q-01, Q-02, …)

The prompt is configured with: number of sources, language, requirements format (ears, user-stories, etc.).

3. Deduplication

Compares requirement titles using Jaccard similarity (word intersection / word union):

  • Threshold: 0.75 (titles with a Jaccard score of 0.75 or higher are treated as duplicates)
  • Normalizes: lowercase, strip, collapse whitespace
  • Deduplicates functional and non-functional separately
  • Keeps the first occurrence
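The deduplication rule above can be sketched as follows (function names are hypothetical; the real implementation lives in analyze/):

```python
def normalize(title: str) -> set[str]:
    """Lowercase, strip, collapse whitespace, and split into a word set."""
    return set(title.lower().strip().split())

def jaccard(a: str, b: str) -> float:
    """Word intersection over word union of two normalized titles."""
    wa, wb = normalize(a), normalize(b)
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)

def dedupe(requirements: list[dict], threshold: float = 0.75) -> list[dict]:
    """Keep the first occurrence; drop later titles at or above the threshold."""
    kept: list[dict] = []
    for req in requirements:
        if all(jaccard(req["title"], k["title"]) < threshold for k in kept):
            kept.append(req)
    return kept
```

Functional and non-functional requirements would each be passed through `dedupe` separately, matching the behavior described above.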

4. Validation

  • Conflicts: those without description, sources, or recommendation are filtered out
  • Open questions: those without question text or context are filtered out

5. Risk Assessment (optional)

If config.spec.risk_assessment = true, the analyzer makes an additional LLM call with RISK_ASSESSMENT_PROMPT. It produces a list of risks (RISK-01, …) with:

  • Associated requirement IDs
  • Probability and impact (low/medium/high)
  • Category (technical, scope, integration, security, performance)
  • Suggested mitigation

6. Design (LLM call)

Third LLM call with DESIGN_PROMPT. Produces:

  • Architecture components
  • Files to create and modify (path + description + action)
  • Technical decisions (decision, justification, associated requirement)
  • Tasks with dependencies (DAG), time estimate in minutes, files, checks
  • Acceptance checks (command, files_exist, pattern_present, pattern_absent)
  • External project dependencies

AnalysisResult

The complete result contains:

| Field | Type | Description |
|---|---|---|
| functional_requirements | list[Requirement] | Functional requirements (FR-XX) |
| non_functional_requirements | list[Requirement] | Non-functional requirements (NFR-XX) |
| conflicts | list[Conflict] | Conflicts between sources |
| open_questions | list[OpenQuestion] | Unanswered questions |
| risks | list[RiskItem] | Risk assessment |
| design | DesignResult | Technical design with tasks and checks |
| duplicates_removed | int | Number of duplicates removed |
| total_cost | float | Total analysis cost in USD |
| model_used | string | LLM model used |

Cost Control

The LLMAdapter tracks the cost of each call:

  • Accumulates total_cost, total_input_tokens, total_output_tokens
  • After each call, compares against max_cost_per_spec
  • If the budget is exceeded, raises CostLimitError and the analysis stops
  • Cost is calculated via litellm.completion_cost()
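A minimal sketch of this budget guard, with assumed class and method names (the real logic lives in the LLMAdapter, and the per-call cost comes from litellm.completion_cost()):

```python
class CostLimitError(Exception):
    pass

class CostTracker:
    """Accumulates usage and fails fast when the per-spec budget is exceeded."""

    def __init__(self, max_cost_per_spec: float):
        self.max_cost_per_spec = max_cost_per_spec
        self.total_cost = 0.0
        self.total_input_tokens = 0
        self.total_output_tokens = 0

    def record(self, cost: float, input_tokens: int, output_tokens: int) -> None:
        """Called after each LLM call with the cost reported for that call."""
        self.total_cost += cost
        self.total_input_tokens += input_tokens
        self.total_output_tokens += output_tokens
        if self.total_cost > self.max_cost_per_spec:
            raise CostLimitError(
                f"spent ${self.total_cost:.4f}, budget ${self.max_cost_per_spec:.4f}"
            )
```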

Phase 2.5: Complexity Classification

Module: analyze/complexity.py
Requires LLM: No

What It Does

Before generating, source complexity is classified to select the optimal generation mode. This classification is heuristic (does not use LLM).

Criteria

| Mode | Conditions | Confidence |
|---|---|---|
| quick | <500 words AND 1 source AND no structured content | High |
| enterprise | 4+ sources OR >5000 words | High |
| standard | Everything that is not quick or enterprise | Medium |

Structured content includes formats such as jira, confluence, yaml, github_issues, slack.

The classification can be overridden with --mode in the CLI.
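The criteria above translate into a small heuristic, sketched here with assumed names and a dict-based source shape (the real logic lives in analyze/complexity.py):

```python
# Formats that count as structured content, per the note above.
STRUCTURED_FORMATS = {"jira", "confluence", "yaml", "github_issues", "slack"}

def classify(sources: list[dict]) -> tuple[str, str]:
    """Return (mode, confidence) from word counts and source formats."""
    total_words = sum(len(s["text"].split()) for s in sources)
    structured = any(s["format"] in STRUCTURED_FORMATS for s in sources)
    if len(sources) >= 4 or total_words > 5000:
        return "enterprise", "high"
    if total_words < 500 and len(sources) == 1 and not structured:
        return "quick", "high"
    return "standard", "medium"
```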


Phase 3: Generate

Module: generate/
Requires LLM: No

What It Does

Takes the AnalysisResult and renders Markdown/YAML files using Jinja2 templates, plus a spec.lock.yaml for reproducibility. The number of files generated depends on the mode.

Adaptive Generation

The AdaptiveSpecBuilder wraps the standard SpecBuilder and filters files based on mode:

| Mode | Generated Files |
|---|---|
| quick | context.md, tasks.md |
| standard | All 6 complete files |
| enterprise | All 6 files + detailed risks |

Templates

| Generated File | Template | Main Content |
|---|---|---|
| requirements.md | requirements.md.j2 | FR, NFR, conflicts, open questions |
| design.md | design.md.j2 | Components, files, decisions, dependencies |
| tasks.md | tasks.md.j2 | Summary table + detail per task |
| acceptance.yaml | acceptance.yaml.j2 | Executable checks by type |
| context.md | context.md.j2 | Project info, stack, risks |
| sources.md | sources.md.j2 | Sources, traceability, conflicts |

spec.lock.yaml

Reproducibility file with:

| Field | Description |
|---|---|
| version | Lock format version (currently "1") |
| created_at | ISO creation timestamp |
| model | LLM model used |
| config_hash | Hash of the configuration used |
| source_hashes | Map of file -> SHA-256 (first 16 hex chars) |
| spec_hashes | Map of spec file -> SHA-256 |
| total_cost | Total analysis cost in USD |
| requirement_count | Number of requirements |
| task_count | Number of tasks |

Used to detect whether sources have changed since the last generation (is_stale()).
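A sketch of the staleness check, assuming hypothetical helper names and the truncated-hash convention from the table:

```python
import hashlib
from pathlib import Path

def short_hash(path: Path) -> str:
    """SHA-256 of the file contents, truncated to the first 16 hex characters."""
    return hashlib.sha256(path.read_bytes()).hexdigest()[:16]

def is_stale(source_hashes: dict[str, str]) -> bool:
    """True if any recorded source changed (or disappeared) since generation."""
    for name, recorded in source_hashes.items():
        p = Path(name)
        if not p.is_file() or short_hash(p) != recorded:
            return True
    return False
```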


Phase 4: Verify

Module: verify/
Requires LLM: No

What It Does

Runs the checks defined in acceptance.yaml against the project directory. Produces a report with results.

Check Types

| Type | What it verifies | Fields used |
|---|---|---|
| command | Runs a shell command and verifies exit code == 0 | command |
| files_exist | Verifies that all listed paths exist | paths |
| pattern_present | Verifies that regex patterns exist in files matching the glob | glob, patterns |
| pattern_absent | Verifies that regex patterns do NOT exist in files matching the glob | glob, patterns |
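A hypothetical acceptance.yaml exercising all four check types might look like this (the exact schema, key names, and top-level layout are assumptions based on the table above):

```yaml
checks:
  - name: unit-tests-pass
    type: command
    command: pytest -q          # passes when exit code == 0
  - name: spec-files-present
    type: files_exist
    paths:
      - specs/my-feature/tasks.md
      - specs/my-feature/context.md
  - name: uses-async-client
    type: pattern_present
    glob: "src/**/*.py"
    patterns:
      - 'litellm\.acompletion'  # regex that must appear in matching files
  - name: no-debug-prints
    type: pattern_absent
    glob: "src/**/*.py"
    patterns:
      - 'print\('               # regex that must NOT appear in matching files
```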

Report Formats

| Format | Class | Usage |
|---|---|---|
| terminal | TerminalReporter | Rich table with colors in the terminal |
| json | JsonReporter | Machine-readable JSON |
| junit | JunitReporter | JUnit XML for CI (GitHub Actions, Jenkins) |

See Verification for complete details.


Phase 5: Export

Module: export/
Requires LLM: No

What It Does

Takes the generated spec files and transforms them into a format ready for a specific AI agent.

Available Formats

| Format | What it generates | Best for |
|---|---|---|
| architect | pipeline.yaml + spec copy | Architect-based agents |
| generic | SPEC.md + verify.sh + spec copy | Any agent / manual use |
| claude-code | CLAUDE.md + .intake/tasks/ + verify.sh | Claude Code |
| cursor | .cursor/rules/intake-spec.mdc | Cursor |
| kiro | requirements.md + design.md + tasks.md (native format) | Kiro |
| copilot | .github/copilot-instructions.md | GitHub Copilot |

See Export for complete details.


Complete Data Flow

.md / .json / .pdf / .docx / .html / .yaml / .txt / .png / URLs
                           |
                   [ SOURCE RESOLUTION ]
                   (parse_source -> file, url, stdin, text)
                           |
                      [ INGEST ]
                      (11 parsers via plugin discovery)
                           |
                   list[ParsedContent]
                           |
                  [ COMPLEXITY CLASSIFICATION ]
                  (quick / standard / enterprise)
                           |
                      [ ANALYZE ]
                      (3 LLM calls)
                           |
                     AnalysisResult
                           |
                  [ ADAPTIVE GENERATE ]
                  (2-6 templates based on mode)
                           |
              specs/my-feature/
              ├── requirements.md    (standard, enterprise)
              ├── design.md          (standard, enterprise)
              ├── tasks.md           (always)
              ├── acceptance.yaml    (standard, enterprise)
              ├── context.md         (always)
              ├── sources.md         (standard, enterprise)
              └── spec.lock.yaml
                           |
                      [ VERIFY ]         [ EXPORT ]
                           |                  |
                  VerificationReport     output/
                  (pass/fail/skip)       ├── pipeline.yaml  (architect)
                                         ├── SPEC.md        (generic)
                                         ├── verify.sh      (generic)
                                         ├── CLAUDE.md      (claude-code)
                                         ├── .cursor/rules/ (cursor)
                                         ├── .github/       (copilot)
                                         └── spec/          (copy)
                                                |
                                         [ FEEDBACK ]  (optional)
                                                |
                                        FeedbackResult
                                        (suggestions + amendments)

Feedback Loop (optional)

Module: feedback/
Requires LLM: Yes (async via litellm.acompletion)

What It Does

Closes the cycle between verification and implementation. When checks fail, it analyzes the causes and suggests fixes to both the implementation and the spec.

Flow

VerificationReport (failed checks)
         |
   [ ANALYZE FAILURES ]     (LLM call)
         |
   FeedbackResult
   ├── FailureAnalysis[]     (root cause + suggestion per failure)
   ├── SpecAmendment[]       (proposed amendments to the spec)
   ├── summary               (general summary)
   └── estimated_effort      (small / medium / large)
         |
   [ APPLY? ]               (if --apply or auto_amend_spec)
         |
   Updated spec

Components

| Component | What it does |
|---|---|
| FeedbackAnalyzer | Analyzes failures with the LLM, produces a FeedbackResult |
| SuggestionFormatter | Formats suggestions for terminal or agent (generic, claude-code, cursor) |
| SpecUpdater | Previews and applies amendments to spec files |

Data Model

| Dataclass | Main Fields |
|---|---|
| FailureAnalysis | check_name, root_cause, suggestion, severity, affected_tasks, spec_amendment |
| SpecAmendment | target_file, section, action (add/modify/remove), content |
| FeedbackResult | failures, summary, estimated_effort, total_cost |
| AmendmentPreview | amendment, current_content, proposed_content, applicable, reason |
| ApplyResult | applied, skipped, details |

See Feedback for complete documentation.