Architecture
This document describes the internal structure of vigil, the analysis engine flow, and the analyzer protocol.
Project structure
```
vigil-cli/
  src/vigil/
    __init__.py              # __version__
    cli.py                   # Click commands (scan, deps, tests, init, rules)
    config/
      __init__.py
      schema.py              # Pydantic v2 models for configuration
      loader.py              # Config loading and merging (YAML + CLI)
      rules.py               # Catalog of 26 rules (RULES_V0)
    core/
      __init__.py
      finding.py             # Severity, Category, Location, Finding
      engine.py              # ScanEngine, ScanResult
      file_collector.py      # File discovery
      rule_registry.py       # RuleRegistry for rule access
    analyzers/
      __init__.py
      base.py                # BaseAnalyzer Protocol
      deps/                  # CAT-01: Dependency Analyzer
        __init__.py
        analyzer.py          # DependencyAnalyzer (DEP-001..007)
        parsers.py           # Parsers for requirements.txt, pyproject.toml, package.json
        registry_client.py   # HTTP client for PyPI/npm with local cache
        similarity.py        # Levenshtein + popular package corpus
      auth/                  # CAT-02: Auth Analyzer
        __init__.py
        analyzer.py          # AuthAnalyzer (AUTH-001..007)
        endpoint_detector.py # HTTP endpoint detection (FastAPI/Flask/Express)
        middleware_checker.py # Auth middleware verification
        patterns.py          # Regex for JWT, CORS, cookies, passwords
      secrets/               # CAT-03: Secrets Analyzer
        __init__.py
        analyzer.py          # SecretsAnalyzer (SEC-001..006)
        placeholder_detector.py # Placeholder and assignment detection
        entropy.py           # Shannon entropy calculation
        env_tracer.py        # Value tracing from .env.example
      tests/                 # CAT-06: Test Quality Analyzer
        __init__.py
        analyzer.py          # TestQualityAnalyzer (TEST-001..006)
        assert_checker.py    # Test function extraction, assertion counting
        mock_checker.py      # Mock mirror detection
        coverage_heuristics.py # Test file identification and framework detection
    reports/
      __init__.py
      formatter.py           # BaseFormatter Protocol + factory
      human.py               # Terminal format with colors
      json_fmt.py            # Structured JSON format
      junit.py               # JUnit XML format
      sarif.py               # SARIF 2.1.0 format
      summary.py             # Summary generator (counts)
    logging/
      __init__.py
      setup.py               # structlog configuration
  tests/
    conftest.py              # Global fixtures
    test_cli.py              # CLI tests
    test_cli_edge_cases.py   # CLI edge cases
    test_integration.py      # End-to-end integration tests
    test_core/
      test_finding.py
      test_engine.py
      test_file_collector.py
    test_config/
      test_schema.py
      test_loader.py
      test_rules.py
    test_reports/
      test_formatters.py
      test_formatters_edge_cases.py
      test_fase4_formatters.py # PHASE 4 improvement tests
      test_fase4_qa.py         # QA: regression, edge cases, consistency
    test_analyzers/
      test_deps/
        test_parsers.py             # Dependency parser tests
        test_parsers_qa.py          # QA: edge cases (markers, BOM, CRLF, Unicode)
        test_registry_client.py     # Registry client tests
        test_registry_client_qa.py  # QA: cache, sanitize, response parsing
        test_similarity.py          # Typosquatting detection tests
        test_similarity_qa.py       # QA: corpus integrity, false positives, PEP 503
        test_analyzer.py            # DependencyAnalyzer tests
        test_analyzer_qa.py         # QA: false positives/negatives, boundaries
        test_integration_qa.py      # QA: engine+analyzer, CLI+deps, regression
      test_auth/
        test_analyzer.py            # AuthAnalyzer tests
        test_endpoint_detector.py   # Endpoint detection tests
        test_middleware_checker.py  # Auth middleware verification tests
        test_patterns.py            # Regex pattern tests (JWT, CORS, cookies)
        test_qa_regression.py       # QA: edge cases and regressions
      test_secrets/
        test_analyzer.py            # SecretsAnalyzer tests
        test_placeholder_detector.py # Placeholder detection tests
        test_entropy.py             # Shannon entropy calculation tests
        test_env_tracer.py          # .env.example tracing tests
        test_qa_regression.py       # QA: edge cases and regressions
      test_tests/
        test_analyzer.py            # TestQualityAnalyzer tests
        test_assert_checker.py      # Assertion extraction and counting tests
        test_mock_checker.py        # Mock mirror detection tests
        test_coverage_heuristics.py # Test file identification tests
        test_qa_regression.py       # QA: edge cases and regressions (81 tests)
    fixtures/                # Test files
      deps/                  # Dependency fixtures
        valid_project/       # Project with legitimate deps
        hallucinated_deps/   # Hallucinated/invented deps
        npm_project/         # npm project with invented deps
        clean_project/       # Clean project (no findings)
        vulnerable_project/  # Mix of legitimate and suspicious deps
        edge_cases/          # Empty, comments-only, markers, URLs, malformed
      auth/                  # Auth fixtures
        insecure_fastapi.py  # FastAPI without auth middleware
        insecure_flask.py    # Flask without auth
        insecure_express.js  # Express without auth
        secure_app.py        # App with correct auth (no findings)
        edge_cases.py        # Edge cases
      secrets/               # Secrets fixtures
        insecure_secrets.py  # Hardcoded secrets (Python)
        insecure_secrets.js  # Hardcoded secrets (JavaScript)
        .env.example         # Example .env.example
        copies_env_example.py # Code that copies values from .env.example
        secure_code.py       # Secure code (no findings)
      tests/                 # Test quality fixtures
        vulnerable_tests.py  # Python tests with issues (no assertions, trivial, catch-all)
        vulnerable_tests.js  # JavaScript tests with issues
        clean_tests.py       # Correct Python tests (no findings)
        clean_tests.js       # Correct JavaScript tests (no findings)
        edge_cases_python.py # Python edge cases (async, single-line, nested)
        edge_cases_js.js     # JavaScript edge cases (async, describe, nested)
        npm_tests.test.js    # npm/jest tests with mixed issues
```
Data models
Severity
String enum with 5 levels, ordered from highest to lowest criticality:

```python
class Severity(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    INFO = "info"
```

Inheriting from both `str` and `Enum` allows direct comparison with strings and JSON serialization without any conversion step.
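For illustration, the effect of the dual inheritance can be shown with a short snippet (the enum mirrors the definition above):

```python
import json
from enum import Enum

class Severity(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    INFO = "info"

# Direct comparison with plain strings works because each member IS a str:
assert Severity.HIGH == "high"

# json.dumps serializes members as their string values, no custom encoder:
assert json.dumps({"severity": Severity.HIGH}) == '{"severity": "high"}'
```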
Category
String enum with 4 analysis categories:

```python
class Category(str, Enum):
    DEPENDENCY = "dependency"
    AUTH = "auth"
    SECRETS = "secrets"
    TEST_QUALITY = "test-quality"
```
Location
Dataclass indicating where the issue was found:

```python
@dataclass
class Location:
    file: str                    # File path
    line: int | None = None      # Line (1-based)
    column: int | None = None    # Column (1-based)
    end_line: int | None = None  # End line (for ranges)
    snippet: str | None = None   # Code fragment
```
Finding
Dataclass representing an individual finding:

```python
@dataclass
class Finding:
    rule_id: str                   # "DEP-001", "AUTH-005"
    category: Category             # Category.DEPENDENCY
    severity: Severity             # Severity.CRITICAL
    message: str                   # Problem description
    location: Location             # Where it was found
    suggestion: str | None = None  # How to fix it
    metadata: dict[str, Any] = field(default_factory=dict)

    @property
    def is_blocking(self) -> bool:
        return self.severity in (Severity.CRITICAL, Severity.HIGH)
```

The `is_blocking` property determines whether the finding should block a merge (by default, CRITICAL and HIGH findings are blocking).
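Putting the pieces together, constructing a finding might look like this (a self-contained sketch that re-declares trimmed versions of the dataclasses above; the sample message and suggestion are invented):

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Any

class Severity(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    INFO = "info"

class Category(str, Enum):
    DEPENDENCY = "dependency"

@dataclass
class Location:
    file: str
    line: int | None = None

@dataclass
class Finding:
    rule_id: str
    category: Category
    severity: Severity
    message: str
    location: Location
    suggestion: str | None = None
    metadata: dict[str, Any] = field(default_factory=dict)

    @property
    def is_blocking(self) -> bool:
        return self.severity in (Severity.CRITICAL, Severity.HIGH)

finding = Finding(
    rule_id="DEP-001",
    category=Category.DEPENDENCY,
    severity=Severity.CRITICAL,
    message="Package 'requezts' does not exist on PyPI",  # invented example
    location=Location("requirements.txt", line=3),
    suggestion="Did you mean 'requests'?",
)
assert finding.is_blocking  # CRITICAL findings block a merge
```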
Engine flow
The `ScanEngine` is the central orchestrator. Its `run()` method executes the complete pipeline:

```
run(paths)
           |
+----------v-----------+
| 1. Collect files     |
|   (file_collector)   |
+----------+-----------+
           |
+----------v-----------+
| 2. Run analyzers     |
| (for each analyzer)  |
+----------+-----------+
           |
+----------v-----------+
| 3. Apply overrides   |
|   (rule_overrides)   |
+----------+-----------+
           |
+----------v-----------+
| 4. Sort findings     |
|   (by severity)      |
+----------+-----------+
           |
           v
       ScanResult
```
Step 1: Collect files
`file_collector.collect_files()` receives the user's paths and returns the list of files to scan:

- Traverses directories recursively with `os.walk()`, pruning excluded directories in place (`dirnames[:] = [...]`). This avoids descending into `.venv/`, `node_modules/`, etc., which is critical for performance (a typical `.venv/` contains thousands of files).
- Filters by language extensions (`LANGUAGE_EXTENSIONS`).
- Excludes configured patterns by path component (not by substring).
- Always includes dependency files (`requirements.txt`, `package.json`, etc.) regardless of the language filter.
- Deduplicates while preserving order.
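The in-place pruning idiom is worth showing concretely. This is an illustrative sketch, not vigil's actual `collect_files()`; the exclusion set and the extension/filename checks here are simplified placeholders:

```python
import os

# Illustrative filters; the real collector reads these from configuration
# and LANGUAGE_EXTENSIONS.
EXCLUDED_DIRS = {".venv", "node_modules", ".git", "__pycache__"}
DEPENDENCY_FILES = {"requirements.txt", "package.json"}

def collect_files(root: str) -> list[str]:
    collected: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Slice assignment mutates the very list os.walk is iterating over,
        # so excluded directories are never descended into.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for name in filenames:
            if name.endswith(".py") or name in DEPENDENCY_FILES:
                collected.append(os.path.join(dirpath, name))
    # Deduplicate while preserving insertion order.
    return list(dict.fromkeys(collected))
```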
Step 2: Run analyzers
For each registered analyzer:
- Checks whether it should run (`_should_run()`): respects the `--category` and `--rule` filters.
- Calls `analyzer.analyze(files, config)`.
- Collects the returned findings.
- Catches exceptions per analyzer (a failed analyzer does not stop the others).
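The error-isolation behavior can be sketched as follows (hypothetical names; the real engine also applies the `_should_run()` filter described above):

```python
# Sketch of step 2: each analyzer runs inside its own try/except,
# so one failure cannot abort the whole scan.
class GoodAnalyzer:
    name = "good"
    def analyze(self, files, config):
        return ["finding-1"]

class BrokenAnalyzer:
    name = "broken"
    def analyze(self, files, config):
        raise RuntimeError("boom")

def run_analyzers(analyzers, files, config):
    findings, errors = [], []
    for analyzer in analyzers:
        try:
            findings.extend(analyzer.analyze(files, config))
        except Exception as exc:  # isolate the failure, keep going
            errors.append(f"{analyzer.name}: {exc}")
    return findings, errors

findings, errors = run_analyzers([BrokenAnalyzer(), GoodAnalyzer()], [], None)
assert findings == ["finding-1"]
assert errors == ["broken: boom"]
```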
Step 3: Apply overrides
`_apply_rule_overrides()` processes the `rules:` section of the configuration:

- If a rule has `enabled: false`, its findings are removed.
- If a rule overrides the severity (e.g. `severity: "low"`), the finding's severity is updated accordingly.
- If a rule appears in `exclude_rules` (populated by `--exclude-rule`), its findings are removed.
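A minimal sketch of this filtering, assuming findings as plain dicts and overrides keyed by rule ID (the real `_apply_rule_overrides()` works on `Finding` objects):

```python
# Hypothetical helper illustrating step 3 of the pipeline.
def apply_rule_overrides(findings, overrides, exclude_rules=()):
    result = []
    for finding in findings:
        rule_id = finding["rule_id"]
        if rule_id in exclude_rules:
            continue                      # removed via --exclude-rule
        override = overrides.get(rule_id, {})
        if override.get("enabled") is False:
            continue                      # rule disabled in config
        if "severity" in override:
            finding = {**finding, "severity": override["severity"]}
        result.append(finding)
    return result

findings = [{"rule_id": "DEP-001", "severity": "critical"},
            {"rule_id": "SEC-001", "severity": "high"}]
out = apply_rule_overrides(findings,
                           {"SEC-001": {"severity": "low"}},
                           exclude_rules=["DEP-001"])
assert out == [{"rule_id": "SEC-001", "severity": "low"}]
```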
Step 4: Sort
Findings are sorted by descending severity (CRITICAL first, INFO last) using `SEVERITY_SORT_ORDER`.
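Conceptually (the mapping below is illustrative; the actual `SEVERITY_SORT_ORDER` lives in vigil's core):

```python
# Lower rank = more severe, so an ascending sort puts CRITICAL first.
SEVERITY_SORT_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}

findings = [{"rule": "A", "severity": "low"},
            {"rule": "B", "severity": "critical"},
            {"rule": "C", "severity": "medium"}]
findings.sort(key=lambda f: SEVERITY_SORT_ORDER[f["severity"]])
assert [f["rule"] for f in findings] == ["B", "C", "A"]
```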
Analyzer protocol
Each analyzer implements the `BaseAnalyzer` protocol:

```python
class BaseAnalyzer(Protocol):
    @property
    def name(self) -> str: ...

    @property
    def category(self) -> Category: ...

    def analyze(self, files: list[str], config: ScanConfig) -> list[Finding]: ...
```
Contract
- `name`: Unique analyzer name (e.g., `"dependency"`, `"auth"`).
- `category`: Category of the findings it generates.
- `analyze()`: Receives the list of files and the configuration; returns the findings.
Rules for implementing an analyzer
- Deterministic: The same input always produces the same output.
- No side effects: Does not modify files, does not write to stdout.
- Internal error handling: If a file cannot be read, the analyzer ignores it and continues.
- Logging to stderr: Use `structlog` for debug/info logs.
- Respect the configuration: Read thresholds and options from `ScanConfig`.
Implementation example
```python
from vigil.analyzers.base import BaseAnalyzer
from vigil.config.schema import ScanConfig
from vigil.core.finding import Category, Finding, Location, Severity


class DependencyAnalyzer:
    @property
    def name(self) -> str:
        return "dependency"

    @property
    def category(self) -> Category:
        return Category.DEPENDENCY

    def analyze(self, files: list[str], config: ScanConfig) -> list[Finding]:
        findings: list[Finding] = []
        # ... analysis logic ...
        return findings
```
No inheritance is required — only satisfying the Protocol (structural typing).
Analyzer registration
In `cli.py`, analyzers are registered via `_register_analyzers(engine)` before running the scan:

```python
def _register_analyzers(engine: ScanEngine) -> None:
    from vigil.analyzers.deps import DependencyAnalyzer
    from vigil.analyzers.auth import AuthAnalyzer
    from vigil.analyzers.secrets import SecretsAnalyzer
    from vigil.analyzers.tests import TestQualityAnalyzer

    engine.register_analyzer(DependencyAnalyzer())
    engine.register_analyzer(AuthAnalyzer())
    engine.register_analyzer(SecretsAnalyzer())
    engine.register_analyzer(TestQualityAnalyzer())
```
This function is invoked in the `scan`, `deps`, and `tests` commands.
DependencyAnalyzer
The first analyzer implemented. Detects hallucinated dependencies, typosquatting, suspiciously new packages, nonexistent versions, and packages without a source repository.
Internal architecture
```
DependencyAnalyzer.analyze(files, config)
        |
        v
[1. _extract_roots(files)]      --> Unique root directories
        |
        v
[2. find_and_parse_all(root)]   --> List of DeclaredDependency
        |      (parsers: req.txt, pyproject.toml, package.json)
        v
[3. _deduplicate_deps()]        --> Unique deps by name+ecosystem
        |
        v
[4. load_popular_packages()]    --> Corpus for typosquatting
        |
        +---> [5a. _check_registries()]    --> DEP-001, DEP-002, DEP-005, DEP-007
        |        |    (only if online + verify_registry)
        |        v
        |     RegistryClient.check(name, ecosystem)
        |        |
        |        +---> Cache hit? return cached
        |        +---> HTTP GET PyPI/npm -> PackageInfo -> cache
        |
        +---> [5b. find_similar_popular()] --> DEP-003 (always, no network required)
        |
        v
list[Finding]
```
Components
| Module | Responsibility |
|---|---|
| `parsers.py` | Parses `requirements.txt`, `pyproject.toml`, `package.json` into `DeclaredDependency` |
| `registry_client.py` | HTTP client for PyPI/npm with disk cache (`~/.cache/vigil/registry/`) |
| `similarity.py` | Levenshtein distance, PEP 503 normalization, popular package corpus |
| `analyzer.py` | Orchestrates parsers + registry + similarity, generates findings |
Implemented rules
| Rule | Requires network | Description |
|---|---|---|
| DEP-001 | Yes | Package does not exist in registry |
| DEP-002 | Yes | Package created less than N days ago |
| DEP-003 | No | Name similar to a popular package |
| DEP-005 | Yes | No source repository |
| DEP-007 | Yes | Pinned version does not exist |
Deferred rules (V1)
| Rule | Reason |
|---|---|
| DEP-004 | Requires download statistics API |
| DEP-006 | Requires AST import parser |
Configuration system
Three layers with progressive merging
Defaults (`schema.py`) < YAML file (`.vigil.yaml`) < CLI flags

- Defaults: Defined as default values in the Pydantic models (`ScanConfig`, `DepsConfig`, etc.).
- YAML: Loaded with `pyyaml` and validated with Pydantic.
- CLI: Click flags that override specific fields.
Loader
`load_config()` in `config/loader.py`:

- Finds the config file (explicitly via `--config`, or by auto-detection walking up the directory tree).
- Parses the YAML.
- Creates a `ScanConfig` instance with the YAML values.
- Applies the CLI overrides on the instance.
- Returns the final configuration.
Validation
Pydantic v2 automatically validates:
- Data types (`min_age_days` is an int, not a string).
- Valid values (`fail_on` is one of critical/high/medium/low).
- Nested models (`deps`, `auth`, `secrets`, `tests`, `output`).
Rule catalog
The 26 rules are defined in `config/rules.py` as `RuleDefinition` instances:

```python
@dataclass
class RuleDefinition:
    id: str                             # "DEP-001"
    name: str                           # "Hallucinated dependency"
    description: str                    # Long description
    category: Category                  # Category.DEPENDENCY
    default_severity: Severity          # Severity.CRITICAL
    enabled_by_default: bool = True
    languages: list[str] | None = None  # None = all
    owasp_ref: str | None = None        # "LLM03"
    cwe_ref: str | None = None          # "CWE-829"
```
RuleRegistry
Provides indexed access to the catalog:
- `registry.get("DEP-001")` — get a rule by ID.
- `registry.all()` — all rules.
- `registry.by_category(Category.AUTH)` — rules in a category.
- `registry.by_severity(Severity.CRITICAL)` — rules at a severity level.
- `registry.enabled_rules(overrides)` — enabled rules after applying overrides.
Formatters
Protocol
```python
class BaseFormatter(Protocol):
    def format(self, result: ScanResult) -> str: ...
```
Factory
`get_formatter(format_name, **kwargs)` returns the right class with a lazy import. Only `HumanFormatter` accepts `**kwargs`; the others ignore them:

```
"human" -> HumanFormatter(colors=True, show_suggestions=True, quiet=False)
"json"  -> JsonFormatter
"junit" -> JunitFormatter
"sarif" -> SarifFormatter
```
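An illustrative dispatch sketch, with local stub classes standing in for the lazy imports (the real formatters and their output obviously differ):

```python
# Stand-ins for the real formatter classes; vigil imports each one lazily
# inside get_formatter, which is emulated here with a plain dict.
class HumanFormatter:
    def __init__(self, colors=True, show_suggestions=True, quiet=False):
        self.colors = colors
        self.show_suggestions = show_suggestions
        self.quiet = quiet
    def format(self, result) -> str:
        return "human output"

class JsonFormatter:
    def format(self, result) -> str:
        return "{}"

def get_formatter(format_name: str, **kwargs):
    if format_name == "human":
        return HumanFormatter(**kwargs)  # only the human formatter takes options
    others = {"json": JsonFormatter}     # junit/sarif omitted for brevity
    try:
        return others[format_name]()     # options are ignored for these
    except KeyError:
        raise ValueError(f"unknown format: {format_name}") from None

assert get_formatter("human", quiet=True).quiet is True
assert get_formatter("json").format(None) == "{}"
```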
HumanFormatter
The constructor accepts three options:

| Option | Default | Description |
|---|---|---|
| `colors` | `True` | Enables ANSI colors (auto-detects TTY) |
| `show_suggestions` | `True` | Shows fix suggestions |
| `quiet` | `False` | Suppresses header and summary; shows only findings and errors |
Unicode icons: ✗ (critical/high), ⚠ (medium), ~ (low), i (info). Separators and arrows: ─, →, ✓.
JsonFormatter
Includes `analyzers_run`, `findings_count`, and `errors` fields at the root level. The summary is generated via `build_summary()` with a breakdown by severity, category, and rule, the top 10 files, and a `has_blocking` flag. The `snippet` field in `location` is included only when present.
JunitFormatter
Includes a `<properties>` element with scan metadata (`vigil.version`, `vigil.files_scanned`, `vigil.analyzers`). The `<failure>` text includes Rule, Severity, Category, File, Suggestion, and Snippet. The `tests` attribute counts findings + errors.
SarifFormatter
Complies with SARIF 2.1.0. Includes `semanticVersion`, a `defaultConfiguration` per rule, `helpUri`, `ruleIndex` in results, `snippet` in `region`, `invocations` with execution status and errors, and OWASP/CWE references in `properties`. Rule names are converted to PascalCase.
Output flow
```
ScanResult -> Formatter.format() -> string -> stdout or file
```
The CLI decides where to send the output:
- Without `--output`: stdout.
- With `--output`: writes to the file (and, for the human format, also to stdout).
Logging
structlog
vigil uses `structlog` for structured logging:

- Verbose mode (`-v`): Level DEBUG, with timestamps and key-value pairs.
- Normal mode: Level WARNING, minimal output.
- Output always to stderr: Logs never go to stdout, so `vigil scan -f json | jq` works without the logs contaminating the JSON.
Verbose mode log example
```
2024-01-15 10:30:00 [info] files_collected count=42
2024-01-15 10:30:00 [info] analyzer_start name=dependency
2024-01-15 10:30:01 [info] analyzer_done name=dependency findings=2
```
External dependencies
| Dependency | Role | Purpose |
|---|---|---|
| `click>=8.1` | CLI framework | Subcommands, options, automatic help |
| `pydantic>=2.0` | Validation | Configuration models with validation |
| `httpx>=0.27` | HTTP client | Requests to PyPI/npm (async-capable) |
| `structlog>=24.1` | Logging | Structured logging to stderr |
| `pyyaml>=6.0` | YAML parser | Configuration file loading |
Development dependencies
| Dependency | Role | Purpose |
|---|---|---|
| `pytest>=8.0` | Testing | Test framework |
| `pytest-cov>=5.0` | Coverage | Test coverage reporting |
| `ruff>=0.4` | Linting | Python linter and formatter |
Design decisions
Why Protocol and not ABC
`typing.Protocol` (structural typing) is used instead of `abc.ABC` (nominal typing) for:
- Flexibility: Analyzers do not need to inherit from a base class.
- Testing: It is trivial to create fakes/mocks that satisfy the protocol.
- Decoupling: Modules do not depend on the base class.
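The testing point can be demonstrated: any object with the right attributes satisfies the protocol, with no inheritance involved. A `@runtime_checkable` variant is used here only so that `isinstance` works; the real `BaseAnalyzer` need not be runtime-checkable:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class BaseAnalyzer(Protocol):
    name: str
    def analyze(self, files: list[str], config: object) -> list: ...

class FakeAnalyzer:
    """A test double that satisfies the protocol purely structurally."""
    name = "fake"
    def analyze(self, files, config):
        return []

# No base class, yet the instance counts as a BaseAnalyzer:
assert isinstance(FakeAnalyzer(), BaseAnalyzer)
```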
Why dataclasses and not Pydantic for Finding
- `Finding`, `Location`, and `RuleDefinition` are internal data models that do not need validation.
- Pydantic is reserved for user configuration, where validation is critical.
- Dataclasses are lighter and faster for data that is created internally.
Why structlog
- Structured logging (key-value) facilitates parsing and filtering.
- Clear separation of output (stdout) vs logs (stderr).
- Centralized configuration with processors.
Why not async
vigil V0 is synchronous. The reasons:
- Most operations are filesystem I/O, which is fast.
- HTTP requests to the registry can be made with synchronous `httpx`.
- The simplicity of synchronous code facilitates debugging and testing.
- Migration to async in a future version is possible if performance requires it.
- Migration to async in future versions is possible if performance requires it.