Analyzers
vigil uses a modular analyzer system. Each analyzer focuses on a detection category and produces findings independently. This document describes the implemented analyzers.
For the general analyzer architecture (protocol, registration, flow), see Architecture.
DependencyAnalyzer (CAT-01)
Module: src/vigil/analyzers/deps/
Category: dependency
Active rules: DEP-001, DEP-002, DEP-003, DEP-005, DEP-007
Detects hallucinated dependencies (slopsquatting), typosquatting, suspicious packages, nonexistent versions, and packages without a source repository.
Supported dependency files
| File | Ecosystem | Parser |
|---|---|---|
requirements.txt | PyPI | parse_requirements_txt() |
requirements-dev.txt, requirements-*.txt | PyPI | parse_requirements_txt() |
pyproject.toml ([project.dependencies], [project.optional-dependencies]) | PyPI | parse_pyproject_toml() |
package.json (dependencies, devDependencies) | npm | parse_package_json() |
Files are discovered automatically with find_and_parse_all(), which traverses the directory tree while skipping .venv/, node_modules/, .git/, etc.
Implemented rules
DEP-001 — Hallucinated dependency (CRITICAL)
Verifies that each declared package exists in the public registry (PyPI or npm). If it does not exist, it is very likely a name hallucinated by the AI agent.
# requirements.txt
flask==3.0.0
python-jwt-utils==1.0.0 # NO existe en PyPI -> DEP-001 CRITICAL
Requires network: Yes. Skipped in --offline mode.
DEP-002 — Suspiciously new dependency (HIGH)
Checks the package creation date. If it was created less than deps.min_age_days days ago (default: 30), it may be a malicious package registered as part of a slopsquatting attack.
Requires network: Yes. Skipped in --offline mode.
DEP-003 — Typosquatting candidate (HIGH)
Compares each dependency name against a corpus of popular packages using normalized Levenshtein distance. If the similarity is >= deps.similarity_threshold (default: 0.85), it is a typosquatting candidate.
# requirements.txt
requets==2.31.0 # Similaridad 0.875 con "requests" -> DEP-003 HIGH
Requires network: No. Works in --offline mode.
Normalization: For PyPI, hyphens (-), underscores (_), and dots (.) are treated as equivalent (PEP 503). my-package, my_package, and my.package are normalized to the same name before comparison.
Corpus: A built-in corpus of ~100 PyPI packages and ~70 npm packages is used as a fallback. When the files data/popular_pypi.json and data/popular_npm.json are generated (PHASE 6), those will be used instead.
DEP-005 — No source repository (MEDIUM)
Verifies that the package has a source code repository linked in its metadata. Packages without a repository are harder to audit.
Requires network: Yes. Skipped in --offline mode.
DEP-007 — Nonexistent version (CRITICAL)
Verifies that the exact pinned version exists in the registry. Only applies to exact versions (==1.2.3 in PyPI, 1.2.3 without prefix in npm).
# requirements.txt
flask==99.0.0 # Version no existe -> DEP-007 CRITICAL
Requires network: Yes. Skipped in --offline mode.
Deferred rules
| Rule | Reason | Estimate |
|---|---|---|
| DEP-004 (unpopular) | Requires download statistics API, not available in basic PyPI/npm metadata | V1 or PHASE 6 |
| DEP-006 (missing import) | Requires AST import parser, out of scope for V0 (regex-based) | V1 |
Analysis flow
- Discovery:
find_and_parse_all()traverses directories withos.walk()+ pruning, looking for dependency files. - Parsing: Each file is parsed into a list of
DeclaredDependencywith name, version, source file, line, and ecosystem. - Deduplication: Duplicates by name+ecosystem are removed (e.g., same package in
requirements.txtandpyproject.toml). - Registry verification (if online): For each unique package, PyPI/npm is queried via
RegistryClient. DEP-001, DEP-002, DEP-005, DEP-007 are applied. - Similarity verification (always): For each unique package, popular packages with similar names are searched. DEP-003 is applied.
Registry Client
The RegistryClient handles HTTP queries to PyPI and npm:
- Disk cache:
~/.cache/vigil/registry/with individual JSON files per package. - Configurable TTL: Default 24 hours (
deps.cache_ttl_hours). - Lazy init: The httpx client is created only when the first request is made.
- Context manager: Supports
with RegistryClient() as client:for automatic cleanup. - Resilience: Network errors assume the package exists (avoids false positives on unstable connections).
# Limpiar cache
rm -rf ~/.cache/vigil/registry/
# Forzar requests frescas
# (configurar cache_ttl_hours: 0 en .vigil.yaml)
Relevant configuration
deps:
# Verificar contra registries (false = solo checks estaticos)
verify_registry: true
# Dias minimos de antiguedad (DEP-002)
min_age_days: 30
# Umbral de similaridad para typosquatting (DEP-003)
# 0.85 = captura typos de 1 caracter en nombres de 8+ caracteres
similarity_threshold: 0.85
# TTL del cache de registry
cache_ttl_hours: 24
# Modo offline (no HTTP)
offline_mode: false
Offline mode
With --offline or deps.offline_mode: true:
| Rule | Behavior |
|---|---|
| DEP-001 | Skipped (requires registry verification) |
| DEP-002 | Skipped (requires creation date from registry) |
| DEP-003 | Active (local comparison against corpus) |
| DEP-005 | Skipped (requires registry metadata) |
| DEP-007 | Skipped (requires version list from registry) |
AuthAnalyzer (CAT-02)
Module: src/vigil/analyzers/auth/
Category: auth
Active rules: AUTH-001, AUTH-002, AUTH-003, AUTH-004, AUTH-005, AUTH-006, AUTH-007
Detects insecure authentication and authorization patterns in Python (FastAPI/Flask) and JavaScript (Express) using regex pattern matching.
Internal architecture
The analyzer consists of 4 modules:
| Module | Responsibility |
|---|---|
analyzer.py | Orchestrates detection, iterates files and lines |
endpoint_detector.py | Detects HTTP endpoints (decorators in Python, app.get/post/... in JS) |
middleware_checker.py | Verifies whether an endpoint has auth middleware (Depends(...), passport, etc.) |
patterns.py | Regex patterns for JWT lifetime, hardcoded secrets, CORS, cookies, passwords |
AuthAnalyzer.analyze(files, config)
|
v
[1. Filtrar archivos relevantes (.py, .js, .ts, .jsx, .tsx)]
|
v
[2. detect_endpoints(content)] --> Lista de EndpointInfo
| (ruta, metodo, linea, framework)
v
[3. check_endpoint_auth(ep)] --> AUTH-001 / AUTH-002 findings
|
v
[4. _check_lines() por linea]
+---> AUTH-003: JWT lifetime excesivo
+---> AUTH-004: Secret hardcodeado con baja entropy
+---> AUTH-005: CORS allow all origins
+---> AUTH-006: Cookie sin flags de seguridad
+---> AUTH-007: Password comparison no timing-safe
|
v
list[Finding]
Implemented rules
| Rule | Severity | Requires network | Description |
|---|---|---|---|
| AUTH-001 | HIGH | No | Sensitive endpoint without auth middleware |
| AUTH-002 | HIGH | No | Mutating endpoint (DELETE/PUT/PATCH) without auth |
| AUTH-003 | MEDIUM | No | JWT with excessive lifetime (>24h by default) |
| AUTH-004 | CRITICAL | No | Hardcoded JWT secret with low entropy |
| AUTH-005 | HIGH | No | CORS configured with * (allow all) |
| AUTH-006 | MEDIUM | No | Cookie without security flags (httpOnly, secure, sameSite) |
| AUTH-007 | MEDIUM | No | Password comparison with == (vulnerable to timing attacks) |
All rules are offline — they do not require network. They only analyze source code.
Endpoint detection
The endpoint_detector detects HTTP endpoints in three frameworks:
FastAPI/Flask (Python):
@app.get("/users/{user_id}") # Detectado
@router.delete("/users/{user_id}") # Detectado
@app.route("/admin", methods=["POST"]) # Detectado
Express (JavaScript):
app.get("/users/:id", handler) // Detectado
router.delete("/users/:id", handler) // Detectado
Auth middleware detection looks for:
- Python:
Depends(...),login_required,@requires_auth,Permission,current_user - JavaScript:
passport,authenticate,isAuthenticated,requireAuth,authMiddleware
Sensitive endpoint heuristics (AUTH-001)
An endpoint is considered sensitive if its path contains tokens such as:
user, admin, account, profile, payment, order, billing, settings, password, token, auth, session, dashboard
Relevant configuration
auth:
# Maximo horas de lifetime para JWT (AUTH-003)
max_token_lifetime_hours: 24
# Requerir auth en endpoints mutantes (AUTH-002)
require_auth_on_mutating: true
# Permitir CORS abierto en archivos de dev/test (AUTH-005)
cors_allow_localhost: true
Integration with SecretsAnalyzer
AUTH-004 (hardcoded JWT secret) uses shannon_entropy() from the secrets/entropy.py module to calculate the value’s entropy. It only reports secrets with entropy < 4.0 bits/char (typical placeholders like "supersecret" or "secret123"). High-entropy secrets are left for SEC-002.
SecretsAnalyzer (CAT-03)
Module: src/vigil/analyzers/secrets/
Category: secrets
Active rules: SEC-001, SEC-002, SEC-003, SEC-004, SEC-006
Detects poorly managed secrets and credentials in code, with emphasis on patterns typical of AI-generated code: copied placeholders, low-entropy secrets, and values from .env.example embedded in source.
Internal architecture
| Module | Responsibility |
|---|---|
analyzer.py | Orchestrates detection, applies per-line and per-file checks |
placeholder_detector.py | Compiles placeholder regex, detects secret assignments |
entropy.py | Calculates Shannon entropy to distinguish real secrets from placeholders |
env_tracer.py | Parses .env.example, searches for copied values in source code |
SecretsAnalyzer.analyze(files, config)
|
v
[1. Compilar placeholder_patterns (30 regex)]
|
v
[2. Cargar .env.example entries (si check_env_example=true)]
|
v
[3. Por cada archivo relevante (.py, .js, .ts, ...)]
+---> SEC-006: find_env_values_in_code() contra entries de .env.example
+---> SEC-003: Connection strings con credenciales (postgresql://, mongodb://, etc.)
+---> SEC-004: Env vars sensibles con default hardcodeado
+---> SEC-001: Secret assignment con valor placeholder
+---> SEC-002: Secret assignment con baja entropy
|
v
list[Finding]
Implemented rules
| Rule | Severity | Description |
|---|---|---|
| SEC-001 | CRITICAL | Placeholder value in code ("your-api-key-here", "changeme", etc.) |
| SEC-002 | CRITICAL | Hardcoded secret with low entropy (< 3.0 bits/char by default) |
| SEC-003 | CRITICAL | Connection string with embedded credentials (postgresql://, mongodb://, etc.) |
| SEC-004 | HIGH | Sensitive environment variable with hardcoded default value in code |
| SEC-006 | CRITICAL | Value copied verbatim from .env.example into source code |
Deferred rule
| Rule | Reason | Estimate |
|---|---|---|
| SEC-005 (file not in gitignore) | Requires .gitignore analysis with glob patterns | V1 or later PHASE |
Placeholder detection (SEC-001)
The analyzer ships with 30 regex patterns for known placeholders, configurable via secrets.placeholder_patterns:
- Generic values:
changeme,TODO,FIXME,placeholder,xxx+ - Template patterns:
your-*-here,replace-me,insert-*-here,put-*-here,add-*-here - API key prefixes:
sk-your*,pk_test_*,sk_test_*,sk_live_test* - Typical AI values:
secret123,password123,supersecret,mysecret,my-secret-key - Example values:
example.com,test-key,dummy-key,fake-key,sample-key,default-secret
Shannon entropy (SEC-002)
Low-entropy secret detection uses Shannon entropy calculation:
"password123"-> ~2.8 bits/char (placeholder)"xK8$mP2!qR"-> ~3.3 bits/char (borderline)"a1b2c3d4e5f6g7h8"-> ~4.0 bits/char (probably real)
The default threshold is 3.0 bits/char. It is configured with secrets.min_entropy.
Connection string detection (SEC-003)
Supported protocols: postgresql, postgres, mysql, mariadb, mongodb, mongodb+srv, redis, amqp, rabbitmq, sqlserver, mssql.
# Detectado
DATABASE_URL = "postgresql://admin:password123@db.example.com:5432/mydb"
# NO detectado (usa variable de entorno en el password)
DATABASE_URL = f"postgresql://admin:${DB_PASS}@db.example.com:5432/mydb"
In output snippets, the password is automatically redacted: postgresql://admin:***@db.example.com:5432/mydb.
Env defaults detection (SEC-004)
Detects sensitive environment variables with hardcoded default values:
# Python — detectado
SECRET_KEY = os.getenv("SECRET_KEY", "fallback-secret")
API_KEY = os.environ.get("API_KEY", "test-key-123")
# JavaScript — detectado
const secret = process.env.SECRET_KEY || "mysecret"
const key = process.env["API_KEY"] || "default-key"
Only reports if the variable name contains sensitive tokens: SECRET, KEY, TOKEN, PASSWORD, API_KEY, AUTH, JWT, DATABASE_URL, DB_PASS, PRIVATE_KEY, ENCRYPTION, SIGNING, STRIPE, AWS.
.env.example tracing (SEC-006)
If secrets.check_env_example: true (default), the analyzer:
- Searches for
.env.example,.env.sample,.env.templatefiles in root directories. - Parses each file extracting
KEY=valuepairs. - Searches for those exact values in source code.
- If a value from
.env.exampleappears in a.pyor.jsfile, it generates SEC-006 CRITICAL.
Relevant configuration
secrets:
# Entropia minima de Shannon para SEC-002
min_entropy: 3.0
# Comparar con .env.example para SEC-006
check_env_example: true
# Patrones regex de placeholders para SEC-001
# (lista de 30 patrones por defecto — ver schema.py)
placeholder_patterns:
- "changeme"
- "your-.*-here"
- "replace-?me"
# ... (30 patrones por defecto)
TestQualityAnalyzer (CAT-06)
Module: src/vigil/analyzers/tests/
Category: test-quality
Active rules: TEST-001, TEST-002, TEST-003, TEST-004, TEST-005, TEST-006
Detects test theater — tests that pass but do not verify anything real. Supports pytest/unittest (Python) and jest/mocha (JavaScript/TypeScript).
Internal architecture
| Module | Responsibility |
|---|---|
analyzer.py | Orchestrates detection, iterates test files and functions |
assert_checker.py | Extracts test functions, counts assertions, detects trivial ones, catch-all, skips, API tests |
mock_checker.py | Detects mock return values and cross-references them with assertions to find mirrors |
coverage_heuristics.py | Identifies test files and detects framework (pytest, jest, mocha) |
TestQualityAnalyzer.analyze(files, config)
|
v
[1. Filtrar archivos de test (.py con test_, .test.js, .spec.ts, etc.)]
|
v
[2. TEST-004: find_skips_without_reason() — analisis global]
|
v
[3. Extraer funciones de test]
+---> Python: extract_python_test_functions() (indentacion)
+---> JS: extract_js_test_functions() (conteo de llaves)
|
v
[4. Por cada funcion (saltando skipped):]
+---> TEST-001: count_assertions() < min_assertions_per_test
+---> TEST-002: find_trivial_assertions() (solo si TODAS triviales)
+---> TEST-003: find_catch_all_exceptions()
+---> TEST-005: is_api_test() && !has_status_code_assertion()
+---> TEST-006: find_mock_mirrors()
|
v
list[Finding]
Implemented rules
| Rule | Severity | Description |
|---|---|---|
| TEST-001 | HIGH | Test without assertions (only verifies the code does not crash) |
| TEST-002 | MEDIUM | Trivial assertions (assert True, assert x is not None, toBeTruthy()) |
| TEST-003 | MEDIUM | Catch-all exceptions (except Exception: pass, catch(e)) |
| TEST-004 | LOW | Test skipped without reason (@pytest.mark.skip, test.skip, xit) |
| TEST-005 | MEDIUM | API test without status code verification |
| TEST-006 | MEDIUM | Mock mirror (mock returns a literal that matches the assertion) |
All rules are offline — they do not require network.
Test function detection
Python:
def test_*():andasync def test_*():(functions and class methods)- Single-line:
def test_x(): assert True - Body end determined by indentation
JavaScript:
test('name', () => { ... })andit('name', () => { ... })- Body end determined by brace
{}counting
Triviality heuristic (TEST-002)
Only reported when all assertions in a test are trivial. If there is at least one real assertion mixed with trivial ones, no finding is generated.
Trivial Python patterns: assert True, assert x (bare), assert x is not None, assert x is None, assertTrue(True), assertIsNotNone(x), assertIsNone(x)
Trivial JavaScript patterns: toBeTruthy(), toBeDefined(), not.toBeNull(), not.toBeUndefined(), toBe(true)
Mock mirrors (TEST-006)
Only detects literal values (numbers, strings, booleans, None/null). Complex values (functions, lists, dicts) are ignored to avoid false positives.
# DETECTADO — mock mirror
mock_calc.return_value = 42
result = get_price()
assert result == 42 # Solo prueba que el mock funciona
# NO DETECTADO — valores distintos
mock_data.return_value = 10
result = transform()
assert result == 20 # Prueba logica real
Relevant configuration
tests:
# Minimo de assertions por test (TEST-001)
min_assertions_per_test: 1
# Detectar assertions triviales (TEST-002)
detect_trivial_asserts: true
# Detectar mock mirrors (TEST-006)
detect_mock_mirrors: true