ML Experiment Accelerator
Experimental notebook → 5-step pipeline → production code with tests and MLflow config.
The problem
An oft-cited industry estimate holds that 85% of ML models never reach production. A key factor is the gap between the data scientist’s notebook and the production-ready code that the MLOps pipeline requires. The data scientist knows which model they want to ship, but the notebook code is rarely production-grade: no tests, no type hints, no error handling, no logging, hardcoded hyperparameters.
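To make the gap concrete, here is a condensed, hypothetical notebook cell with exactly those problems (the path, column names, and hyperparameters are invented for illustration):

```python
# Condensed, hypothetical notebook cell: hardcoded path and hyperparameters,
# print instead of logging, no seed, no error handling, nothing a test can import.
import pandas as pd
import torch

df = pd.read_csv("/home/ana/experiments/churn_v3_final.csv")  # absolute path
X = torch.tensor(df.drop(columns=["churn"]).values, dtype=torch.float32)
y = torch.tensor(df["churn"].values, dtype=torch.float32).unsqueeze(1)

model = torch.nn.Sequential(
    torch.nn.Linear(X.shape[1], 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.0003)  # magic number

for epoch in range(20):
    optimizer.zero_grad()
    loss = torch.nn.functional.binary_cross_entropy_with_logits(model(X), y)
    loss.backward()
    optimizer.step()
    print(epoch, loss.item())
```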
Where architect fits in
Architect acts as a translator from notebook to production code. The pipeline takes the experimental notebook, generates clean code that follows the team’s MLOps conventions (defined in .architect.md), adds tests, and validates that the training pipeline runs.
Diagram
flowchart TD
A["👩🔬 Data Scientist\nExperimental notebook\n(experiment.ipynb)"] --> B["architect pipeline\nml-productionize.yaml"]
subgraph architect_pipeline["Architect Pipeline"]
direction TB
B --> C["Step 1: Extract\nParse notebook,\nextract core logic"]
C --> D["Step 2: Structure\nGenerate modules:\ndata.py, model.py,\ntrain.py, evaluate.py"]
D --> E["Step 3: Harden\nAdd type hints,\nerror handling,\nlogging, configs"]
E --> F["Step 4: Test\nGenerate tests +\nrun them"]
F --> G["Step 5: Pipeline\nGenerate MLflow/\nKubeflow YAML pipeline"]
end
G --> H["PR with:\n- Productionized code\n- Tests\n- Pipeline config\n- Report"]
H --> I["ML Engineer Review"]
I --> J["MLOps Pipeline\n(MLflow / Kubeflow\n/ Vertex AI)"]
J --> K["Train → Evaluate\n→ Register → Deploy"]
style B fill:#2563eb,color:#fff,stroke:#1d4ed8
style C fill:#7c3aed,color:#fff,stroke:#6d28d9
style D fill:#7c3aed,color:#fff,stroke:#6d28d9
style E fill:#7c3aed,color:#fff,stroke:#6d28d9
style F fill:#7c3aed,color:#fff,stroke:#6d28d9
style G fill:#7c3aed,color:#fff,stroke:#6d28d9
Implementation
Pipeline YAML
# ml-productionize.yaml
name: ml-productionize
steps:
  - name: extract
    agent: build
    task: >
      Parse experiment.ipynb (nbformat JSON format).
      Identify: imports, data loading, preprocessing,
      model definition, training loop, evaluation metrics.
      Ignore: exploratory cells, visualizations, markdown.
      Write a summary in EXTRACTION_PLAN.md.
  - name: structure
    agent: build
    task: >
      Following EXTRACTION_PLAN.md, generate Python modules:
      - src/data/loader.py (data loading + preprocessing)
      - src/models/model.py (model definition)
      - src/training/train.py (training loop with MLflow tracking)
      - src/evaluation/evaluate.py (metrics + evaluation)
      - configs/default.yaml (externalized hyperparameters)
  - name: harden
    agent: build
    task: >
      Add to all generated modules:
      - Type hints on all functions
      - Google-style docstrings
      - Logging with structlog (replace prints)
      - Error handling (try/except with useful messages)
      - Seed reproducibility (torch/numpy/random)
      Externalize ALL hyperparameters to configs/default.yaml.
  - name: test
    agent: build
    task: >
      Generate tests/test_data.py, tests/test_model.py, tests/test_training.py.
      Tests should verify: data shapes, model forward pass,
      that the training loop reduces loss in 5 steps.
      Run pytest to verify all pass.
  - name: pipeline-config
    agent: build
    task: >
      Generate configs/mlflow_pipeline.yaml with the configuration
      to run training as an MLflow job:
      entry_points, parameters, metrics, artifacts.
      Also generate a Makefile with targets: train, evaluate, test.
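For context on the extract step: an .ipynb is plain nbformat JSON, and pulling the code cells out of it looks roughly like the sketch below (extract_code_cells is an illustrative helper, not part of architect):

```python
# Illustrative sketch of what "parse experiment.ipynb (nbformat JSON format)" involves.
# extract_code_cells is a hypothetical helper, not an architect API.
import nbformat


def extract_code_cells(path: str) -> list[str]:
    """Return the source of every code cell, skipping markdown, raw cells and magics."""
    nb = nbformat.read(path, as_version=4)
    sources: list[str] = []
    for cell in nb.cells:
        if cell.cell_type != "code":
            continue  # markdown / raw cells carry no logic to productionize
        if cell.source.lstrip().startswith(("%", "!")):
            continue  # notebook magics and shell commands are exploratory noise
        sources.append(cell.source)
    return sources
```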
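Likewise, a sketch of the kind of test the test step asks for: forward-pass shape and a loss that drops within 5 training steps. In the real output the model and train_step would be imported from the generated src/ modules; stand-ins are defined inline so the sketch runs on its own:

```python
# tests/test_training.py -- sketch of what step 4 aims to generate.
# ChurnModel and train_step stand in for the code generated under src/.
import torch
from torch.optim import Optimizer


class ChurnModel(torch.nn.Module):  # stand-in for src/models/model.py
    def __init__(self, n_features: int) -> None:
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(n_features, 16), torch.nn.ReLU(), torch.nn.Linear(16, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def train_step(model: torch.nn.Module, optimizer: Optimizer,
               x: torch.Tensor, y: torch.Tensor) -> float:
    """Stand-in for the training step exposed by src/training/train.py."""
    optimizer.zero_grad()
    loss = torch.nn.functional.binary_cross_entropy_with_logits(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()


def test_forward_pass_shape() -> None:
    model = ChurnModel(n_features=16)
    assert model(torch.randn(8, 16)).shape == (8, 1)


def test_loss_decreases_in_five_steps() -> None:
    torch.manual_seed(0)
    model = ChurnModel(n_features=16)
    x, y = torch.randn(64, 16), torch.randint(0, 2, (64, 1)).float()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    losses = [train_step(model, optimizer, x, y) for _ in range(5)]
    assert losses[-1] < losses[0]
```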
.architect.md for ML
# ML Code Conventions
## Structure
- src/data/ → data loading, preprocessing, feature engineering
- src/models/ → model definitions
- src/training/ → training loops, callbacks
- src/evaluation/ → metrics, evaluation logic
- configs/ → hydra/omegaconf configs
- tests/ → pytest tests
## Required
- Type hints on all public functions
- Google-style docstrings with Args/Returns/Raises
- Logging with structlog (no print)
- Externalized configs (no hardcoded hyperparams)
- Seed reproducibility (torch.manual_seed, np.random.seed)
- MLflow tracking in training loop (log_params, log_metrics, log_model)
## Prohibited
- No wildcard imports (from x import *)
- No absolute paths
- No credentials in code
- No dependencies without pinned versions in requirements.txt
- No mutable globals
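As a reference point, a training module that follows these conventions would look roughly like the excerpt below (the function names, config keys, and hyperparameters are assumptions for the sketch, not architect output; error handling is omitted for brevity):

```python
# src/training/train.py -- illustrative excerpt following the conventions above.
import random

import mlflow
import mlflow.pytorch
import numpy as np
import structlog
import torch
import yaml
from torch.utils.data import DataLoader

log = structlog.get_logger()


def set_seed(seed: int) -> None:
    """Make runs reproducible across torch, numpy and random."""
    torch.manual_seed(seed)
    np.random.seed(seed)
    random.seed(seed)


def load_config(path: str) -> dict:
    """Load externalized hyperparameters.

    Args:
        path: Path to a YAML config, e.g. configs/default.yaml.

    Returns:
        The parsed configuration.

    Raises:
        FileNotFoundError: If the config file does not exist.
    """
    with open(path) as f:
        return yaml.safe_load(f)


def train(model: torch.nn.Module, loader: DataLoader, cfg: dict) -> torch.nn.Module:
    """Train the model and track the run with MLflow.

    Args:
        model: Model to train.
        loader: Batches of (features, labels).
        cfg: Hyperparameters, e.g. {"seed": 42, "lr": 1e-3, "epochs": 10}.

    Returns:
        The trained model.
    """
    set_seed(cfg["seed"])
    optimizer = torch.optim.Adam(model.parameters(), lr=cfg["lr"])
    with mlflow.start_run():
        mlflow.log_params(cfg)
        for epoch in range(cfg["epochs"]):
            for x, y in loader:
                optimizer.zero_grad()
                loss = torch.nn.functional.binary_cross_entropy_with_logits(model(x), y)
                loss.backward()
                optimizer.step()
            mlflow.log_metric("train_loss", loss.item(), step=epoch)
            log.info("epoch_done", epoch=epoch, loss=loss.item())
        mlflow.pytorch.log_model(model, "model")
    return model
```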
Configuration
# .architect.yaml
llm:
  model: openai/gpt-4.1
  api_key_env: OPENAI_API_KEY
guardrails:
  protected_files:
    - "experiment.ipynb"   # Do not modify the original notebook
    - "data/**"            # Do not touch data
    - "*.csv"
    - "*.parquet"
  code_rules:
    - pattern: 'from .* import \*'
      message: "No wildcard imports"
      severity: block
    - pattern: 'print\('
      message: "Use structlog instead of print"
      severity: warn
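The code_rules entries are plain regular expressions matched against generated code, with severity deciding whether a match blocks the change or only warns; a quick illustration of what the two rules above catch (not architect's internal matcher):

```python
# Illustrative check of the two code_rules above; architect's own matching may differ.
import re

RULES = [
    (re.compile(r"from .* import \*"), "block", "No wildcard imports"),
    (re.compile(r"print\("), "warn", "Use structlog instead of print"),
]

snippet = 'from utils import *\nprint("training...")\nlog.info("training")\n'

for lineno, line in enumerate(snippet.splitlines(), start=1):
    for pattern, severity, message in RULES:
        if pattern.search(line):
            print(f"line {lineno}: {severity.upper()} - {message}")
# -> line 1: BLOCK - No wildcard imports
# -> line 2: WARN - Use structlog instead of print
```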
Architect features used
| Feature | Role in this architecture |
|---|---|
| Pipeline | 5 sequential steps: extract → structure → harden → test → config |
| Sub-agents | Different agents for generation vs testing |
| .architect.md | Team ML conventions (structure, logging, configs) |
| Guardrails | Protects original notebook and data |
| code_rules | Blocks wildcard imports, warns on prints |
| Reports | Documentation of what was generated and test results |
Result
From a 200-cell notebook, architect generates:
- 4-6 clean Python modules with type hints and docstrings
- Unit tests that verify shapes, forward pass, and convergence
- YAML config with externalized hyperparameters
- Pipeline config for MLflow/Kubeflow
- Makefile with standard targets
The ML Engineer reviews the PR and connects it to the existing MLOps pipeline. Productionization shrinks from weeks of rewriting to hours of review.