SE

Jenkins + MLflow + DVC + SQLite

This project shows how to automate an MLflow experiment with Jenkins, persist the Iris data set in SQLite, and version both data and trained models with DVC so everything can be pushed to GitHub.

What You Get

prepare_data.py: materialises the Iris data to data/iris.parquet and loads the same rows into data/iris.db (SQLite).
train.py: trains Logistic Regression, logs runs/artifacts to MLflow, and serialises the model to artifacts/model.pkl.
data_utils.py: helper functions for reading/writing the Iris data.
dvc.yaml / dvc.lock / params.yaml: define the DVC pipeline (prepare_data -> train_model) and the hyperparameters Jenkins modifies.
Jenkinsfile + Jenkinsfile.windows: CI pipelines. By default they install dependencies, tweak params.yaml, then run dvc repro. Set the USE_MLFLOW_PROJECT parameter to switch to `mlflow run .` instead.
MLproject + python_env.yaml: optional MLflow Projects entry point for ad-hoc runs.

Data to Database Flow

prepare_data.py exports the Iris frame to data/iris.parquet.
The same script writes the rows into the iris_samples table inside data/iris.db. Example query: sqlite3 data/iris.db "SELECT * FROM iris_samples LIMIT 5;".
train.py reuses the serialised Parquet file; if it is missing, the script regenerates the file and refreshes the DB.
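
The example query above can also be issued from Python with the standard-library sqlite3 module. The following is a minimal sketch, not the project's code: it builds an in-memory stand-in for data/iris.db holding the first three rows of the Iris data set, so it runs without the real file. The table name iris_samples comes from this README; the column names and the helper names build_demo_db and head are assumptions made for illustration.

```python
import sqlite3

# First three rows of the classic Iris data set, used as demo content only.
ROWS = [
    (5.1, 3.5, 1.4, 0.2, "setosa"),
    (4.9, 3.0, 1.4, 0.2, "setosa"),
    (4.7, 3.2, 1.3, 0.2, "setosa"),
]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for data/iris.db.

    The column names are assumed; the real schema is whatever
    prepare_data.py writes.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE iris_samples ("
        "sepal_length REAL, sepal_width REAL, "
        "petal_length REAL, petal_width REAL, species TEXT)"
    )
    conn.executemany("INSERT INTO iris_samples VALUES (?, ?, ?, ?, ?)", ROWS)
    conn.commit()
    return conn


def head(conn: sqlite3.Connection, limit: int = 5) -> list:
    """Return the first `limit` rows, mirroring the sqlite3 CLI query."""
    cur = conn.execute("SELECT * FROM iris_samples LIMIT ?", (limit,))
    return cur.fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for row in head(demo):
        print(row)
```

To inspect the real database, pass sqlite3.connect("data/iris.db") to head() instead of the demo connection.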

Jenkins Usage

Create a pipeline job pointing to this repo.
Parameters available in both Jenkinsfiles:
- MLFLOW_TRACKING_URI: MLflow server URL (falls back to agent env vars if empty).
- MLFLOW_EXPERIMENT_NAME: overrides the experiment in params.yaml.
- MAX_ITER: forwarded into params.yaml before dvc repro.
- USE_MLFLOW_PROJECT: when true, skip DVC and call `mlflow run .` instead.
- RUN_SECURITY_SCANS: when true, installs requirements-security.txt and runs pip-audit (dependency CVEs) and bandit (Python static analysis) before training.
- RUN_MS_SECURITY (Windows agents): if msdo (Microsoft Security DevOps) is installed, run CredScan/DevSkim/Bandit and emit msdo.sarif.
- RUN_GARAK + GARAK_COMMAND: run Garak LLM red-team tests; put the full Garak CLI args (model, n-probes, report path, etc.) into GARAK_COMMAND.
- RUN_FAIRLEARN: run a Fairlearn bias snapshot on the trained model and dataset.
- RUN_GISKARD: run a Giskard scan of the trained model and dataset.
- RUN_CREDO_AI: capture Credo AI metadata (version + basic dataset info).
- RUN_CYCLONEDX: generate a CycloneDX SBOM from requirements.txt.
Linux agents use Jenkinsfile (shell), Windows agents use Jenkinsfile.windows (PowerShell).
Jenkins archives mlruns_local/**, security outputs, Garak reports, fairness/scanner outputs, and the SBOM so they can be downloaded even without MLflow UI access.

DVC Workflow

Data (data/iris.parquet, data/iris.db) and the model (artifacts/model.pkl) are tracked as DVC outputs with cache: false, so the actual files stay in Git while DVC captures lineage in dvc.lock.

python -m venv .venv
. .venv/bin/activate           # Windows: .\.venv\Scripts\activate
pip install -r requirements.txt
dvc repro
dvc exp show
# optional: configure remote storage
# dvc remote add -d s3 s3://my-bucket/path

Publishing to GitHub (`https://github.com/Ayoub-Samir/SE`)

git add .
git commit -m "Add Jenkins + MLflow + DVC pipeline"
git branch -M main
git remote add origin https://github.com/Ayoub-Samir/SE.git
git push -u origin main

Local Smoke Test

python -m venv .venv
. .venv/bin/activate           # Windows: .\.venv\Scripts\activate
pip install -r requirements.txt
export MLFLOW_TRACKING_URI=http://localhost:5000   # optional; needs a tracking server already listening on that port
dvc repro                      # or: python prepare_data.py && python train.py
mlflow ui --backend-store-uri ./mlruns --port 5000

Open http://127.0.0.1:5000 to inspect the latest runs.

Extending Further

Point params.yaml to a different data source or table and re-run dvc repro.
Configure a DVC remote (dvc remote add) on S3/Azure/GDrive when data/models grow larger.
Add a Jenkins post step that prints a link to your hosted MLflow UI using the run ID from the logs.
For stronger MLSecOps/OWASP coverage, add secrets scanning (e.g., gitleaks/detect-secrets) and artifact signing; hashes are already logged to MLflow via security_manifest.json for integrity checks.
To stay within the Microsoft ecosystem: install msdo on Windows agents and enable RUN_MS_SECURITY to get CredScan/DevSkim/Bandit SARIF output.
LLM red-teaming: enable RUN_GARAK and pass something like --model openai:gpt-4o-mini --n-probes 10 --report garak_report.json into GARAK_COMMAND (supply your own model/API credentials).
Fairness & governance: toggle RUN_FAIRLEARN, RUN_GISKARD, RUN_CREDO_AI, and/or RUN_CYCLONEDX to emit bias, QA/governance metadata, and SBOM artifacts under artifacts/.
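
The integrity check mentioned above (hashes recorded in security_manifest.json) can be illustrated with a short sketch. The manifest's real layout is not shown in this README, so the flat path-to-SHA-256 mapping below is an assumption, as are the helper names sha256_of, build_manifest, and changed_files. The demo hashes a throwaway file rather than artifacts/model.pkl.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths) -> dict:
    """Map each path (as a string) to its SHA-256 hex digest."""
    return {str(p): sha256_of(Path(p)) for p in paths}


def changed_files(manifest: dict) -> list:
    """Return paths that are missing or whose hash no longer matches."""
    return [
        p
        for p, expected in manifest.items()
        if not Path(p).is_file() or sha256_of(Path(p)) != expected
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model = Path(tmp) / "model.pkl"
        model.write_bytes(b"placeholder model bytes")
        manifest = build_manifest([model])
        print(json.dumps(manifest, indent=2))
        print("changed:", changed_files(manifest))
```

A hash manifest only detects modification; artifact signing, as suggested above, is still needed to establish who produced the file.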
