In this project, I apply MLOps and DataOps practices to the deployment of a text classification model that classifies ML projects into different categories.
The different steps of this project include:
- Development of an MVP model in a Jupyter notebook
- Experiment tracking with MLflow
- Hyperparameter optimization with Optuna
- Project packaging, Dockerization, and deployment with FastAPI
- Data visualization with Streamlit
- Test-driven development with Pytest and Great Expectations (GE)
- Code cleaning and documentation (Link) with mkdocs, flake8, isort and black
- Data and model versioning using DVC and GitHub
- Implementation of CI/CD practices with GitHub Actions and pre-commit
- Creation of a modern data stack including Airbyte, BigQuery and dbt
- Orchestration of the DataOps pipeline in a separate repository with Airflow
- Orchestration of the MLOps pipeline with Airflow
- Creation of a feature store with Feast
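As an illustration of the CI/CD and code-cleaning steps above, a `.pre-commit-config.yaml` wiring up black, isort and flake8 might look like the sketch below (hook versions are placeholders, not the ones pinned in this repository):

```yaml
# Illustrative .pre-commit-config.yaml — rev values are placeholders;
# pin them to the versions actually used in the project.
repos:
  - repo: https://github.com/psf/black
    rev: 23.1.0
    hooks:
      - id: black
  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8
```

With this file in place, `pre-commit install` registers the hooks so they run on every `git commit`.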
- Jupyter Notebook (POC - Tracer Bullet)
- Packaged Project Folder
- FastAPI Folder
- Streamlit Folder
- Dockerfile
- Testing folder (including Great Expectations)
- MLOps Airflow pipeline
- DataOps Airflow pipeline (separate repository)
- Root repository Folder
- Feast
- Apache Airflow
- BigQuery
- Airbyte
- Isort
- Flake8
- Black
- MkDocs
- Pre-commit
- Great Expectations
- Pytest
- DVC
- Streamlit
- FastAPI
- Rich
- MLflow
- Snorkel
- imbalanced-learn
- scikit-learn
- NLTK
- Pandas / NumPy
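To give a flavour of the MVP step, a minimal TF-IDF baseline built with some of the libraries above might look like this sketch (the corpus and category labels are toy stand-ins, not the project's actual dataset):

```python
# Minimal sketch of an MVP text classifier: TF-IDF features plus a linear
# model. Toy texts/labels below are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy corpus standing in for ML project descriptions.
texts = [
    "detect objects in images with a convolutional network",
    "classify customer reviews with a transformer language model",
    "segment tumours in MRI scans using deep learning",
    "summarise news articles with sequence to sequence models",
]
labels = ["computer-vision", "nlp", "computer-vision", "nlp"]

# Pipeline keeps vectorizer and classifier together, which also makes the
# model easy to pickle and serve later (e.g. behind FastAPI).
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(texts, labels)

print(model.predict(["detect cats in photos with a neural network"])[0])
```

In the real project this baseline would then be tracked with MLflow and tuned with Optuna.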
Create a virtual environment and install the project in editable mode:

```bash
python3 -m venv venv
source venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install pip setuptools wheel
python3 -m pip install -e .
```

Serve the API:

```bash
uvicorn app.api:app --host 0.0.0.0 --port 8000 --reload --reload-dir classifyops --reload-dir app  # dev
gunicorn -c app/gunicorn.py -k uvicorn.workers.UvicornWorker app.api:app  # prod
```

Or build and run the Docker image:

```bash
docker build -t classifyops:latest -f Dockerfile .
docker run -p 8000:8000 --name classifyops classifyops:latest
```

In one terminal (from the project root directory), launch the Airflow webserver:

```bash
source venv/bin/activate
export AIRFLOW_HOME=${PWD}/airflow
export GOOGLE_APPLICATION_CREDENTIALS=/Link/to/BigQuery/JSON/file.json
airflow webserver --port 8080
```

In a second terminal, launch the Airflow scheduler:

```bash
source venv/bin/activate
export AIRFLOW_HOME=${PWD}/airflow
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
export GOOGLE_APPLICATION_CREDENTIALS=/Link/to/BigQuery/JSON/file.json
airflow scheduler
```

The Airflow orchestration workflow (DAG) can then be triggered from your browser at http://localhost:8080.
Octave Antoni
Copyright 2023 Octave Antoni
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


