
octave-ati/MLOps-Text-Classification


ClassifyOps, an MLOps Text Classification Project

In this project, I apply MLOps and DataOps practices to the deployment of a text classification model that assigns ML projects to different categories.

Project Flowchart

(image: project flowchart)

The different steps of this project include:

  • Development of an MVP model in a Jupyter notebook
  • Experiment tracking with MLflow
  • Hyperparameter optimization with Optuna
  • Project packaging, dockerization, and deployment with FastAPI
  • Data visualization with Streamlit
  • Test-driven development with pytest and Great Expectations (GE)
  • Code cleaning and documentation (Link) with MkDocs, flake8, isort, and black
  • Data and model versioning using DVC and GitHub
  • Implementation of CI/CD practices with GitHub Actions / pre-commit
  • Creation of a modern data stack including Airbyte, BigQuery and dbt
  • Orchestration of the DataOps pipeline in a separate repository with Airflow
  • Orchestration of the MLOps pipeline with Airflow
  • Creation of a feature store with Feast
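These steps build toward a model that assigns a category to a free-text project description. As a rough sketch of that task only (a toy keyword-count classifier, not the tuned model this project trains; the category names and keywords below are made up for illustration):

```python
# Illustrative only: a toy keyword-count classifier standing in for the
# project's real model. Category names and keywords are hypothetical.
from collections import Counter

CATEGORY_KEYWORDS = {
    "computer-vision": {"image", "vision", "detection", "segmentation"},
    "nlp": {"text", "language", "transformer", "classification"},
    "mlops": {"pipeline", "deployment", "monitoring", "docker"},
}

def predict_category(description: str) -> str:
    """Return the category whose keyword set best matches the description."""
    tokens = Counter(description.lower().split())
    scores = {
        cat: sum(tokens[w] for w in words)
        for cat, words in CATEGORY_KEYWORDS.items()
    }
    return max(scores, key=scores.get)

print(predict_category("A transformer model for text classification"))  # nlp
```

The real pipeline replaces the keyword scoring with a trained model, but the input/output contract (description in, category label out) is the same.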

Experiment Tracking

(image: experiment tracking in MLflow)

Pruning

(image: trial pruning plot)
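The pruning plot comes from stopping unpromising trials partway through the search. A stdlib-only sketch of the median-pruning idea, which Optuna's MedianPruner is built on (the objective function and every number here are invented for illustration):

```python
# Stdlib-only sketch of median pruning: a trial is stopped early when its
# intermediate score falls below the median of earlier trials' scores at
# the same step. The objective is a made-up learning curve.
import random
from statistics import median

def objective(x: float, step: int) -> float:
    # Hypothetical learning curve: improves with steps, best near x = 0.3.
    return step / 10 - (x - 0.3) ** 2

random.seed(0)
history: dict[int, list[float]] = {}  # step -> scores seen at that step
results = []  # (x, last_score, was_pruned)

for trial in range(20):
    x = random.uniform(0.0, 1.0)
    pruned = False
    for step in range(1, 11):
        score = objective(x, step)
        past = history.setdefault(step, [])
        if len(past) >= 5 and score < median(past):
            pruned = True  # below the median of earlier trials: stop early
            break
        past.append(score)
    results.append((x, score, pruned))

best = max((r for r in results if not r[2]), key=lambda r: r[1])
print(f"pruned {sum(r[2] for r in results)}/20 trials, best x = {best[0]:.3f}")
```

Pruned trials spend only a fraction of their step budget, which is what makes the search cheaper than running every trial to completion.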

Hyperparameter importance

(image: hyperparameter importance plot)

Useful Links

Libraries / Packages Used

Quick how-to

Virtual environment creation

python3 -m venv venv
source venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install pip setuptools wheel
python3 -m pip install -e .

Launching App

uvicorn app.api:app --host 0.0.0.0 --port 8000 --reload --reload-dir classifyops --reload-dir app  # dev
gunicorn -c app/gunicorn.py -k uvicorn.workers.UvicornWorker app.api:app  # prod

Building docker container

docker build -t classifyops:latest -f Dockerfile .

Running container

docker run -p 8000:8000 --name classifyops classifyops:latest
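With the container running, the API can be queried over HTTP. The /predict path and the "texts" field below are illustrative assumptions, not taken from app/api.py; check the actual endpoint and request schema there first:

```python
# Sketch of querying the running container. The /predict path and the
# "texts" field are assumptions for illustration -- check app/api.py for
# the real endpoint and request schema before relying on them.
import json
import urllib.request

payload = {"texts": ["Transfer learning with transformers for text classification"]}
request = urllib.request.Request(
    "http://localhost:8000/predict",  # hypothetical endpoint path
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# response = urllib.request.urlopen(request)  # run with the container up
print(request.get_method(), request.full_url)
```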

Airflow orchestration (MLOps)

In one terminal (in the project root), start the Airflow webserver:

source venv/bin/activate
export AIRFLOW_HOME=${PWD}/airflow
export GOOGLE_APPLICATION_CREDENTIALS=/Link/to/BigQuery/JSON/file.json
airflow webserver --port 8080

In a second terminal, start the Airflow scheduler:

source venv/bin/activate
export AIRFLOW_HOME=${PWD}/airflow
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
export GOOGLE_APPLICATION_CREDENTIALS=/Link/to/BigQuery/JSON/file.json
airflow scheduler

The Airflow orchestration workflow (DAG) can then be launched from your browser at http://localhost:8080:

(image: Airflow DAG view)

Developed By

Octave Antoni

Connect with me on LinkedIn

License

Copyright 2023 Octave Antoni

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
