This project presents a comparative study of text classification using Large Language Models (LLMs) and traditional Machine Learning (ML) techniques on a domain-specific dataset.
Sept 2024 – Oct 2024
Text Classification Using LLM and ML/DL Techniques
- Conducted a comparative study of text classification using LLMs (Llama-3 8B) and classical ML models (TF-IDF + Logistic Regression) on a domain-specific dataset.
- Performed data preprocessing and handled class imbalance; applied prompt engineering techniques (zero-shot and EHC) together with LLM fine-tuning for domain adaptation.
- Utilized Ollama, Hugging Face, and LangChain to structure LLM inference, fine-tuning, and evaluation workflows, enabling systematic benchmarking against traditional machine learning approaches.
Two independent approaches are implemented:
- LLM-Based Approach
- Uses Llama-3 8B via Ollama
- Performed domain-specific fine-tuning to adapt the model for the classification task
- Applied prompt engineering strategies, including:
- Zero-shot prompting
- EHC prompting
- Evaluated the effectiveness of LLM-based classification compared to traditional ML methods
- Classical Machine Learning Approach
- Uses TF-IDF vectorization
- Logistic Regression classifier
- Traditional NLP pipeline for benchmarking against LLMs
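The classical baseline above can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the project's actual `ml_model.py`; the toy texts and labels are invented for the example.

```python
# Minimal sketch of the classical baseline: TF-IDF features + Logistic Regression.
# The texts and labels below are illustrative, not from the project's dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

texts = [
    "great phone battery",
    "terrible screen quality",
    "battery lasts long",
    "screen cracked quickly",
]
labels = ["positive", "negative", "positive", "negative"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),               # raw text -> TF-IDF vectors
    ("lr", LogisticRegression(max_iter=1000)),  # linear classifier on top
])
clf.fit(texts, labels)
pred = clf.predict(["battery is great"])
```

Wrapping both steps in a `Pipeline` keeps vectorization and classification fitted together, which avoids leaking test data into the vocabulary.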
The project is fully containerized using Docker, ensuring reproducibility and consistent execution environments.
- Comparative evaluation between LLM-based and traditional ML classification
- Prompt engineering experimentation with LLMs
- Domain-specific dataset preprocessing
- Class imbalance handling
- Modular pipeline for data preprocessing, model execution, and result generation
- Fully Dockerized workflow
- Python
- Docker
- Ollama
- Hugging Face
- LangChain
- Scikit-learn
- TF-IDF
- Logistic Regression
```
.
├── data
│   ├── cleaned_test_data.tsv
│   ├── cleaned_train_data.tsv
│   ├── cleaned_test_data_ml.tsv
│   ├── cleaned_train_data_ml.tsv
│   └── TWNERTC_TC_Fine_Grained...
│
├── models
│   ├── llm_model.py
│   └── ml_model.py
│
├── preprocessing
│   ├── dataset.py
│   └── dataset_ml.py
│
├── results
│   ├── llm
│   └── ml
│
├── Dockerfile
├── requirements.txt
└── README.md
```
`data/` contains the datasets used for training and evaluation:

- `cleaned_train_data.tsv` – Training dataset for the LLM model
- `cleaned_test_data.tsv` – Test dataset for the LLM model
- `cleaned_train_data_ml.tsv` – Training dataset for the ML model
- `cleaned_test_data_ml.tsv` – Test dataset for the ML model
- `TWNERTC_TC_Fine_Grained...` – Original dataset used during preprocessing
`models/` contains the scripts that run the classification models:

- `llm_model.py` – Executes the LLM-based text classification pipeline
- `ml_model.py` – Executes the machine learning classification pipeline

Each script can be executed independently.
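As an illustration of the zero-shot prompting strategy used in the LLM pipeline, the sketch below shows how a classification prompt can be assembled before being sent to Llama-3 8B through Ollama/LangChain. The category names and the `build_zero_shot_prompt` helper are hypothetical, not the project's actual code.

```python
# Hedged sketch: assembling a zero-shot classification prompt. The helper and
# the category list are illustrative; the project's real prompt may differ.
CATEGORIES = ["sports", "health", "technology"]  # example label set

def build_zero_shot_prompt(text: str, categories: list) -> str:
    """Return a prompt asking the model to pick exactly one category."""
    options = ", ".join(categories)
    return (
        "You are a text classifier.\n"
        f"Categories: {options}\n"
        f"Text: {text}\n"
        "Answer with exactly one category name."
    )

prompt = build_zero_shot_prompt("The team won the championship.", CATEGORIES)
# The prompt would then be sent to the model, e.g. via langchain_ollama:
#   ChatOllama(model="llama3").invoke(prompt)
```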
`preprocessing/` contains the dataset preprocessing scripts:

- `dataset.py` – Preprocessing pipeline for the LLM dataset
- `dataset_ml.py` – Preprocessing pipeline for the ML dataset
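One common way to handle the class imbalance mentioned above is to compute per-class weights that the classifier can consume (e.g. via `LogisticRegression(class_weight=...)`). The sketch below uses scikit-learn's `compute_class_weight`; the toy 80/20 label distribution is illustrative only and is not taken from the project's dataset.

```python
# Hedged sketch of one class-imbalance strategy: "balanced" per-class weights.
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array(["sports"] * 8 + ["health"] * 2)   # toy 80/20 imbalance
classes = np.unique(y)                          # sorted: ['health', 'sports']
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
# "balanced" weight = n_samples / (n_classes * class_count),
# so the minority class receives the larger weight.
weight_by_class = dict(zip(classes, weights))
```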
`results/` stores the outputs generated by the models:

```
results/
├── llm/
└── ml/
```

- `llm/` – Output generated by the LLM model
- `ml/` – Output generated by the ML model
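For the comparative evaluation, predictions from `results/llm` and `results/ml` can be scored with the same metrics so the two pipelines are directly comparable. The sketch below uses scikit-learn; the gold and predicted label lists are invented stand-ins for a real output file.

```python
# Hedged sketch: scoring both pipelines with identical metrics.
# The label lists below are illustrative, not real results.
from sklearn.metrics import accuracy_score, f1_score

gold        = ["sports", "health", "sports", "tech"]
predictions = ["sports", "health", "tech",   "tech"]

accuracy = accuracy_score(gold, predictions)
macro_f1 = f1_score(gold, predictions, average="macro")
```

Macro-averaged F1 weights every class equally, which matters when the dataset is imbalanced.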
Make sure the following is installed:
- Docker
Check installation:
```bash
docker --version
```

From the root directory of the project, run:

```bash
docker build -t hepsiburadacasestudy .
```

This builds the Docker image and installs all dependencies from `requirements.txt`.
To execute the LLM-based model inside Docker:
```bash
docker run -v $(pwd)/data:/app/data -v $(pwd)/results/llm:/app/results/llm hepsiburadacasestudy python models/llm_model.py
```

To execute the machine learning model:

```bash
docker run -v $(pwd)/data:/app/data -v $(pwd)/results/ml:/app/results/ml hepsiburadacasestudy python models/ml_model.py
```

Each command:

- Mounts the dataset directory into the container
- Saves results to the local `results/` folder
| Mount | Description |
|---|---|
| `-v $(pwd)/data:/app/data` | Makes datasets accessible inside the container |
| `-v $(pwd)/results/llm:/app/results/llm` | Saves LLM outputs to the local machine |
| `-v $(pwd)/results/ml:/app/results/ml` | Saves ML outputs to the local machine |
All Python dependencies are listed in `requirements.txt` and are installed automatically during the Docker build process.
Key libraries include:
- scikit-learn
- pandas
- numpy
- langchain
- Hugging Face libraries
- Ollama integration tools
- Add Deep Learning models (BERT / RoBERTa)
- Expand evaluation metrics
- Implement automated experiment tracking
- Add visualization dashboards
This project is intended for research and educational purposes.