This project is a machine learning application that predicts a movie's gross revenue from its budget, number of votes, runtime, and other features.
It compares Linear Regression, Polynomial Regression, Random Forest, and XGBoost models, tunes their hyperparameters with GridSearchCV, and serves real-time predictions through an interactive Streamlit web app.
- Data Analysis & Visualization: correlation heatmaps, bar plots, scatter plots, etc.
- Machine Learning Models: Linear Regression, Polynomial Regression, Random Forest, and XGBoost.
- Hyperparameter Optimization: uses GridSearchCV to find the best model settings.
- Feature Importance Analysis: identifies the factors that most affect revenue.
- Interactive Web App: a Streamlit-based UI where users can enter movie details and get predictions.
- Model Persistence: saves the best trained model for future use.
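The model persistence step can be illustrated with a minimal sketch. It assumes the standard-library `pickle` module; the project itself may use `joblib` or another mechanism. The `fitted_model` dictionary and the `best_model.pkl` filename are hypothetical stand-ins for the real trained estimator and its path.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained estimator: a plain dict of fitted
# parameters. A real scikit-learn or XGBoost model object can be pickled
# the same way.
fitted_model = {"intercept": 1.5e6, "coefficients": [2.1, 350.0, 1.2e4]}

# Write to a temporary directory so the sketch leaves no files behind.
model_path = Path(tempfile.mkdtemp()) / "best_model.pkl"

# Save the model to disk.
with model_path.open("wb") as f:
    pickle.dump(fitted_model, f)

# Later (for example, inside the web app), load it back.
with model_path.open("rb") as f:
    restored_model = pickle.load(f)

print(restored_model == fitted_model)
```

Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.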
Movie_Revenue_Prediction/
├── data/                          # Contains raw dataset
│   └── movies.csv                 # Raw movie dataset
│
├── models/                        # Contains scripts for ML training
│   ├── analysis_and_charts.py     # Data analysis & visualization (correlation heatmaps, graphs)
│   ├── model.py                   # Machine Learning model implementation
│   └── trained_model.py           # Script for model training & tuning
│
├── app.py                         # Streamlit web app for predictions
├── requirements.txt               # Dependencies needed for the project
└── README.md                      # Documentation
- Linear Regression
- Polynomial Regression
- Random Forest
- XGBoost (best-performing model)
We applied hyperparameter tuning using GridSearchCV to optimize performance.
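As a rough illustration of this tuning step, the sketch below runs scikit-learn's `GridSearchCV` over a small Random Forest grid. The synthetic data, the parameter grid, and the choice of scoring metric are all assumptions made for the example; the project's actual grid and tuned model are defined in `models/trained_model.py`.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the movie features (budget, votes, runtime, ...).
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical grid, kept small so the example runs quickly.
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

# Try every combination with 3-fold cross-validation and keep the one
# with the lowest mean absolute error.
search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid=param_grid,
    cv=3,
    scoring="neg_mean_absolute_error",
)
search.fit(X_train, y_train)

best_model = search.best_estimator_
print("Best parameters:", search.best_params_)
print("Held-out R^2:", best_model.score(X_test, y_test))
```

The same pattern should work with any estimator that follows the scikit-learn interface: swap in the estimator and a grid of its own parameters.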
git clone https://github.com/Ra638/Movie_Revenue_Prediction.git
cd Movie_Revenue_Prediction
pip install -r requirements.txt
python models/analysis_and_charts.py
python models/trained_model.py
Model                              MAE    MSE        R² Score
---------------------------------------------------------
Linear Regression                  70M    1.77e+16   0.50
Polynomial Regression (Degree 2)   49M    1.16e+16   0.67
Polynomial Regression (Degree 3)   49M    1.16e+16   0.67
Random Forest                      49M    1.09e+16   0.69
XGBoost (Tuned)                    49M    9.69e+15   0.77
XGBoost performs best, with R² = 0.77.
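For readers unfamiliar with the three metrics reported above, here is a dependency-free sketch of how MAE, MSE, and R² are defined. The revenue figures are made up for illustration and are not taken from the project's dataset; in practice the equivalent functions in `sklearn.metrics` would normally be used.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference, in the same unit as the target (dollars)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference, in squared units (dollars squared)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Share of the target's variance explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up gross revenues in dollars, for illustration only.
actual = [100e6, 200e6, 300e6, 400e6]
predicted = [110e6, 190e6, 320e6, 380e6]

print("MAE:", mean_absolute_error(actual, predicted))
print("MSE:", mean_squared_error(actual, predicted))
print("R^2:", r2_score(actual, predicted))
```

Because MSE is measured in squared dollars, values around 1e+16 correspond to a root-mean-squared error on the order of 1e+8 dollars, which is why the MSE column looks so much larger than the MAE column.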
The project includes a Streamlit Web App where users can enter details and get predictions.
- Enhance Feature Engineering: add more relevant features.
- Try Deep Learning Models: test neural networks for better accuracy.
- Deploy Online: host the app on AWS/GCP/Heroku.
Want to improve this project? Feel free to fork it and submit a Pull Request.