This project focuses on analyzing and forecasting sales data using time series techniques and machine learning (XGBoost). The analysis includes exploratory visualizations, trend analysis, model training, future forecasting, and preparing outputs for integration with Power BI.
The dataset is loaded from `/content/drive/MyDrive/mock_kaggle.csv`. It contains the following columns:

- `data`: Date of the record (YYYY-MM-DD format)
- `venda`: Sales value (target variable)
- `estoque`: Inventory or stock levels
- `preco`: Price of the product
- Convert `data` to datetime.
- Remove rows with `venda <= 0`.
- Create time-based features: `month`, `year`, `weekday`.
- Create a 7-day rolling average of sales as a new feature `rolling_avg`.
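
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's exact code: the function name `preprocess` is hypothetical, and sorting by date and using `min_periods=1` for the rolling window (so the first six rows are not NaN) are assumptions.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the listed preprocessing steps to a raw sales frame."""
    out = df.copy()
    out["data"] = pd.to_datetime(out["data"])  # parse YYYY-MM-DD strings
    out = out[out["venda"] > 0]  # drop rows with venda <= 0
    # Assumption: rows are put in chronological order before rolling.
    out = out.sort_values("data").reset_index(drop=True)

    # Calendar features derived from the date column.
    out["month"] = out["data"].dt.month
    out["year"] = out["data"].dt.year
    out["weekday"] = out["data"].dt.weekday  # Monday=0 ... Sunday=6

    # 7-day rolling mean of sales. min_periods=1 is an assumption that
    # avoids NaN values in the first six rows.
    out["rolling_avg"] = out["venda"].rolling(window=7, min_periods=1).mean()
    return out
```

In a Colab session this could be applied as `df = preprocess(pd.read_csv("/content/drive/MyDrive/mock_kaggle.csv"))`.
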
- Time-series plot of daily sales.
- Monthly and yearly sales trend plots using bar charts.
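
One possible way to draw these plots with matplotlib is sketched below. It assumes `data` is already a datetime column; the function name `plot_sales_trends`, the figure layout, and grouping months by year-month label (rather than by month number) are illustrative choices, not taken from the project.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_sales_trends(df: pd.DataFrame):
    """Draw a daily line plot plus monthly and yearly bar charts."""
    # Assumption: monthly totals are keyed by "YYYY-MM" labels.
    monthly = df.groupby(df["data"].dt.strftime("%Y-%m"))["venda"].sum()
    yearly = df.groupby(df["data"].dt.year)["venda"].sum()

    fig, axes = plt.subplots(3, 1, figsize=(10, 12))

    # Time-series plot of daily sales.
    axes[0].plot(df["data"], df["venda"])
    axes[0].set_title("Daily sales")

    # Monthly totals as a bar chart.
    axes[1].bar(list(monthly.index), monthly.to_numpy())
    axes[1].set_title("Monthly sales")
    axes[1].tick_params(axis="x", rotation=90)

    # Yearly totals as a bar chart (years converted to string labels).
    axes[2].bar([str(y) for y in yearly.index], yearly.to_numpy())
    axes[2].set_title("Yearly sales")

    fig.tight_layout()
    return fig
```

In a notebook, calling `plot_sales_trends(df)` as the last statement of a cell displays the figure.
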
- Train an XGBoost Regressor using:
  - Features: `month`, `year`, `weekday`, `estoque`, `preco`, `rolling_avg`
  - Target: `venda`
- Split data into train/test using a time-aware split (no shuffle).
- Standardize features using `StandardScaler`.
- Calculate RMSE and MAE on test data.
- Plot Actual vs Predicted sales.
- Predict sales for the next 30 days using the last known values for features.
- Save future predictions for visualization or reporting.
- Export data files for reporting:
  - Daily actual and predicted sales
  - Monthly sales summary
  - Yearly sales summary
  - KPI metrics (RMSE, MAE, total sales)
| File Name | Description |
|---|---|
| `forecast_data.csv` | Daily actual vs predicted sales (for Power BI) |
| `monthly_sales.csv` | Monthly total sales |
| `yearly_sales.csv` | Yearly total sales |
| `kpi_data.csv` | Key performance indicators (RMSE, MAE, Total) |
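
A hedged sketch of how these four files could be written with pandas. The file names come from the table above; the function name `export_reports`, the input column `predicted`, and the exact column layout of each CSV are assumptions.

```python
from pathlib import Path

import pandas as pd


def export_reports(daily: pd.DataFrame, rmse: float, mae: float,
                   out_dir: str = ".") -> dict:
    """Write the daily, monthly, yearly, and KPI files; return their paths.

    `daily` is assumed to hold a datetime `data` column, actual sales in
    `venda`, and predictions in a hypothetical `predicted` column.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    keyed = daily.assign(
        year=daily["data"].dt.year, month=daily["data"].dt.month
    )
    monthly = keyed.groupby(["year", "month"], as_index=False)["venda"].sum()
    yearly = keyed.groupby("year", as_index=False)["venda"].sum()
    kpi = pd.DataFrame(
        [{"RMSE": rmse, "MAE": mae, "Total Sales": daily["venda"].sum()}]
    )

    paths = {
        "forecast": out / "forecast_data.csv",
        "monthly": out / "monthly_sales.csv",
        "yearly": out / "yearly_sales.csv",
        "kpi": out / "kpi_data.csv",
    }
    daily.to_csv(paths["forecast"], index=False)
    monthly.to_csv(paths["monthly"], index=False)
    yearly.to_csv(paths["yearly"], index=False)
    kpi.to_csv(paths["kpi"], index=False)
    return paths
```

Writing without the index (`index=False`) keeps the CSV columns clean when the files are loaded into Power BI.
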
Evaluated using test data:
- RMSE: Root Mean Squared Error
- MAE: Mean Absolute Error
- Total Sales: Aggregate of all valid sales data
- Sales Trend Over Time
- Monthly and Yearly Bar Charts
- Actual vs Predicted Sales Line Plot
Make sure the following Python packages are installed:

`pip install pandas matplotlib seaborn xgboost scikit-learn`

- Be sure your dataset includes the required columns and correct formats.
- This project assumes you are running in a Jupyter Notebook or Google Colab environment with access to files in `/content/`.
This project was developed for sales forecasting and business intelligence reporting. It can be easily extended with holidays, promotions, or external economic data to improve predictions.
Feel free to use, modify, and share!