A collection of environmental data science projects focusing on water quality analysis, geospatial patterns, and correlation studies for Indian lakes. These projects demonstrate expertise in environmental monitoring, remote sensing indices, and applied machine learning for ecological data.
This repository contains 7 projects analyzing water quality, turbidity, and environmental indices using satellite-derived data and machine learning. Projects focus on Koradi Lake and Bangalore lakes, employing techniques from geospatial analysis, time series forecasting, and deep learning.
Water Quality Forecasting (Working Copy)
- Normalized Difference Turbidity Index (NDTI) prediction
- Time series analysis of water turbidity
- Feature engineering from water quality indices
- Model comparison and validation
- 18 code cells | Time Series | Environmental ML
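The feature engineering and validation steps listed above can be sketched roughly as follows. This is a minimal illustration on a synthetic series, not the notebook's actual pipeline: the helper names `make_lag_features` and `rmse`, the lag set, the window size, and the 80/20 split are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def make_lag_features(series: pd.Series, lags=(1, 2, 3), window=3) -> pd.DataFrame:
    """Build lagged and rolling-mean features from past values only."""
    features = pd.DataFrame(index=series.index)
    for lag in lags:
        features[f"lag_{lag}"] = series.shift(lag)
    # shift(1) before rolling so the window never includes the current value
    features[f"roll_mean_{window}"] = series.shift(1).rolling(window).mean()
    features["target"] = series
    return features.dropna()


def rmse(y_true, y_pred) -> float:
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic NDTI-like series (illustrative values, not project data)
dates = pd.date_range("2023-01-01", periods=60, freq="D")
ndti = pd.Series(0.1 * np.sin(np.arange(60) / 5.0), index=dates, name="NDTI")

data = make_lag_features(ndti)

# Chronological split: train on the first 80%, validate on the remainder,
# so the model is never evaluated on dates earlier than its training data
split = int(len(data) * 0.8)
train, valid = data.iloc[:split], data.iloc[split:]

# Persistence baseline: predict the previous observation
baseline_rmse = rmse(valid["target"], valid["lag_1"])
print(f"train={len(train)} valid={len(valid)} baseline RMSE={baseline_rmse:.4f}")
```

A persistence baseline like this gives any fitted model a reference score to beat on the same validation window.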
NDTI Prediction Model
- Lake water turbidity forecasting using NDTI
- Correlation with NDWI (Normalized Difference Water Index)
- Multiple ML models: Random Forest, XGBoost, Neural Networks
- Spatial-temporal pattern analysis
- 18 code cells | Forecasting | Water Quality
Koradi Lake Spatial Analysis
- Geospatial heatmap visualization of water quality
- Spatial distribution of turbidity across lake regions
- Temporal changes in water indices
- Interactive visualizations with Plotly
- 17 code cells, 9 visualizations | Geospatial | Visualization
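One common way to prepare point samples for a spatial heatmap is to average them into a regular grid. The sketch below assumes the data is available as longitude, latitude, and index-value arrays; the function name `grid_mean` and the sample coordinates are hypothetical and not taken from the notebook.

```python
import numpy as np


def grid_mean(lon, lat, values, bins=(4, 4)):
    """Average point samples into a regular lon/lat grid for a heatmap."""
    lon, lat, values = (np.asarray(a, dtype=float) for a in (lon, lat, values))
    # Sum of values per cell, then number of samples per cell on the same edges
    total, lon_edges, lat_edges = np.histogram2d(lon, lat, bins=bins, weights=values)
    count, _, _ = np.histogram2d(lon, lat, bins=[lon_edges, lat_edges])
    # Cells with no samples stay NaN instead of triggering a divide-by-zero
    mean = np.full(total.shape, np.nan)
    np.divide(total, count, out=mean, where=count > 0)
    return mean, lon_edges, lat_edges


# Hypothetical sample points (arbitrary coordinates, illustrative NDTI values)
lon = [0.1, 0.2, 0.9, 0.8]
lat = [0.1, 0.2, 0.9, 0.1]
ndti = [0.02, 0.04, 0.10, 0.06]

mean, lon_edges, lat_edges = grid_mean(lon, lat, ndti, bins=(2, 2))
print(mean)
```

The first axis of `mean` follows longitude, so the array is usually transposed before plotting, for example `plt.imshow(mean.T, origin="lower")`.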
Hybrid Deep Learning for Water Quality
- CNN-RNN hybrid architecture for water potability prediction
- Combines spatial (CNN) and temporal (RNN) features
- Multi-parameter water quality assessment
- Deep learning for environmental classification
- 16 code cells | Deep Learning | Water Safety
Environmental Data Exploration
- Exploratory data analysis of lake parameters
- Water quality indicator relationships
- Statistical testing and hypothesis validation
- Data preprocessing for environmental datasets
- Code cells vary | EDA | Statistical Analysis
Environmental Variable Correlation Study
- Pearson and Spearman correlation analysis
- Multicollinearity assessment
- Feature selection for water quality models
- Correlation heatmaps and statistical significance
- 14 code cells | Statistical Analysis | Feature Engineering
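To illustrate why both correlation measures and a multicollinearity check are useful, here is a small sketch on synthetic data. The column names mirror parameters mentioned in this README, but the values are generated for the example, and the `vif` helper is a hypothetical stand-in that may differ from what the notebook does.

```python
import numpy as np
import pandas as pd


def vif(df: pd.DataFrame) -> pd.Series:
    """Variance inflation factor per column: 1 / (1 - R^2), where R^2 comes
    from regressing that column on all other columns (with an intercept)."""
    out = {}
    for col in df.columns:
        y = df[col].to_numpy(dtype=float)
        X = df.drop(columns=col).to_numpy(dtype=float)
        X = np.column_stack([np.ones(len(X)), X])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[col] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(out, name="VIF")


# Synthetic data (not project measurements)
rng = np.random.default_rng(0)
n = 200
temperature = rng.normal(25.0, 3.0, n)
df = pd.DataFrame({
    "Temperature": temperature,
    # Nearly a linear function of Temperature, so the two are collinear
    "DO": 12.0 - 0.2 * temperature + rng.normal(0.0, 0.05, n),
    # Independent of the other predictors
    "pH": rng.normal(7.5, 0.3, n),
    # Monotonic but nonlinear in Temperature
    "NDTI": 0.01 * np.exp(0.3 * (temperature - 25.0)),
})

# Pearson measures linear association; Spearman measures monotonic association
pearson = df[["Temperature", "NDTI"]].corr(method="pearson").loc["Temperature", "NDTI"]
spearman = df[["Temperature", "NDTI"]].corr(method="spearman").loc["Temperature", "NDTI"]

vifs = vif(df[["Temperature", "DO", "pH"]])
print(f"Pearson={pearson:.3f} Spearman={spearman:.3f}")
print(vifs.round(1))
```

A frequently used rule of thumb treats a VIF above about 10 as a sign that a predictor is largely redundant with the others.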
Bangalore Lake Correlation Analysis (2023)
- Recent water quality data from Bangalore lakes
- Parameter interdependence analysis
- Seasonal variation patterns
- Urban lake environmental monitoring
- 13 code cells | Urban Lakes | Correlation Studies
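Seasonal variation can be summarized by grouping observations by season. The sketch below assumes a `Date` column and an `NDTI` column, and maps months to seasons using the commonly cited India Meteorological Department convention; both the mapping and the `seasonal_summary` helper are assumptions, and the notebook may group the data differently.

```python
import numpy as np
import pandas as pd

# Month-to-season mapping (assumed convention: winter Jan-Feb, pre-monsoon
# Mar-May, monsoon Jun-Sep, post-monsoon Oct-Dec)
SEASONS = {
    1: "Winter", 2: "Winter",
    3: "Pre-monsoon", 4: "Pre-monsoon", 5: "Pre-monsoon",
    6: "Monsoon", 7: "Monsoon", 8: "Monsoon", 9: "Monsoon",
    10: "Post-monsoon", 11: "Post-monsoon", 12: "Post-monsoon",
}


def seasonal_summary(df: pd.DataFrame, date_col="Date", value_col="NDTI") -> pd.DataFrame:
    """Mean, standard deviation, and sample count of a parameter per season."""
    season = pd.to_datetime(df[date_col]).dt.month.map(SEASONS).rename("Season")
    return df.groupby(season)[value_col].agg(["mean", "std", "count"])


# Synthetic monthly observations (illustrative values only)
df = pd.DataFrame({
    "Date": pd.date_range("2023-01-01", periods=12, freq="MS"),
    "NDTI": np.arange(12) / 100.0,
})

summary = seasonal_summary(df)
print(summary)
```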
NDTI (Normalized Difference Turbidity Index)
- Measures water turbidity from satellite imagery
- Range: -1 to +1 (lower = clearer water)
- Formula: (Red - Green) / (Red + Green)
NDWI (Normalized Difference Water Index)
- Delineates open water bodies (the Green/NIR form below is the McFeeters NDWI)
- Range: -1 to +1 (positive values typically indicate open water)
- Formula: (Green - NIR) / (Green + NIR)
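Both formulas above are normalized differences, so a single helper can compute them from reflectance arrays. The following is a minimal sketch, assuming the bands are already loaded as NumPy arrays (for Sentinel-2, Red, Green, and NIR usually correspond to bands B4, B3, and B8). The function names are illustrative, and the band values are made up.

```python
import numpy as np


def normalized_difference(a, b):
    """Compute (a - b) / (a + b), returning NaN where a + b is zero."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def ndti(red, green):
    """Normalized Difference Turbidity Index: (Red - Green) / (Red + Green)."""
    return normalized_difference(red, green)


def ndwi(green, nir):
    """Normalized Difference Water Index: (Green - NIR) / (Green + NIR)."""
    return normalized_difference(green, nir)


# Made-up reflectance values; the last pixel has no signal in any band
red = np.array([0.10, 0.20, 0.0])
green = np.array([0.30, 0.20, 0.0])
nir = np.array([0.10, 0.20, 0.0])

turbidity = ndti(red, green)
water = ndwi(green, nir)
print(turbidity, water)
```

Masking zero denominators as NaN keeps no-data pixels from producing warnings or infinite values in later statistics.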
Koradi Lake, Maharashtra
- Major water supply reservoir
- Satellite-based monitoring
- Historical time series data
Bangalore Lakes
- Urban lake ecosystem
- Pollution monitoring
- Water quality assessment
- Pandas - Data manipulation and time series
- NumPy - Numerical computing
- Matplotlib/Seaborn - Statistical visualization
- Plotly - Interactive geospatial plots
- TensorFlow/Keras - Deep learning models
- Scikit-learn - Traditional ML algorithms
- XGBoost/CatBoost - Gradient boosting
- Rasterio - Satellite imagery processing (optional)
- GeoPandas - Geospatial data structures (optional)
- SciPy - Statistical tests
- Statsmodels - Time series and correlation
- Clone this repository:
git clone https://github.com/pradyten/environmental-data-projects.git
cd environmental-data-projects
- Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Launch Jupyter Notebook:
jupyter notebook

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Load satellite data
df = pd.read_csv('lake_data.csv', parse_dates=['Date'])  # parse Date so the time axis plots chronologically
# Calculate NDTI (Normalized Difference Turbidity Index)
df['NDTI'] = (df['Red'] - df['Green']) / (df['Red'] + df['Green'])
# Visualize turbidity over time
plt.figure(figsize=(12, 6))
plt.plot(df['Date'], df['NDTI'], marker='o')
plt.xlabel('Date')
plt.ylabel('NDTI')
plt.title('Water Turbidity Over Time')
plt.grid(True)
plt.show()

import seaborn as sns
# Calculate correlation matrix
corr_matrix = df[['NDTI', 'NDWI', 'Temperature', 'pH', 'DO']].corr()
# Create heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0)
plt.title('Water Quality Parameter Correlations')
plt.show()

- Time series prediction of turbidity indices
- Multiple horizon forecasting (1-day, 7-day, 30-day)
- Model comparison and ensemble methods
- Confidence intervals for predictions
- Heatmap visualizations of lake regions
- Spatial clustering of water quality
- Temporal changes in environmental indices
- Interactive maps for exploration
- Correlation studies between parameters
- Hypothesis testing for environmental relationships
- Feature importance for predictive models
- Multicollinearity detection
- CNN for spatial pattern recognition
- RNN/LSTM for temporal dependencies
- Hybrid architectures combining both
- Transfer learning from similar domains
- Environmental Monitoring: Track water quality changes over time
- Early Warning Systems: Predict algal blooms and pollution events
- Resource Management: Optimize water resource allocation
- Policy Support: Data-driven environmental regulations
- Climate Impact: Assess climate change effects on water bodies
Projects use data from:
- Satellite imagery (Landsat, Sentinel-2)
- Ground-based water quality sensors
- Government environmental databases
- Historical monitoring records
environmental-data-projects/
├── Copy of NDTI forecasting.ipynb
├── NDTI forecasting.ipynb
├── heatmap.ipynb
├── CNN+RNN WaterPortability Model.ipynb
├── Untitled23.ipynb
├── correlation analysis.ipynb
├── bangalore_2023_pearson_correlation.ipynb
├── requirements.txt
├── .gitignore
└── README.md
- Turbidity Prediction: Achieved RMSE < 0.05 for NDTI forecasting
- Spatial Patterns: Identified pollution hotspots in lake regions
- Temporal Trends: Detected seasonal variations in water quality
- Correlation Findings: Strong negative correlation between NDTI and NDWI
- Deep Learning: Hybrid CNN-RNN improved accuracy by 15% over baseline
- Real-time water quality monitoring dashboard
- Integration with IoT sensor networks
- Multi-lake comparative analysis
- Automated anomaly detection system
- Mobile app for field data collection
- Climate change impact modeling
- API for water quality predictions
Contributions are welcome! Please feel free to submit a Pull Request or open an issue for discussions.
Pradyumn Tendulkar
- Email: pktendulkar@wpi.edu
- LinkedIn: linkedin.com/in/p-tendulkar
- GitHub: @pradyten
This project is licensed under the MIT License.
- National Remote Sensing Centre (NRSC) for satellite data
- Karnataka State Pollution Control Board
- Maharashtra Pollution Control Board
- Environmental research community
- Open-source geospatial tools (GDAL, Rasterio, GeoPandas)
If you use this work, please cite:
Tendulkar, P. (2026). Environmental Data Science Projects: Water Quality Analysis
and Forecasting for Indian Lakes. GitHub. https://github.com/pradyten/environmental-data-projects
Keywords: Water Quality, Environmental Science, Remote Sensing, NDTI, NDWI, Geospatial Analysis, Time Series Forecasting, Deep Learning, Lake Monitoring, Environmental Data Science