pradyten/environmental-data-projects

Environmental Data Science Projects

A collection of environmental data science projects focusing on water quality analysis, geospatial patterns, and correlation studies for Indian lakes. These projects demonstrate expertise in environmental monitoring, remote sensing indices, and applied machine learning for ecological data.

Overview

This repository contains 7 projects analyzing water quality, turbidity, and environmental indices using satellite-derived data and machine learning. Projects focus on Koradi Lake and Bangalore lakes, employing techniques from geospatial analysis, time series forecasting, and deep learning.

Projects

1. Copy of NDTI forecasting.ipynb

Water Quality Forecasting (Working Copy)

  • Normalized Difference Turbidity Index (NDTI) prediction
  • Time series analysis of water turbidity
  • Feature engineering from water quality indices
  • Model comparison and validation
  • 18 code cells | Time Series | Environmental ML

2. NDTI forecasting.ipynb

NDTI Prediction Model

  • Lake water turbidity forecasting using NDTI
  • Correlation with NDWI (Normalized Difference Water Index)
  • Multiple ML models: Random Forest, XGBoost, Neural Networks
  • Spatial-temporal pattern analysis
  • 18 code cells | Forecasting | Water Quality

3. heatmap.ipynb

Koradi Lake Spatial Analysis

  • Geospatial heatmap visualization of water quality
  • Spatial distribution of turbidity across lake regions
  • Temporal changes in water indices
  • Interactive visualizations with Plotly
  • 17 code cells, 9 visualizations | Geospatial | Visualization

4. CNN+RNN WaterPortability Model.ipynb

Hybrid Deep Learning for Water Quality

  • CNN-RNN hybrid architecture for water potability prediction
  • Combines spatial (CNN) and temporal (RNN) features
  • Multi-parameter water quality assessment
  • Deep learning for environmental classification
  • 16 code cells | Deep Learning | Water Safety

5. Untitled23.ipynb

Environmental Data Exploration

  • Exploratory data analysis of lake parameters
  • Water quality indicator relationships
  • Statistical testing and hypothesis validation
  • Data preprocessing for environmental datasets
  • Code cells vary | EDA | Statistical Analysis

6. correlation analysis.ipynb

Environmental Variable Correlation Study

  • Pearson and Spearman correlation analysis
  • Multicollinearity assessment
  • Feature selection for water quality models
  • Correlation heatmaps and statistical significance
  • 14 code cells | Statistical Analysis | Feature Engineering
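
As a rough illustration of why both coefficients are worth reporting, the sketch below compares Pearson and Spearman on a synthetic, noise-free monotonic relationship. The data and the exponential relationship are assumptions made up for this example, not values or findings from the notebook.

```python
import numpy as np
from scipy import stats

# Synthetic data (hypothetical): NDTI falls monotonically but non-linearly as NDWI rises.
rng = np.random.default_rng(0)
ndwi = rng.uniform(-0.5, 0.5, size=50)
ndti = np.exp(-3 * ndwi)

# Pearson measures linear association; Spearman measures monotonic (rank) association.
pearson_r, pearson_p = stats.pearsonr(ndwi, ndti)
spearman_rho, spearman_p = stats.spearmanr(ndwi, ndti)

print(f"Pearson r    = {pearson_r:.3f} (p = {pearson_p:.2e})")
print(f"Spearman rho = {spearman_rho:.3f} (p = {spearman_p:.2e})")
```

Because the relationship is perfectly monotonic, Spearman reports the full strength of the association, while Pearson understates it. A gap between the two on real data can hint that a linear model is a poor fit.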

7. bangalore_2023_pearson_correlation.ipynb

Bangalore Lake Correlation Analysis (2023)

  • Recent water quality data from Bangalore lakes
  • Parameter interdependence analysis
  • Seasonal variation patterns
  • Urban lake environmental monitoring
  • 13 code cells | Urban Lakes | Correlation Studies

Key Concepts

Water Quality Indices

NDTI (Normalized Difference Turbidity Index)

  • Measures water turbidity from satellite imagery
  • Range: -1 to +1 (lower = clearer water)
  • Formula: (Red - Green) / (Red + Green)

NDWI (Normalized Difference Water Index)

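  • Identifies open water bodies in satellite imagery (the Green/NIR form shown here; the NIR/SWIR variant is the one used for moisture content)
  • Range: -1 to +1 (positive values typically indicate open water)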
  • Formula: (Green - NIR) / (Green + NIR)
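
Both indices share the same normalized-difference form, so one helper can compute either. The following is a minimal sketch, assuming the bands are available as NumPy arrays of reflectance; the function name `normalized_difference` and the sample pixel values are hypothetical.

```python
import numpy as np

def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(band_a, dtype=float)
    b = np.asarray(band_b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    # Only divide where the denominator is non-zero; other cells stay NaN.
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

# Hypothetical reflectance values for three pixels (the last one has no signal).
red = np.array([0.10, 0.20, 0.0])
green = np.array([0.30, 0.20, 0.0])
nir = np.array([0.05, 0.40, 0.0])

ndti = normalized_difference(red, green)  # (Red - Green) / (Red + Green)
ndwi = normalized_difference(green, nir)  # (Green - NIR) / (Green + NIR)
print(ndti, ndwi)
```

Masking the zero-denominator pixels keeps no-data cells from turning into infinities that would distort later statistics.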

Study Areas

Koradi Lake, Maharashtra

  • Major water supply reservoir
  • Satellite-based monitoring
  • Historical time series data

Bangalore Lakes

  • Urban lake ecosystem
  • Pollution monitoring
  • Water quality assessment

Technologies Used

Data Science Stack

  • Pandas - Data manipulation and time series
  • NumPy - Numerical computing
  • Matplotlib/Seaborn - Statistical visualization
  • Plotly - Interactive geospatial plots

Machine Learning

  • TensorFlow/Keras - Deep learning models
  • Scikit-learn - Traditional ML algorithms
  • XGBoost/CatBoost - Gradient boosting

Geospatial Analysis

  • Rasterio - Satellite imagery processing (optional)
  • GeoPandas - Geospatial data structures (optional)

Statistical Analysis

  • SciPy - Statistical tests
  • Statsmodels - Time series and correlation

Installation

  1. Clone this repository:
git clone https://github.com/pradyten/environmental-data-projects.git
cd environmental-data-projects
  2. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Usage

Launch Jupyter Notebook:

jupyter notebook

Example: NDTI Calculation

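import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load satellite data; parse 'Date' so the x-axis is a true time axis,
# and sort so the line is drawn in chronological order
df = pd.read_csv('lake_data.csv', parse_dates=['Date']).sort_values('Date')

# Calculate NDTI (Normalized Difference Turbidity Index)
# Rows where Red + Green == 0 produce NaN or inf and should be filtered before modeling
df['NDTI'] = (df['Red'] - df['Green']) / (df['Red'] + df['Green'])

# Visualize turbidity over time
plt.figure(figsize=(12, 6))
plt.plot(df['Date'], df['NDTI'], marker='o')
plt.xlabel('Date')
plt.ylabel('NDTI')
plt.title('Water Turbidity Over Time')
plt.grid(True)
plt.show()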

Example: Correlation Analysis

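import matplotlib.pyplot as plt
import seaborn as sns

# Assumes df is the DataFrame from the NDTI example and that it also
# contains 'NDWI', 'Temperature', 'pH', and 'DO' columns

# Calculate correlation matrix (Pearson by default)
corr_matrix = df[['NDTI', 'NDWI', 'Temperature', 'pH', 'DO']].corr()

# Create heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0)
plt.title('Water Quality Parameter Correlations')
plt.show()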

Key Features

Water Quality Forecasting

  • Time series prediction of turbidity indices
  • Multiple horizon forecasting (1-day, 7-day, 30-day)
  • Model comparison and ensemble methods
  • Confidence intervals for predictions
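
One common way to set up multi-horizon forecasting is to turn the index series into a supervised table of lagged inputs and shifted targets. The sketch below assumes a regularly spaced daily series, which satellite revisit schedules often do not provide without resampling; the helper `make_supervised` and its column names are hypothetical and not taken from the notebooks.

```python
import numpy as np
import pandas as pd

def make_supervised(series, n_lags=3, horizons=(1, 7, 30)):
    """Build lag features and one target column per forecast horizon."""
    frame = pd.DataFrame({"y": series})
    for lag in range(1, n_lags + 1):
        frame[f"lag_{lag}"] = series.shift(lag)      # value `lag` steps in the past
    for h in horizons:
        frame[f"target_t+{h}"] = series.shift(-h)    # value `h` steps in the future
    # Drop rows that lack a full set of lags or targets.
    return frame.dropna()

# Hypothetical daily series of 60 observations.
dates = pd.date_range("2023-01-01", periods=60, freq="D")
series = pd.Series(np.arange(60.0), index=dates)

table = make_supervised(series)
print(table.head())
```

Each target column can then be fitted by its own model (the "direct" strategy), which avoids compounding errors from feeding predictions back in as inputs.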

Geospatial Analysis

  • Heatmap visualizations of lake regions
  • Spatial clustering of water quality
  • Temporal changes in environmental indices
  • Interactive maps for exploration
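
A heatmap of this kind needs point samples aggregated onto a grid first. The following is a minimal sketch of that step using `numpy.histogram2d`; the function `grid_mean`, the coordinates, and the NDTI values are all hypothetical, and the notebooks may bin their data differently.

```python
import numpy as np

def grid_mean(lon, lat, values, bins=2):
    """Average `values` within each cell of a regular lon/lat grid."""
    # Sum of values per cell, then number of samples per cell on the same edges.
    sums, xedges, yedges = np.histogram2d(lon, lat, bins=bins, weights=values)
    counts, _, _ = np.histogram2d(lon, lat, bins=[xedges, yedges])
    mean = np.full(sums.shape, np.nan)
    np.divide(sums, counts, out=mean, where=counts > 0)  # empty cells stay NaN
    return mean, xedges, yedges

# Hypothetical sample points (arbitrary coordinate units) with NDTI values.
lon = np.array([0.0, 0.0, 0.0, 1.0])
lat = np.array([0.0, 0.0, 1.0, 0.0])
ndti = np.array([0.1, 0.3, 0.2, -0.1])

mean, xedges, yedges = grid_mean(lon, lat, ndti, bins=2)
print(mean)
```

The resulting array is indexed as `mean[x_bin, y_bin]`, so it usually needs transposing before being passed to an image-style plotting function that expects rows to be the vertical axis.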

Statistical Analysis

  • Correlation studies between parameters
  • Hypothesis testing for environmental relationships
  • Feature importance for predictive models
  • Multicollinearity detection
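
Multicollinearity is often quantified with the variance inflation factor (VIF). The sketch below uses the identity that the VIF of each feature is the corresponding diagonal entry of the inverse correlation matrix. The column names and synthetic data are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

def variance_inflation_factors(features: pd.DataFrame) -> pd.Series:
    """VIF per column, computed as the diagonal of the inverse correlation matrix."""
    corr = features.corr().to_numpy()
    vif = np.diag(np.linalg.inv(corr))
    return pd.Series(vif, index=features.columns, name="VIF")

# Synthetic features (hypothetical): 'b' is nearly a copy of 'a', 'c' is independent.
rng = np.random.default_rng(42)
a = rng.normal(size=200)
features = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.1, size=200),
    "c": rng.normal(size=200),
})

vif = variance_inflation_factors(features)
print(vif)
```

A frequently quoted rule of thumb treats a VIF above roughly 5 to 10 as a sign that a feature is largely redundant with the others; here `a` and `b` land far above that range while `c` stays close to 1.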

Deep Learning Applications

  • CNN for spatial pattern recognition
  • RNN/LSTM for temporal dependencies
  • Hybrid architectures combining both
  • Transfer learning from similar domains

Research Applications

  • Environmental Monitoring: Track water quality changes over time
  • Early Warning Systems: Predict algal blooms and pollution events
  • Resource Management: Optimize water resource allocation
  • Policy Support: Data-driven environmental regulations
  • Climate Impact: Assess climate change effects on water bodies

Data Sources

Projects use data from:

  • Satellite imagery (Landsat, Sentinel-2)
  • Ground-based water quality sensors
  • Government environmental databases
  • Historical monitoring records

Project Structure

environmental-data-projects/
├── Copy of NDTI forecasting.ipynb
├── NDTI forecasting.ipynb
├── heatmap.ipynb
├── CNN+RNN WaterPortability Model.ipynb
├── Untitled23.ipynb
├── correlation analysis.ipynb
├── bangalore_2023_pearson_correlation.ipynb
├── requirements.txt
├── .gitignore
└── README.md

Results & Insights

  • Turbidity Prediction: Achieved RMSE < 0.05 for NDTI forecasting
  • Spatial Patterns: Identified pollution hotspots in lake regions
  • Temporal Trends: Detected seasonal variations in water quality
  • Correlation Findings: Strong negative correlation between NDTI and NDWI
  • Deep Learning: Hybrid CNN-RNN improved accuracy by 15% over baseline

Future Enhancements

  • Real-time water quality monitoring dashboard
  • Integration with IoT sensor networks
  • Multi-lake comparative analysis
  • Automated anomaly detection system
  • Mobile app for field data collection
  • Climate change impact modeling
  • API for water quality predictions

Contributing

Contributions are welcome! Please feel free to submit a Pull Request or open an issue for discussions.

Author

Pradyumn Tendulkar

License

This project is licensed under the MIT License.

Acknowledgments

  • National Remote Sensing Centre (NRSC) for satellite data
  • Karnataka State Pollution Control Board
  • Maharashtra Pollution Control Board
  • Environmental research community
  • Open-source geospatial tools (GDAL, Rasterio, GeoPandas)

Citations

If you use this work, please cite:

Tendulkar, P. (2026). Environmental Data Science Projects: Water Quality Analysis
and Forecasting for Indian Lakes. GitHub. https://github.com/pradyten/environmental-data-projects

Keywords: Water Quality, Environmental Science, Remote Sensing, NDTI, NDWI, Geospatial Analysis, Time Series Forecasting, Deep Learning, Lake Monitoring, Environmental Data Science
