Parallel Data Processing Engine

A high-performance benchmarking suite designed to evaluate and compare data processing efficiency across CPU (Serial, Parallel, Block-wise) and GPU architectures.
The project focuses on identifying the break-even point where computational gains outweigh memory transfer overheads.
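
The break-even idea can be made concrete with a toy cost model. The sketch below is illustrative only: the function names, the bandwidth, the latency, and the per-element costs are all made-up assumptions, not measurements from this project. It treats GPU wall time as transfer time plus kernel time and looks for the first problem size where that total drops below the CPU time.

```python
def gpu_total_time(n_bytes, bandwidth_bytes_per_s, gpu_compute_s, latency_s=0.0):
    """Total GPU wall time: host<->device transfer plus kernel time."""
    transfer_s = latency_s + n_bytes / bandwidth_bytes_per_s
    return transfer_s + gpu_compute_s


def find_break_even(sizes, cpu_time, gpu_time):
    """Return the first size at which the GPU path is faster, else None."""
    for n in sizes:
        if gpu_time(n) < cpu_time(n):
            return n
    return None


# Hypothetical cost models for an O(N^2) workload (all constants are invented).
def cpu_model(n):
    return 2e-8 * n * n


def gpu_model(n):
    bytes_moved = 2 * 8 * n * n  # copy an N x N float64 matrix in and out
    return gpu_total_time(bytes_moved, 12e9, 1e-9 * n * n, latency_s=1e-3)


if __name__ == "__main__":
    print(find_break_even([64, 128, 256, 512, 1024], cpu_model, gpu_model))
```

In a real run, the two model functions would be replaced by measured timings for each problem size.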

📂 Repository Structure

parallel_data_processing_engine/
├── run_experiment.py         # Main entry point for benchmarking
├── results/                  # Auto-generated experiment outputs
└── src/
    ├── cpu/                  # CPU-based implementations
    │   ├── __init__.py
    │   ├── dataset.py        # Data loading and synthetic generation
    │   ├── serial_cpu.py     # Single-threaded implementation
    │   ├── parallel_cpu.py   # Multi-threaded implementation
    │   ├── block_cpu.py      # Cache-optimized block computation
    │   ├── benchmark.py      # CPU performance metrics
    │   └── visualize.py      # Plotting and results analysis
    │
    └── gpu/                  # GPU-based implementations (WIP)
        └── __init__.py       # Future CUDA/OpenCL implementations

⚡ Running Experiments

The engine uses a CLI-based approach to toggle between different optimization levels and BLAS (Basic Linear Algebra Subprograms) configurations.

CPU Benchmarks

Configuration	Command
Baseline (Single-thread BLAS)	`python run_experiment.py --mode cpu --version baseline --blas single`
Optimized (Single-thread BLAS)	`python run_experiment.py --mode cpu --version optimized --blas single`
Baseline (Multi-thread BLAS)	`python run_experiment.py --mode cpu --version baseline --blas multi`
Optimized (Multi-thread BLAS)	`python run_experiment.py --mode cpu --version optimized --blas multi`
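
How `run_experiment.py` handles these flags is not shown here, so the following is only a sketch of one common way to do it. The flag names and choices are taken from the commands above; `parse_args` and `configure_blas_threads` are hypothetical helper names. The environment variables are the standard thread-count controls for OpenMP, OpenBLAS, and MKL, and they generally need to be set before NumPy (or another BLAS-backed library) is imported.

```python
import argparse
import os

BLAS_THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def parse_args(argv=None):
    """Parse the flags used in the benchmark commands."""
    parser = argparse.ArgumentParser(description="Run a benchmark experiment.")
    parser.add_argument("--mode", choices=["cpu", "gpu"], default="cpu")
    parser.add_argument("--version", choices=["baseline", "optimized"], default="baseline")
    parser.add_argument("--blas", choices=["single", "multi"], default="single")
    return parser.parse_args(argv)


def configure_blas_threads(blas_mode, env=os.environ):
    """Pin BLAS to one thread, or clear the limits for multi-threaded BLAS.

    Call this before importing NumPy: the BLAS thread pools usually read
    these variables when the library is loaded.
    """
    for var in BLAS_THREAD_VARS:
        if blas_mode == "single":
            env[var] = "1"
        else:
            env.pop(var, None)  # fall back to the library default
    return env


if __name__ == "__main__":
    args = parse_args()
    configure_blas_threads(args.blas)
    print(f"mode={args.mode} version={args.version} blas={args.blas}")
```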

GPU Benchmarks (Planned)

Note:
The GPU implementation is still in progress (Sipra is working on it).
The current GPU script lives in the GPU Parallel Processing Module.
Once the GPU code follows the same structure as the CPU code, it would be run with:

python run_experiment.py --mode gpu

All results are automatically timestamped and saved in the results/ directory.
For now, only the CPU execution path is pipelined this way.
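
As an illustration of timestamped output, here is a minimal sketch of a saver. It is not the project's actual code: the `save_results` name, the JSON format, and the `<label>_<timestamp>.json` file naming are assumptions chosen for the example.

```python
import json
from datetime import datetime
from pathlib import Path


def save_results(results, results_dir="results", label="cpu"):
    """Write `results` to a new timestamped JSON file and return its path."""
    out_dir = Path(results_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create results/ on first run
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    out_path = out_dir / f"{label}_{stamp}.json"
    out_path.write_text(json.dumps(results, indent=2))
    return out_path


if __name__ == "__main__":
    path = save_results({"version": "baseline", "blas": "single", "runtime_s": 1.23})
    print(f"saved {path}")
```

Putting the timestamp in the file name keeps earlier runs from being overwritten, which makes it easier to compare configurations afterwards.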

Testing the implementation on an actual dataset

The initial benchmarking was performed on a randomly generated dataset. To confirm that the implementation also works on real data, we will use a decade of temperature data from a large number of locations (N), obtained from the NASA POWER API.
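
A request for one location might be built as in the sketch below. This is an assumption-laden example, not the project's loader: `build_power_url` is a hypothetical helper, and the endpoint, the `T2M` (2 m air temperature) parameter, and the query field names reflect our understanding of the public POWER daily point API and should be checked against its documentation before use.

```python
from urllib.parse import urlencode

# Assumed endpoint for the POWER daily point API; verify before relying on it.
POWER_DAILY_POINT = "https://power.larc.nasa.gov/api/temporal/daily/point"


def build_power_url(latitude, longitude, start, end, parameter="T2M", community="RE"):
    """Build a daily temperature query URL for a single location.

    `start` and `end` are dates formatted as YYYYMMDD strings.
    """
    query = {
        "parameters": parameter,
        "community": community,
        "latitude": latitude,
        "longitude": longitude,
        "start": start,
        "end": end,
        "format": "JSON",
    }
    return f"{POWER_DAILY_POINT}?{urlencode(query)}"


if __name__ == "__main__":
    # One URL per location; N locations means N such requests.
    print(build_power_url(28.61, 77.21, "20100101", "20191231"))
```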

🛠 Project Workflow & Responsibilities

Contributor	Project Phase
Amitava	CPU Implementation (Single/Multi-thread)
	Block-wise Computation (Shared)
	Theoretical Complexity Analysis ($O(N^2)$) for CPU
Sipra	GPU Implementation
	Block-wise Computation (Shared)
Bhavini	Numerical Consistency Check (CPU vs GPU)
	Overhead vs Computation Equilibrium
	Complexity Analysis ($O(N^2)$)
	System Profiling (CPU/GPU/RAM bottlenecks)
	Code Revision & Optimization
	Real-World Data Experiment: Execution & Plotting (Shared)
Yashvita	Bandwidth & Data Transfer Analysis
	Transfer Overhead Optimization
	Try out the implementation on the climate dataset
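
To illustrate two of the phases listed above (block-wise computation and the numerical consistency check), here is a small pure-Python sketch. The kernel is a stand-in: the mean absolute difference between every pair of series is an assumption chosen only because it is an $O(N^2)$ pairwise computation, and all function names are hypothetical. The same comparison pattern would apply to CPU vs GPU outputs.

```python
def mean_abs_diff(a, b):
    """Stand-in kernel: mean absolute difference of two equal-length series."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pairwise_serial(series):
    """Reference version: visit all N x N pairs in row-major order."""
    n = len(series)
    return [[mean_abs_diff(series[i], series[j]) for j in range(n)] for i in range(n)]


def pairwise_blocked(series, block=2):
    """Same result, computed tile by tile so each tile reuses a few series."""
    n = len(series)
    out = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, n, block):
            for i in range(i0, min(i0 + block, n)):
                for j in range(j0, min(j0 + block, n)):
                    out[i][j] = mean_abs_diff(series[i], series[j])
    return out


def max_abs_error(a, b):
    """Largest element-wise deviation between two result matrices."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def is_consistent(a, b, tol=1e-9):
    """True when two implementations agree within an absolute tolerance."""
    return max_abs_error(a, b) <= tol


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0], [0.0, 1.0, 5.0]]
    print(is_consistent(pairwise_serial(data), pairwise_blocked(data, block=2)))
```

A tolerance-based comparison is used rather than exact equality because different execution orders (and different hardware) can change floating-point rounding.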

Presentation

View the Presentation
