A high-performance benchmarking suite designed to evaluate and compare data processing efficiency across CPU (Serial, Parallel, Block-wise) and GPU architectures.
The project focuses on identifying the break-even point where computational gains outweigh memory transfer overheads.
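The break-even idea can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not measured behavior. It assumes compute cost grows as $N^2$ on both devices, and that the GPU pays an extra fixed setup latency plus a transfer cost linear in $N$. Every constant and function name below is a hypothetical placeholder, to be replaced with values from actual benchmark runs.

```python
import math

# Toy cost model for the CPU/GPU break-even point.
# All constants are hypothetical placeholders, not measurements.


def cpu_time(n, cpu_cost=1e-8):
    """Assumed CPU time in seconds for an O(N^2) computation."""
    return cpu_cost * n * n


def gpu_time(n, gpu_cost=1e-10, transfer_cost=1e-7, fixed_overhead=1e-3):
    """Assumed GPU time: fixed setup latency + O(N) transfer + O(N^2) compute."""
    return fixed_overhead + transfer_cost * n + gpu_cost * n * n


def break_even_n(cpu_cost=1e-8, gpu_cost=1e-10, transfer_cost=1e-7, fixed_overhead=1e-3):
    """Smallest integer N at which the modelled GPU time drops to or below the CPU time.

    Solves (cpu_cost - gpu_cost) * N^2 - transfer_cost * N - fixed_overhead = 0
    for its positive root and rounds up.
    """
    a = cpu_cost - gpu_cost
    if a <= 0:
        raise ValueError("GPU never wins in this model unless gpu_cost < cpu_cost")
    root = (transfer_cost + math.sqrt(transfer_cost**2 + 4.0 * a * fixed_overhead)) / (2.0 * a)
    return math.ceil(root)


n_star = break_even_n()
print(f"Modelled break-even at N = {n_star}")
```

Below the returned N, the fixed and transfer overheads dominate and the CPU is faster in this model. Above it, the quadratic compute term dominates and the GPU pulls ahead.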
parallel_data_processing_engine/
├── run_experiment.py        # Main entry point for benchmarking
├── results/                 # Auto-generated experiment outputs
└── src/
    ├── cpu/                 # CPU-based implementations
    │   ├── __init__.py
    │   ├── dataset.py       # Data loading and synthetic generation
    │   ├── serial_cpu.py    # Single-threaded implementation
    │   ├── parallel_cpu.py  # Multi-threaded implementation
    │   ├── block_cpu.py     # Cache-optimized block computation
    │   ├── benchmark.py     # CPU performance metrics
    │   └── visualize.py     # Plotting and results analysis
    │
    └── gpu/                 # GPU-based implementations (WIP)
        └── __init__.py      # Future CUDA/OpenCL implementations
The engine uses a CLI-based approach to toggle between different optimization levels and BLAS (Basic Linear Algebra Subprograms) configurations.
| Configuration | Command |
|---|---|
| Baseline (Single-thread BLAS) | `python run_experiment.py --mode cpu --version baseline --blas single` |
| Optimized (Single-thread BLAS) | `python run_experiment.py --mode cpu --version optimized --blas single` |
| Baseline (Multi-thread BLAS) | `python run_experiment.py --mode cpu --version baseline --blas multi` |
| Optimized (Multi-thread BLAS) | `python run_experiment.py --mode cpu --version optimized --blas multi` |
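For orientation, the flags in the table could be parsed and mapped to a BLAS thread setting roughly as follows. This is a hedged sketch, not the actual contents of `run_experiment.py`. The flag names and values come from the commands above, while the defaults, the helper names `build_parser` and `configure_blas`, and the use of thread-count environment variables are assumptions. The variables `OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS`, and `MKL_NUM_THREADS` are the usual knobs read by OpenMP, OpenBLAS, and MKL.

```python
import argparse
import os

# Environment variables read by common BLAS/OpenMP backends when they load.
BLAS_THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def build_parser():
    """Hypothetical CLI mirroring the flags shown in the configuration table."""
    parser = argparse.ArgumentParser(description="Benchmark runner (illustrative sketch)")
    parser.add_argument("--mode", choices=["cpu", "gpu"], default="cpu")
    parser.add_argument("--version", choices=["baseline", "optimized"], default="baseline")
    parser.add_argument("--blas", choices=["single", "multi"], default="multi")
    return parser


def configure_blas(blas_mode, environ=None):
    """Pin BLAS to one thread for 'single'; fall back to library defaults for 'multi'."""
    environ = os.environ if environ is None else environ
    for var in BLAS_THREAD_VARS:
        if blas_mode == "single":
            environ[var] = "1"
        else:
            environ.pop(var, None)
    return environ


# Example: parse an explicit argument list and apply it to a scratch environment.
args = build_parser().parse_args(["--mode", "cpu", "--version", "baseline", "--blas", "single"])
env = configure_blas(args.blas, environ={})
print(args.mode, args.version, args.blas, env)
```

In a real run, the thread variables would need to be set before NumPy or any other BLAS-backed library is imported, because the thread count is typically read when the library loads.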
Note:
GPU: Sipra is working on it. The current GPU script is in the GPU Parallel Processing Module.
If the GPU code is implemented with the same single code structure as the CPU code, run it with:
`python run_experiment.py --mode gpu`
All results are automatically timestamped and saved in the `results/` directory.
(For now, only the CPU execution path is pipelined this way.)
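The timestamped output described above could be produced along these lines. This is a minimal sketch. The JSON format, the file-naming scheme, and the `save_results` helper are assumptions made for illustration, and the project's actual output format may differ.

```python
import json
from datetime import datetime
from pathlib import Path


def save_results(metrics, mode, version, blas, results_dir="results", now=None):
    """Write one run's metrics to <results_dir>/<timestamp>_<mode>_<version>_<blas>.json.

    The naming scheme and JSON layout are hypothetical.
    """
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    out_dir = Path(results_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create results/ on first use
    out_path = out_dir / f"{stamp}_{mode}_{version}_{blas}.json"
    payload = {
        "timestamp": now.isoformat(),
        "mode": mode,
        "version": version,
        "blas": blas,
        "metrics": metrics,
    }
    out_path.write_text(json.dumps(payload, indent=2))
    return out_path
```

Putting the timestamp first in the file name keeps runs sorted chronologically in a directory listing, and including the configuration makes it easy to group the four CPU variants when plotting.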
The initial benchmarking was performed on a randomly generated dataset. We will next use a decade of temperature data from a large number of locations (N), obtained from the NASA POWER API, to confirm that the implementation also works on real datasets.
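A synthetic dataset with the same shape as the planned real data (N locations, each with a daily temperature series) might be generated as sketched below. This is an illustration only and is not taken from `dataset.py`. The function name, the value ranges, and the seasonal-cycle-plus-noise model are all assumptions, and it uses only the standard library where a real pipeline would more likely use NumPy arrays.

```python
import math
import random


def make_synthetic_temperatures(n_locations, n_days, seed=0):
    """Return n_locations rows of n_days daily temperatures (deg C).

    Each row is a hypothetical annual mean plus a sinusoidal seasonal
    cycle plus Gaussian noise; all ranges are placeholder values.
    """
    rng = random.Random(seed)  # seeded for reproducible benchmarks
    data = []
    for _ in range(n_locations):
        mean = rng.uniform(-5.0, 30.0)      # assumed annual mean per location
        amplitude = rng.uniform(2.0, 15.0)  # assumed seasonal swing
        row = [
            mean + amplitude * math.sin(2.0 * math.pi * day / 365.25) + rng.gauss(0.0, 2.0)
            for day in range(n_days)
        ]
        data.append(row)
    return data


# Roughly a decade of daily values for 4 locations.
sample = make_synthetic_temperatures(n_locations=4, n_days=3650)
print(len(sample), len(sample[0]))
```

Keeping the synthetic and real data in the same N-by-days layout means the same benchmark code path can be exercised on both.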
| Contributor | Project Phase |
|---|---|
| Amitava | CPU Implementation (Single/Multi-thread) |
| | Block-wise Computation (Shared) |
| | Theoretical Complexity Analysis ($O(N^2)$) for CPU |
| Sipra | GPU Implementation |
| | Block-wise Computation (Shared) |
| Bhavini | Numerical Consistency Check (CPU vs GPU) |
| | Overhead vs Computation Equilibrium |
| | Complexity Analysis ($O(N^2)$) |
| | System Profiling (CPU/GPU/RAM bottlenecks) |
| | Code Revision & Optimization |
| | Real-world Data Experiment: Execution & Plotting (Shared) |
| Yashvita | Bandwidth & Data Transfer Analysis |
| | Transfer Overhead Optimization |
| | Try out the implementation on the climate dataset |