Document CNN architecture compatibility#15
Merged
This commit establishes the foundation for supporting Convolutional Neural Networks (CNNs) through a pluggable architecture pattern while maintaining full backward compatibility with existing ANN/MLP code.

## New Components

### 1. Tensor<T> Class (tensor.h)
- N-dimensional array support (1D, 2D, 3D, 4D, etc.)
- Shape manipulation: reshape, transpose, flatten, squeeze, unsqueeze
- Element-wise operations: +, -, *, / (tensor-tensor and tensor-scalar)
- Interoperability with ml::Mat<T> (fromMat, toMat)
- Memory efficient using std::shared_ptr
- Factory methods: zeros, ones, random, randn
- Statistics: sum, mean, max, min
- Comprehensive test coverage (9/9 tests passing)

### 2. im2col/col2im Utilities (im2col.h)
- Transform convolution into matrix multiplication (industry standard)
- im2col: extract image patches into a column matrix
- col2im: inverse operation for backpropagation
- Supports arbitrary kernel size, stride, and padding
- Batch processing support
- Helper functions for dimension calculation and gradients
- Comprehensive test coverage (9/9 tests passing)

### 3. Documentation
- ARCHITECTURE_DESIGN.md: comprehensive design document (800+ lines)
  * Pluggable architecture pattern explanation
  * Design decisions and rationale
  * CNN vs ANN comparison (mathematical and structural)
  * Implementation phases and roadmap
  * API examples and usage patterns
- CNN_COMPATIBILITY.md: user-facing compatibility guide (600+ lines)
  * Core differences between ANN and CNN
  * Why CNNs for image data (parameter efficiency)
  * Mathematical operations comparison
  * im2col algorithm explanation with examples
  * Migration path for existing code
  * Performance comparison
- IMPLEMENTATION_PROGRESS.md: status tracking
  * Completed components
  * Testing strategy
  * Next steps and timeline
  * Quality metrics

## Design Principles
1. **Dependency Injection for Neural Architectures**
   - Different layer types (Dense, Conv, Pool) as injectable components
   - All conform to the ILayer<T> interface
2. **Backward Compatibility**
   - Tensor<T> added alongside Mat<T> (non-breaking)
   - Existing Layer<T> will become an alias for DenseLayer<T>
   - All existing code continues to work
3. **Open/Closed Principle**
   - Open for extension (new layer types)
   - Closed for modification (core Network unchanged)

## Test Results
All tests passing:
- test_tensor.cpp: 9/9 tests ✓
- test_im2col.cpp: 9/9 tests ✓

## Next Steps
1. Implement Conv2D layer with forward/backward pass
2. Implement MaxPool2D and AvgPool2D layers
3. Refactor Layer<T> to DenseLayer<T>
4. Create CNN MNIST example (LeNet-5)
5. Validate >95% accuracy on MNIST

## Technical Details
- im2col approach: ~2× memory overhead, but simple and maintainable
- Leverages existing Mat<T> operations and OpenMP parallelization
- Compatible with the existing optimizer infrastructure
- Ready for future RNN support (3D tensors)

Estimated time to a working CNN: ~6 hours from this point
Implements a fully functional 2D convolutional layer for CNN support.
## Conv2D Layer (conv_layer.h)
### Features
- Forward pass using im2col for efficient matrix multiplication
- Backward pass with gradient computation for:
* Input gradients (for previous layer)
* Kernel/filter gradients
* Bias gradients
- Multiple activation functions: ReLU, Sigmoid, Tanh, Linear, LeakyReLU, ELU
- Configurable hyperparameters:
* Kernel size (height, width)
* Stride (vertical, horizontal)
* Padding (vertical, horizontal)
- He initialization for ReLU activations
- Xavier initialization for other activations
- Support for:
* Multiple input/output channels
* Batch processing
* Arbitrary input dimensions
### Implementation Details
- Uses im2col to transform convolution into matrix multiplication
- Caches intermediate values for efficient backpropagation
- Gradient updates via simple SGD (can be extended to other optimizers)
- Proper shape tracking and validation
## Testing (test_conv_layer.cpp)
### Test Coverage (11/11 passing)
1. Construction with parameter validation
2. Weight initialization (He/Xavier)
3. Forward pass (basic 2x2 kernel)
4. Forward pass with padding
5. Forward pass with stride > 1
6. Multi-channel input/output
7. Batch processing (batch > 1)
8. Different activation functions
9. **Numerical gradient checking** (relative error < 0.0002!)
10. Weight updates
11. MNIST-like dimensions (28x28 input)
### Gradient Verification
Numerical vs analytical gradients match with relative error: 0.000162
This confirms backpropagation is implemented correctly.
## Example Usage
```cpp
// Create Conv2D layer: 32 filters, 5x5 kernel, ReLU activation
Conv2D<float> conv(32, 5, 5, ActivationType::RELU);
conv.setInputChannels(1); // Grayscale input
conv.init();
// Forward pass: [batch, channels, height, width]
Tensor<float> input({8, 1, 28, 28}); // 8 images, 28x28
auto output = conv.forward(input); // → [8, 32, 24, 24]
// Backward pass
Tensor<float> d_output = ...; // Gradient from next layer
auto d_input = conv.backward(d_output);
// Update weights
conv.updateWeights(0.01); // Learning rate = 0.01
```
## Performance
- Leverages existing OpenMP parallelization in matrix multiplication
- im2col approach is industry-standard (used by Caffe, PyTorch, TensorFlow)
- Memory overhead: ~2× input size (an acceptable trade-off for simplicity and GEMM speed)
## Next Steps
- MaxPool2D and AvgPool2D layers (simpler, no learnable parameters)
- Integration with existing Network<T> class
- CNN MNIST example to validate end-to-end
Implements pooling layers for CNN spatial downsampling.
## Pooling Layers (pooling_layer.h)
### MaxPool2D
- Takes maximum value within each pooling window
- Provides translation invariance
- Reduces spatial dimensions while preserving important features
- Backpropagation routes gradients only to max positions
- Configurable pool size and stride
### AvgPool2D
- Takes average value within each pooling window
- Smoother downsampling compared to max pooling
- Distributes gradients evenly during backpropagation
- Useful for certain architectures
### GlobalAvgPool2D
- Pools over entire spatial dimensions
- Reduces [batch, channels, height, width] → [batch, channels, 1, 1]
- Common alternative to fully-connected layers before classification
- Reduces parameter count in final layers
## Key Features
- No learnable parameters (pooling is a fixed operation)
- Preserves number of channels
- Supports overlapping and non-overlapping windows
- Batch processing support
- Efficient gradient routing during backpropagation
## Testing (test_pooling_layer.cpp)
### Test Coverage (11/11 passing)
1. MaxPool construction
2. MaxPool forward pass (basic)
3. MaxPool backward pass (gradient routing to max positions)
4. MaxPool with overlapping windows
5. MaxPool with multiple channels
6. MaxPool batch processing
7. AvgPool forward pass (basic)
8. AvgPool backward pass (gradient distribution)
9. GlobalAvgPool functionality
10. Channel preservation across pooling types
11. MNIST-like dimensions (24x24 → 12x12)
## Example Usage
```cpp
// Max pooling: 2x2 window, stride=2 (non-overlapping)
MaxPool2D<float> maxpool(2, 2, 2, 2);
Tensor<float> input({8, 32, 24, 24});
auto output = maxpool.forward(input); // → [8, 32, 12, 12]
// Backward pass
Tensor<float> d_output = ...;
auto d_input = maxpool.backward(d_output);
// Average pooling
AvgPool2D<float> avgpool(2, 2);
auto avg_output = avgpool.forward(input);
// Global average pooling (for classification)
GlobalAvgPool2D<float> gap;
Tensor<float> features({8, 512, 7, 7});
auto global_features = gap.forward(features); // → [8, 512, 1, 1]
```
## Design Notes
- MaxPool stores indices of max values for efficient backpropagation
- AvgPool distributes gradients evenly (1/pool_size to each position)
- Both preserve the number of channels (only spatial downsampling)
- GlobalAvgPool is commonly used in modern CNN architectures
## Performance
- Pooling is computationally cheap (no matrix multiplications)
- Max pooling: O(pool_size² × output_size)
- Backward pass is equally efficient
- Minimal memory overhead (no learnable parameters; MaxPool only caches argmax indices)