A deep learning-based binary classification system for detecting manufacturing defects in metal casting components, achieving 99.72% accuracy on real-world industrial data.
Model predictions on test set showing high confidence in defect detection
This project automates quality inspection for submersible pump impellers in the casting industry. Manual inspection is time-consuming, expensive, and prone to human error. Our CNN-based solution provides fast, accurate, and consistent defect detection.
Key Achievement: The model correctly identifies 99.17% of defective parts while producing zero false rejections of good parts.
Try the model yourself: https://detectingdefect.streamlit.app/
Upload an image of a casting product to get instant defect detection results.
| Metric | Value |
|---|---|
| Overall Accuracy | 99.44% (356/358) |
| Defect Detection (Recall) | 99.17% (239/241) |
| False Positive Rate | 0% (0 false alarms) |
| Precision (Defective) | 100% |
| Precision (OK) | 98% |
Predicted
Defective OK
Actual Def 239 2 ← Missed only 2 defects
OK 0 117 ← Zero false rejections
Critical Metrics for Manufacturing:
- False Negatives (Missed Defects): 2 out of 241 (0.83%)
- False Positives (False Alarms): 0 out of 117 (0%)
This balance is crucial—we catch 99% of defects while never rejecting good parts.
Source: Kaggle - Real-Life Industrial Casting Dataset
| Split | Defective | OK | Total |
|---|---|---|---|
| Train | 3,758 (56.7%) | 2,875 (43.3%) | 6,633 |
| Validation | ~226 (63.3%) | ~131 (36.7%) | 357 |
| Test | 241 (67.3%) | 117 (32.7%) | 358 |
| Total (Original) | 4,211 | 3,137 | 7,348 |
Note: The original dataset provides train (6,633) and test (715) folders. For proper validation, the test folder was split 50/50 into validation and test sets using train_test_split with random_state=42.
Class Imbalance Analysis:
- The dataset shows mild imbalance (~57% defective, ~43% OK)
- Ratio of 1.3:1 is manageable without special sampling techniques
- Model performance (99.7% accuracy, 100% precision on defects) demonstrates the imbalance doesn't significantly impact learning
- No class weights or oversampling were needed
- Format: 300×300 grayscale images (converted to RGB for model compatibility)
- Subject: Top-view images of submersible pump impellers
- Augmentation: Pre-applied in dataset
- Capture Setup: Controlled lighting environment for consistency
- Defect Types: Blow holes, pinholes, burrs, shrinkage, surface irregularities
Casting defects cause significant losses in manufacturing. Manual inspection:
- Takes excessive time
- Lacks consistency (human fatigue/error)
- Cannot guarantee 100% coverage
- Results in costly batch rejections
Automated inspection solves these problems while maintaining high accuracy.
We chose ResNet-18 with transfer learning for several key reasons:
Traditional deep CNNs suffer from vanishing gradients, making training difficult. ResNet's residual connections allow gradients to flow directly through the network:
Output = F(x) + x (where F(x) is learned residual)
This enables training of much deeper networks without performance degradation.
- Pre-trained on ImageNet (1.2M images, 1000 classes)
- Learns robust low-level features (edges, textures, patterns)
- Requires less training data and time for our specific task
- Achieves excellent performance with only 6,633 training images
- ResNet-18 (11M parameters): Lightweight, fast inference, less prone to overfitting
- vs. ResNet-50 (25M params): Heavier, may overfit on small datasets
- vs. ResNet-152 (60M params): Overkill for binary classification
- Training time: ~40 seconds per epoch (415 batches)
- Inference: Real-time capable for production deployment
- Memory footprint: Suitable for edge devices
ResNet-18 (Pretrained on ImageNet)
├── Conv1 (7×7, 64 filters)
├── MaxPool (3×3)
├── Layer1: [BasicBlock × 2] # 64 channels
├── Layer2: [BasicBlock × 2] # 128 channels
├── Layer3: [BasicBlock × 2] # 256 channels
├── Layer4: [BasicBlock × 2] # 512 channels
├── AdaptiveAvgPool
└── Fully Connected (512 → 2) ← Modified for binary classificationKey Modification:
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
in_features = model.fc.in_features # 512
model.fc = nn.Linear(in_features, 2) # Binary: [defective, ok]Currently using CrossEntropyLoss for flexibility:
criterion = nn.CrossEntropyLoss()Advantages:
- Multi-class ready: Easy to extend from binary to multi-class (e.g., defect types)
- Numerically stable: Combines softmax + negative log likelihood
- Standard practice: Well-tested in production systems
For strictly binary tasks, BCEWithLogitsLoss can be used:
# Alternative for binary-only classification
model.fc = nn.Linear(in_features, 1) # Single output
criterion = nn.BCEWithLogitsLoss()When to switch:
- No plans to expand to multi-class defect types
- Slight memory/computation optimization needed
- Want probability output directly (sigmoid)
Current choice reasoning: We keep CrossEntropyLoss because:
- Future-proofing: May want to classify defect types (scratches vs. holes vs. burrs)
- Minimal overhead for binary case
- Code consistency with multi-class paradigm
| Component | Technology |
|---|---|
| Framework | PyTorch |
| Model | ResNet-18 (Transfer Learning) |
| Pretrained Weights | ImageNet |
| Optimizer | Adam (lr=0.001, weight_decay=1e-4) |
| Loss Function | CrossEntropyLoss |
| Regularization | Early Stopping (patience=10) |
| Data Processing | torchvision transforms |
| Visualization | matplotlib, seaborn |
| Evaluation | scikit-learn metrics |
Batch Size: 16 (train), 8 (validation)
Optimizer: Adam (lr=0.001, weight_decay=1e-4)
Epochs: 100 (max)
Early Stopping: Patience=10 epochs
Normalization: ImageNet stats (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])The model converged in 52 epochs (early stopping triggered):
| Epoch | Train Loss | Val Loss | Train Acc | Val Acc | Status |
|---|---|---|---|---|---|
| 4 | 0.0432 | 0.0121 | 98.45% | 99.72% | Best |
| 10 | 0.0251 | 0.0077 | 99.31% | 100% | Saved |
| 18 | 0.0186 | 0.0042 | 99.40% | 99.72% | Saved |
| 25 | 0.0185 | 0.0031 | 99.41% | 100% | Saved |
| 32 | 0.0150 | 0.0016 | 99.44% | 100% | Saved |
| 34 | 0.0126 | 0.0013 | 99.52% | 100% | Best |
| 42 | 0.0100 | 0.0008 | 99.64% | 100% | Best |
| 43-52 | ... | 0.008-0.048 | 99.4-99.8% | 98.9-100% | No improvement |
Key Observations:
- Rapid convergence in first 10 epochs
- Best validation loss: 0.0008 at epoch 42
- Early stopping prevented overfitting (patience=10)
- Final model achieves near-perfect validation accuracy
Dataset Structure: The original dataset provides train (6,633 images) and test (715 images) folders. To enable proper model validation, the test folder was split 50/50 into validation (357 images) and final test (358 images) sets.
Three-Way Data Split:
- Training set (6,633 images): Used to update model weights
- Validation set (357 images): Used for early stopping during training
- Test set (358 images): Held-out data NEVER seen during training, used only for final evaluation
Evidence Against Overfitting:
-
Separate Test Set: The 99.44% accuracy is measured on a completely held-out test set that the model never saw during training or validation. This proves genuine generalization capability.
-
Small Train-Validation Gap:
- Training accuracy: 99.64%
- Validation accuracy: 100%
- Test accuracy: 99.44%
- Consistent performance across all splits demonstrates the model learned general patterns, not memorized training data
-
Early Stopping Effectiveness: Model was saved at epoch 42 (val loss: 0.0008) but training continued until epoch 52. Validation loss plateaued, demonstrating early stopping prevented overfitting.
-
Validation Loss Behavior: Validation loss steadily decreased from 0.0269 (epoch 2) to 0.0008 (epoch 42), then plateaued. No signs of overfitting where validation loss would increase while training loss decreases.
Conclusion: The three-way split with a completely unseen test set, combined with early stopping and consistent performance across all splits, provides strong evidence that the model generalizes well to new data rather than overfitting to the training set.
Python 3.10+
CUDA-capable GPU (optional but recommended)- Clone the repository
git clone https://github.com/Pooja-Vachhad/defect-detection.git
cd defect-detection- Install dependencies
pip install -r requirements.txt- Download dataset Source: Kaggle - Real-Life Industrial Casting Dataset
python train.pyThis will:
- Split the test folder into validation (50%) and test (50%) sets
- Train the model with early stopping
- Save the best model as
best.pth
python test.pyThis will:
- Load the trained model (
best.pth) - Evaluate on the held-out test set (never seen during training)
- Generate confusion matrix and classification report
- Display prediction visualizations
-
Inline Quality Control
- Real-time inspection on production line
- Immediate rejection of defective parts
- Reduces waste and rework costs
-
Batch Inspection Systems
- Post-production quality audit
- Statistical process control
- Trend analysis for process improvement
-
Edge Deployment
- On-device inference (Jetson Nano, RPI)
- Low-latency response (<100ms)
- No cloud dependency
- Labor Cost Reduction: 80-90% decrease in manual inspection time
- Consistency: 24/7 operation without fatigue
- Accuracy: 99.7% vs. 95-97% human accuracy
- Scalability: One model, multiple production lines
- Traceability: Automated logging and analytics
- Integrate Grad-CAM for defect region visualization and model explainability
- Deploy as REST API using FastAPI for production integration
- Export to ONNX for optimized edge device deployment