🚀 Live Demo: Interactive Operator Terminal
This project is an unsupervised late-fusion anomaly detection pipeline written in PyTorch. I built it to visually inspect weld seams for surface defects (such as porosity, cracks, and voids) in real time, while automatically logging signed quality reports to satisfy IATF 16949 section 8.5.2.1 compliance.
Rather than just editing YAML files and writing thin wrappers around Anomalib, I spent my time implementing the core algorithms—including intermediate feature hooks, coreset minimax compression, and Platt scaling probability calibration—from scratch in pure PyTorch.

    [ Input Optical Scan ] ---> [ CLAHE block-level histogram + smoothing ]
                                              |
                                              v
                                     [ ResNet Backbone ]
                                       /             \
                                      /               \
       [ Custom PatchCore Model ]                 [ Custom Student-Teacher ]
       - layer2 & layer3 hook maps                - Randomly initialized student
       - Local avg pooling                        - Frozen teacher network
       - Coreset (capped at 1500)                 - Autoencoder reconstruction
                                      \               /
                                       \             /
                                        v           v
                          [ Platt Scaling Sigmoid Calibrators ]
                          - Sigmoid: P(y=1 | S) = 1 / (1 + e^(A*S + B))
                                              |
                                              v
                          [ SLSQP Late-Fusion Weight Solver ]
                          - SciPy minimizes BCE over calibrated probabilities
                                              |
                                              v
                     [ Quality Logger ] + [ Loopback Socket PLC Link ]

- Forward Hooks: Registers forward hooks that capture activations from `layer2` and `layer3` of frozen ImageNet-pretrained backbones.
- Context Pooling: Applies 3x3 local average pooling to each hooked map before concatenating the two scales, so every patch vector carries multi-scale structural detail.
- Minimax Coreset Selection: Implements the greedy search algorithm to shrink memory bank overhead. I capped this at a maximum of 1500 patch vectors to prevent out-of-memory crashes and maintain a sub-40ms execution loop.
- K-NN Scaling: Computes L2 distances against the coreset, scaling the final score by the density of the 9 nearest neighbors.
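The coreset and scoring steps above can be sketched in a few lines. This is a minimal NumPy illustration rather than the repository's PyTorch code: the function names `greedy_coreset` and `knn_patch_scores` are hypothetical, and the score shown is a plain mean of the k nearest distances, which is a simplification of the density-based scaling described above.

```python
import numpy as np


def greedy_coreset(features: np.ndarray, max_size: int = 1500, seed: int = 0) -> np.ndarray:
    """Return indices of a greedy minimax (k-center) coreset.

    features: (n_patches, dim) array of patch vectors.
    """
    n = features.shape[0]
    target = min(max_size, n)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(n))]  # arbitrary starting centre
    # min_dist[i] = distance from patch i to its closest selected centre
    min_dist = np.linalg.norm(features - features[selected[0]], axis=1)
    while len(selected) < target:
        idx = int(np.argmax(min_dist))  # patch farthest from the current coreset
        if min_dist[idx] == 0.0:        # every remaining patch is a duplicate
            break
        selected.append(idx)
        new_dist = np.linalg.norm(features - features[idx], axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return np.asarray(selected)


def knn_patch_scores(patches: np.ndarray, memory_bank: np.ndarray, k: int = 9) -> np.ndarray:
    """Mean L2 distance from each test patch to its k nearest coreset vectors."""
    # Pairwise distances with shape (n_patches, n_bank)
    dists = np.linalg.norm(patches[:, None, :] - memory_bank[None, :, :], axis=2)
    k = min(k, memory_bank.shape[0])
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)
```

Each greedy step costs one pass over all patches, so the cap of 1500 bounds both the selection time and the size of the distance matrix computed at inference.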
- The Problem: Raw scores from PatchCore and student-teacher distillation live on completely different numeric scales, so adding them directly is not statistically meaningful.
- The Solution: We fit a logistic sigmoid ($P(y=1 \mid S) = 1 / (1 + e^{A \cdot S + B})$) for each model using Nelder-Mead optimization on validation splits, which maps raw distances to calibrated probabilities. With this sign convention, $A$ is negative when larger scores indicate anomalies.
- Weight Optimization: We solve for fusion weights ($w_1, w_2$) using SciPy's Sequential Least Squares Programming (SLSQP) solver to minimize Binary Cross-Entropy loss over calibrated probabilities.
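A rough sketch of the calibration and fusion steps, using `scipy.optimize.minimize` with the two solvers named above. The helper names (`fit_platt`, `fit_fusion_weights`), the score standardization before fitting, and the constraints $w_1 + w_2 = 1$ with $0 \le w_i \le 1$ are assumptions made for illustration; the repository may set these up differently.

```python
import numpy as np
from scipy.optimize import minimize

EPS = 1e-7


def platt_prob(scores, A, B):
    """P(y=1 | S) = 1 / (1 + exp(A*S + B))."""
    z = np.clip(A * scores + B, -50.0, 50.0)
    return 1.0 / (1.0 + np.exp(z))


def platt_nll(params, scores, y):
    """Binary cross-entropy of the Platt sigmoid, written in a stable form."""
    a, b = params
    z = a * scores + b
    softplus = np.logaddexp(0.0, z)  # log(1 + e^z) = -log p
    # -log(1 - p) = log(1 + e^z) - z
    return float(np.mean(y * softplus + (1.0 - y) * (softplus - z)))


def fit_platt(scores, y):
    """Fit (A, B) with Nelder-Mead on standardized scores, then undo the scaling."""
    mu, sigma = scores.mean(), scores.std() + 1e-12
    res = minimize(platt_nll, x0=[-1.0, 0.0],
                   args=((scores - mu) / sigma, y), method="Nelder-Mead")
    a, b = res.x
    return a / sigma, b - a * mu / sigma


def bce(y, p):
    p = np.clip(p, EPS, 1.0 - EPS)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def fit_fusion_weights(p1, p2, y):
    """Solve for (w1, w2) with SLSQP; assumes w1 + w2 = 1 and 0 <= w <= 1."""
    constraints = [{"type": "eq", "fun": lambda w: w[0] + w[1] - 1.0}]
    res = minimize(lambda w: bce(y, w[0] * p1 + w[1] * p2), x0=[0.5, 0.5],
                   method="SLSQP", bounds=[(0.0, 1.0), (0.0, 1.0)],
                   constraints=constraints)
    return res.x
```

Because the fused probability is a convex combination of two calibrated probabilities, it stays in $[0, 1]$ and the BCE objective is convex in the weights.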
- Many baseline implementations measure spatial contrast against the test image's own mean. That is the wrong comparison, because the reference then changes with every test image, including defective ones.
- We cache a global reference mean feature map computed from normal samples during the training `fit` phase. During testing, we compute L2 spatial discrepancies relative to this global baseline.
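The idea of a cached global reference can be illustrated as follows. This is a NumPy sketch under assumed shapes (`(N, C, H, W)` for the normal feature maps), and the class name `GlobalReferenceMap` is hypothetical rather than taken from the repository.

```python
import numpy as np


class GlobalReferenceMap:
    """Caches the per-location mean of normal feature maps and scores against it."""

    def fit(self, normal_maps: np.ndarray) -> "GlobalReferenceMap":
        # normal_maps: (N, C, H, W) features from defect-free training images
        self.reference = normal_maps.mean(axis=0)  # (C, H, W)
        return self

    def score(self, test_map: np.ndarray) -> np.ndarray:
        # L2 norm over the channel axis gives one discrepancy per location
        return np.linalg.norm(test_map - self.reference, axis=0)  # (H, W)
```

Since the reference is computed once from normal data, a defect in the test image cannot shift the baseline it is compared against.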
- HMAC Signatures: Visual report metadata (VIN, timestamps, scores, decisions) is cryptographically signed using HMAC-SHA256. The key is loaded from the `AUTOWELD_SECRET_KEY` environment variable so it is never stored in cleartext in the code.
- PLC Socket: Emits Pass/Fail indicators (`0x00`/`0x01`) over TCP port 5002. It uses a strict 50ms connection timeout so that camera frame capture does not freeze if the receiving industrial controller is offline.
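Both mechanisms can be sketched with the Python standard library alone. The function names, the canonical-JSON serialization, and the mapping of Pass to `0x00` and Fail to `0x01` are assumptions for illustration; only the environment variable name, the port, and the timeout come from the description above.

```python
import hashlib
import hmac
import json
import os
import socket


def load_key() -> bytes:
    """Read the signing key from the environment instead of hardcoding it."""
    key = os.environ.get("AUTOWELD_SECRET_KEY")
    if not key:
        raise RuntimeError("AUTOWELD_SECRET_KEY is not set")
    return key.encode("utf-8")


def sign_report(report: dict, key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the report metadata."""
    payload = json.dumps(report, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_report(report: dict, signature: str, key: bytes) -> bool:
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(sign_report(report, key), signature)


def send_verdict(is_defect: bool, host: str = "127.0.0.1",
                 port: int = 5002, timeout_s: float = 0.05) -> bool:
    """Send one status byte; return False instead of blocking if the PLC is offline."""
    payload = b"\x01" if is_defect else b"\x00"  # assumed: 0x00 = Pass, 0x01 = Fail
    try:
        with socket.create_connection((host, port), timeout=timeout_s) as conn:
            conn.sendall(payload)
        return True
    except OSError:  # covers timeouts and refused connections
        return False
```

Returning a boolean rather than raising keeps the inspection loop running when the controller is unreachable; the caller can decide whether to retry or log the failure.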
The following metrics are the actual results generated by running the benchmark pipeline locally.
| Product Category | Model / Pipeline | Image-Level AUROC | Pixel-Level AUROC | Inference Latency |
|---|---|---|---|---|
| Bottle | Scratch PatchCore (WideResNet-50) | 100.0% | 98.15% | 20 ms |
| Bottle | Custom Student-Teacher (EfficientAD) | 44.76% | 96.70% | 15 ms |
| Bottle | Calibrated Late-Fusion | 100.0% | 98.40% | 35 ms |
| Cable | Scratch PatchCore (WideResNet-50) | 98.69% | 98.05% | 22 ms |
| Metal Nut | Scratch PatchCore (WideResNet-50) | 99.76% | 98.20% | 21 ms |
- The Student-Teacher Bottleneck: The custom student-teacher (EfficientAD) model reached 96.70% Pixel-Level AUROC on Bottle, so it localizes anomalous regions accurately within an image. Its Image-Level AUROC, however, was only 44.76%, which is below the 50% expected from random guessing. I attribute this to representation collapse during early training: the student was initialized from scratch and trained for only 3 epochs on normal data, without complex synthetic anomalies or teacher projections, so its global maximum discrepancy, which drives the image-level score, was noisy and unstable.
- Late Fusion Safeguard: Thanks to Platt Scaling, late fusion successfully balanced the models. It assigned a weight of 0.474 to PatchCore and 0.526 to EfficientAD, stabilizing the joint Image AUROC back to 100.0% and yielding a fused Pixel AUROC of 98.40%.
- Data Leakage Limitation: The benchmark runner currently fits the Platt sigmoids and SLSQP weights on the test loader split because of dataset size constraints. This is data leakage: the calibrators and fusion weights are tuned on the same labels they are evaluated on, so the fused metrics above are likely optimistic. In production, both must be fit on a separate, held-out validation set.
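One way to remove the leakage is to carve a calibration subset out of the labeled data and report metrics only on the remainder. The sketch below uses the standard library only; `stratified_split` is a hypothetical helper and the 50/50 default is an arbitrary choice, not a setting from the benchmark runner.

```python
import random


def stratified_split(labels, calib_fraction=0.5, seed=0):
    """Split sample indices into calibration and evaluation sets, per class."""
    rng = random.Random(seed)
    calib, evaluation = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # keep at least one sample of every class for calibration
        cut = max(1, int(round(len(idx) * calib_fraction)))
        calib.extend(idx[:cut])
        evaluation.extend(idx[cut:])
    return sorted(calib), sorted(evaluation)
```

The Platt sigmoids and fusion weights would then be fit on the calibration indices, and AUROC reported on the evaluation indices only.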
# Clone
git clone https://github.com/shaikhadibbb/AutoWeld-Vision.git
cd AutoWeld-Vision
# Environment
python3 -m venv venv
source venv/bin/activate
# Install requirements
pip install -r requirements-standard.txt
Train models, fit the coresets, calibrate Platt sigmoids, and dump results:
python scripts/run_benchmark.py --categories bottle
Inspect a single image, run inference, and compile cryptographically signed visual ledgers:
python test_inspection.py --image test_weld.png --vin WBA-BMW-2026 --category bottle
Run the interactive operator dashboard:
streamlit run app.py
- Bare Aluminum Glare: Metallic surfaces behave like mirrors. Glare produces strong high-frequency activations that trick distillation models into flagging normal boundaries as anomalous. I lowered the CLAHE contrast clip parameter from $4.0$ to $2.0$ to avoid pixel clipping and added a small spatial Gaussian filter to smooth remaining specular edges.
- Hidden Voids: Camera-based models only capture surface defects. Internal air pockets or insufficient penetration depth are invisible to optical sensors. Catching them in an industrial setup requires training these models on radiographic X-ray datasets (such as GDXray).
- Cold Boot Lag: Loading massive Wide-ResNet weights and fitting coresets on startup introduces a delay of about 8 seconds before the inspection loop goes live.
If you use this pipeline or its from-scratch implementations in your research or industrial audits, please cite the original works:
- PatchCore (Total Recall): Roth, K., Pemula, L., Zepeda, J., Schölkopf, B., Brox, T., & Gehler, P. (2022). Towards Total Recall in Industrial Anomaly Detection. IEEE/CVF CVPR, 14318-14328.
- EfficientAD (Student-Teacher): Batzner, K., Heckler, L., & König, R. (2024). EfficientAD: Accurate Visual Anomaly Detection at Millisecond-Level Latencies. IEEE/CVF WACV.
- MVTec AD Dataset: Bergmann, P., Fauser, M., Sattlegger, D., & Steger, C. (2019). MVTec AD -- A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection. IEEE/CVF CVPR, 9592-9600.
- Platt Scaling Calibration: Platt, J. (1999). Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods. Advances in Large Margin Classifiers, 61-74.