Comparative study of three deep learning segmentation architectures for cloud detection in Landsat-8 imagery.
Implemented and evaluated for an independent study at the University of Southern Maine, conducted under the supervision of Dr. Behrooz Mansouri at the AIIR Lab. All three models are trained and evaluated on the 38-Cloud dataset.
| Model | Architecture | Backbone | Pretrained |
|---|---|---|---|
| Cloud-Net | U-Net variant with multi-scale skip connections | Custom CNN | No |
| DeepLabV3+ | ASPP encoder-decoder | ResNet-50 | ImageNet |
| SegFormer | Hierarchical Vision Transformer | MiT-B2 | No |
Results on the 38-Cloud dataset:

| Model | Val IoU | Test IoU | Test F1 | Train Time |
|---|---|---|---|---|
| Cloud-Net | 0.939 | 0.608 | 0.714 | 3.5 hrs |
| DeepLabV3+ | 0.941 | 0.606 | 0.716 | 16.8 hrs |
| SegFormer | 0.943 | — | — | 26.2 hrs |
Note: SegFormer achieved the highest validation IoU but its test evaluation is affected by a normalisation pipeline mismatch. The checkpoint is intact and the model can be re-evaluated once the fix described below is applied. See the paper for full diagnosis.
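For readers unfamiliar with the metrics in the results table, here is a minimal sketch of IoU and F1 for a single pair of binary masks. It is illustrative only: the function name `binary_iou_f1` is hypothetical, and the training scripts may aggregate differently (for example, averaging per patch rather than pooling pixels).

```python
import numpy as np

def binary_iou_f1(pred, target, eps=1e-7):
    """Return (IoU, F1) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.logical_and(pred, target).sum()    # cloud predicted, cloud in ground truth
    fp = np.logical_and(pred, ~target).sum()   # cloud predicted, clear in ground truth
    fn = np.logical_and(~pred, target).sum()   # clear predicted, cloud in ground truth
    iou = tp / (tp + fp + fn + eps)            # eps avoids division by zero on empty masks
    f1 = 2 * tp / (2 * tp + fp + fn + eps)
    return float(iou), float(f1)

# Tiny example: 2 true positives, 1 false positive, 1 false negative.
pred = np.array([[1, 1, 0], [0, 1, 0]])
target = np.array([[1, 0, 0], [0, 1, 1]])
iou, f1 = binary_iou_f1(pred, target)
print(f"IoU={iou:.3f}  F1={f1:.3f}")
```

For one mask pair the two metrics are related by F1 = 2·IoU / (1 + IoU), so F1 is always at least as large as IoU.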
Download the 38-Cloud dataset from Kaggle: https://www.kaggle.com/datasets/sorour/38cloud-cloud-segmentation-in-satellite-images
Expected directory structure after downloading:
data/
├── Training/
│ ├── train_red/
│ ├── train_green/
│ ├── train_blue/
│ ├── train_nir/
│ ├── train_gt/
│ └── training_patches_38-Cloud.csv
└── Test/
  ├── test_red/
  ├── test_green/
  ├── test_blue/
  ├── test_nir/
  ├── Entire_scene_gts/
  └── test_patches_38-Cloud.csv
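Before training, it can help to confirm that the download matches this layout. The following is a small standalone sketch, not part of the repository; the helper name `missing_paths` and the default root `data` are assumptions.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED = [
    "Training/train_red",
    "Training/train_green",
    "Training/train_blue",
    "Training/train_nir",
    "Training/train_gt",
    "Training/training_patches_38-Cloud.csv",
    "Test/test_red",
    "Test/test_green",
    "Test/test_blue",
    "Test/test_nir",
    "Test/Entire_scene_gts",
    "Test/test_patches_38-Cloud.csv",
]

def missing_paths(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]

missing = missing_paths("data")  # assumed dataset root
if missing:
    print("Missing entries:", *missing, sep="\n  ")
else:
    print("All expected paths found.")
```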
pip install torch torchvision tifffile pillow numpy pandas scikit-learn tqdm segmentation-models-pytorch
segmentation-models-pytorch is required for DeepLabV3+ only.
.
├── train_cloudnet.py # Cloud-Net training + test evaluation
├── train_deeplabv3plus.py # DeepLabV3+ training + test evaluation
├── train_transformer_fixed.py # SegFormer training + test evaluation
├── visualize_cloudnet.py # Visualisation for Cloud-Net outputs
├── visualize_deeplabv3plus.py # Visualisation for DeepLabV3+ outputs
├── visualize_transformer.py # Visualisation for SegFormer outputs
├── cloud_net_model.py # Original Keras Cloud-Net architecture (reference)
├── generators.py # Original Keras data generators (reference)
├── losses.py # Original Keras Jaccard loss (reference)
├── augmentation.py # Original Keras augmentation functions (reference)
├── main_train.py # Original Keras training entry point (reference)
└── main_test.py # Original Keras test entry point (reference)
The files cloud_net_model.py, generators.py, losses.py, augmentation.py, main_train.py, and main_test.py are the original Keras/TensorFlow reference implementation. The three train_*.py scripts are the full PyTorch reimplementations used for this study.

Each script is self-contained: set your data directory in the Config class and run it.
Cloud-Net:

python train_cloudnet.py

DeepLabV3+:

python train_deeplabv3plus.py

SegFormer:

python train_transformer_fixed.py

All three scripts will:
- Compute per-channel mean and standard deviation from the training set
- Train the model with early stopping
- Save the best checkpoint (including mean/std) to the model's output directory
- Automatically run test-set evaluation at the end
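The first step in the list above can be sketched as a streaming computation that never holds the whole training set in memory. This is an assumption about how such statistics are typically computed, not a copy of the scripts' code; `channel_stats` is a hypothetical name, and the random arrays stand in for 4-band (red, green, blue, NIR) patches.

```python
import numpy as np

def channel_stats(patches):
    """Per-channel mean and std over an iterable of (C, H, W) arrays."""
    total = None      # running sum per channel
    total_sq = None   # running sum of squares per channel
    count = 0         # pixels seen per channel
    for patch in patches:
        x = np.asarray(patch, dtype=np.float64)
        flat = x.reshape(x.shape[0], -1)
        if total is None:
            total = np.zeros(flat.shape[0])
            total_sq = np.zeros(flat.shape[0])
        total += flat.sum(axis=1)
        total_sq += (flat ** 2).sum(axis=1)
        count += flat.shape[1]
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2; clip tiny negatives caused by rounding.
    std = np.sqrt(np.maximum(total_sq / count - mean ** 2, 0.0))
    return mean, std

# Synthetic stand-in for training patches.
rng = np.random.default_rng(0)
fake_patches = [rng.random((4, 8, 8)) for _ in range(5)]
mean, std = channel_stats(fake_patches)

# Cross-check against the direct computation on the stacked array.
stacked = np.stack(fake_patches)  # shape (N, C, H, W)
print(np.allclose(mean, stacked.mean(axis=(0, 2, 3))))
print(np.allclose(std, stacked.std(axis=(0, 2, 3))))
```

Storing `mean` and `std` next to the weights in the checkpoint, as the scripts do, lets inference reuse exactly the statistics seen during training.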
Checkpoints are saved to:
- outputs_cloudnet/checkpoints/best_model.pth
- outputs_deeplabv3plus/checkpoints/best_model.pth
- outputs_segformer/checkpoints/best_model.pth
All three scripts include a smoke test mode that runs the full pipeline on synthetic data in ~30 seconds without requiring the dataset. Run these first to verify your environment is set up correctly:
python train_cloudnet.py --smoketest
python train_deeplabv3plus.py --smoketest
python train_transformer_fixed.py --smoketest

After training and evaluation, generate patch visualisations and coverage maps:
python visualize_cloudnet.py
python visualize_deeplabv3plus.py
python visualize_transformer.py

Outputs are saved to outputs_<model>/visualisations/ and include:

- best_patches.png: top 12 patches by IoU
- worst_patches.png: bottom 12 patches by IoU
- random_patches.png: 12 randomly sampled patches
- cloud_coverage_map.png: predicted cloud fraction per patch across the test set
- training_history.png: loss, IoU, and F1 curves
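As a rough illustration of how the best, worst, and random patch groups could be chosen, the sketch below ranks patches by IoU. The function `select_patches` and the `scores` mapping are hypothetical; the visualisation scripts may select patches differently.

```python
import random

def select_patches(scores, k=12, seed=0):
    """Split patch names into best, worst, and random groups by IoU.

    `scores` maps a patch name to its IoU (hypothetical structure).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)  # highest IoU first
    k = min(k, len(ranked))
    return {
        "best": ranked[:k],
        "worst": ranked[len(ranked) - k:][::-1],            # lowest IoU first
        "random": random.Random(seed).sample(ranked, k),    # seeded for repeatability
    }

# Made-up scores for 40 patches, IoU increasing with the index.
scores = {f"patch_{i:03d}": i / 100 for i in range(40)}
groups = select_patches(scores)
print(groups["best"][0], groups["worst"][0])
```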
SegFormer's test-set results (IoU 0.025) do not reflect the model's true capability. The model trained correctly to a validation IoU of 0.943 — the highest of all three models — but the test evaluation was affected by a normalisation mismatch between training and inference.
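To show why a normalisation mismatch matters, the sketch below standardises the same synthetic patch twice: once with the statistics it was drawn from, and once with different ones. All numbers are made up for illustration and are not the values from this study, and `normalise` is a hypothetical helper.

```python
import numpy as np

def normalise(x, mean, std):
    """Standardise a (C, H, W) array with per-channel statistics."""
    mean = np.asarray(mean, dtype=np.float64).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float64).reshape(-1, 1, 1)
    return (x - mean) / std

rng = np.random.default_rng(0)
train_mean = np.array([0.30, 0.28, 0.27, 0.35])  # made-up "training" statistics
train_std = np.array([0.12, 0.11, 0.11, 0.15])

# Synthetic 4-band patch drawn from the training distribution.
patch = rng.normal(
    train_mean.reshape(-1, 1, 1), train_std.reshape(-1, 1, 1), size=(4, 64, 64)
)

matched = normalise(patch, train_mean, train_std)
mismatched = normalise(patch, [0.5] * 4, [0.5] * 4)  # hypothetical wrong statistics

print("matched    mean/std:", matched.mean().round(2), matched.std().round(2))
print("mismatched mean/std:", mismatched.mean().round(2), mismatched.std().round(2))
```

With matching statistics the input is roughly zero-mean and unit-variance, which is what the network saw during training. With mismatched statistics the same pixels arrive shifted and rescaled, so a well-trained model can still produce poor predictions.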
To fix and re-evaluate without retraining:
First, confirm the issue by checking whether the checkpoint contains the normalisation statistics:
import torch
ckpt = torch.load("outputs_segformer/checkpoints/best_model.pth", map_location="cpu")
print(list(ckpt.keys()))  # should include 'mean' and 'std'

If mean and std are present, create a file called run_test_eval.py:
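import sys
sys.path.insert(0, '.')  # make the training script importable from the repo root

import torch
from train_transformer_fixed import Config, SegFormer, evaluate_test

cfg = Config()

# Load the best checkpoint together with the normalisation statistics saved during training.
ckpt = torch.load("outputs_segformer/checkpoints/best_model.pth", map_location=cfg.DEVICE)
mean = ckpt["mean"]
std = ckpt["std"]

# Rebuild the model and restore the trained weights.
model = SegFormer(cfg).to(cfg.DEVICE)
model.load_state_dict(ckpt["state_dict"])

# Re-run test evaluation with the training-set mean and std.
test_metrics, test_df = evaluate_test(model, cfg, mean, std)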
print(test_df)

Then run:

python run_test_eval.py

If mean and std are not in the checkpoint, the model needs to be retrained. Running python train_transformer_fixed.py from scratch will produce a correct checkpoint and automatically run test evaluation at the end.
Key hyperparameters are set in the Config class at the top of each training script. Notable defaults:
| Setting | Cloud-Net | DeepLabV3+ | SegFormer |
|---|---|---|---|
| Image size | 192 × 192 | 128 × 128 | 192 × 192 |
| Batch size | 6 | 4 | 8 |
| Learning rate | 1e-4 | Enc: 1e-4 / Dec: 1e-3 | 1e-4 |
| Max epochs | 50 | 50 | 50 |
| Early stop patience | 10 | 8 | 8 |
| Val split | 20% | 10% | 10% |
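The early stop patience setting can be read as "stop after this many consecutive epochs without improvement". Below is a minimal sketch of that rule, assuming the monitored quantity is validation IoU (higher is better); the class `EarlyStopping` is illustrative and not taken from the training scripts.

```python
class EarlyStopping:
    """Stop when the monitored score has not improved for `patience` epochs."""

    def __init__(self, patience=8):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        """Record one epoch's score; return True when training should stop."""
        if score > self.best:
            self.best = score       # new best: this is where a checkpoint would be saved
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1    # ties count as no improvement
        return self.bad_epochs >= self.patience

# Made-up validation IoU curve, using patience=3 to keep the example short.
stopper = EarlyStopping(patience=3)
val_iou = [0.70, 0.80, 0.85, 0.84, 0.85, 0.83, 0.82]
stopped_at = None
for epoch, score in enumerate(val_iou, start=1):
    if stopper.step(score):
        stopped_at = epoch
        break

print(f"stopped at epoch {stopped_at}, best IoU {stopper.best}")
```

A smaller patience shortens training but risks stopping during a temporary plateau, which is one reason the value may differ between models.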
- Mohajerani, S., & Saeedi, P. (2019). Cloud-Net: An end-to-end cloud detection algorithm for Landsat 8 imagery. IGARSS 2019.
- Chen, L.-C., et al. (2018). Encoder-decoder with atrous separable convolution for semantic image segmentation. ECCV 2018.
- Xie, E., et al. (2021). SegFormer: Simple and efficient design for semantic segmentation with transformers. NeurIPS 2021.
- Zhu, Z., & Woodcock, C. E. (2012). Object-based cloud and cloud shadow detection in Landsat imagery. Remote Sensing of Environment, 118, 83–94.
Kristina Zbinden — University of Southern Maine, Independent Study, 2025
Research Conducted Under Supervision of Lab Director: Dr. Behrooz Mansouri — University of Southern Maine, AIIR Lab