Journal: Scientific Reports (Nature)

Abstract:
This research presents a novel deep Convolutional Neural Network (CNN) architecture designed specifically for multi-class image classification of remote sensing data. The proposed model is evaluated on the NWPU-RESISC45 and UC Merced Land Use datasets, using the same 10 classes from each: harbor, chaparral, tennis court, industrial area, parking lot, forest, beach, overpass, airplane, and baseball diamond. Extensive experiments show that the proposed architecture outperforms five popular pre-trained CNN models in both accuracy and efficiency. On NWPU-RESISC45 and UC Merced, respectively, the model achieves an accuracy of 0.9428 and 0.93, a recall of 0.94 and 0.93, a precision of 0.95 and 0.94, an F1-score of 0.94 and 0.93, and an Intersection over Union (IoU) of 0.89 and 0.86, confirming its robustness across both datasets. In terms of computational cost, training takes 3,692 s on NWPU-RESISC45 (5,057 images across 10 classes), with a GPU memory footprint of 12.7 gigabytes (GB), and 559 s on UC Merced (722 images across 10 classes). To enhance the interpretability and explainability of the model's predictions, two techniques are incorporated: Shapley Additive Explanations (SHAP) and Class Activation Mapping (CAM). The key contribution of this manuscript is a hybrid CNN framework that not only advances classification performance but also provides explainability through SHAP and CAM, while maintaining efficient training times and strong generalization, making it a compelling solution for remote sensing image understanding.
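The abstract reports accuracy, precision, recall, F1-score, and IoU but does not state how per-class values are averaged. The sketch below assumes macro averaging (an unweighted mean over classes) and derives every metric from a confusion matrix; for classification, IoU is taken here to be the per-class Jaccard index TP / (TP + FP + FN). The function names (`confusion_matrix`, `safe_divide`, `macro_metrics`) and the toy labels are hypothetical, and NumPy is the only dependency.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count matrix with rows = true class, columns = predicted class."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def safe_divide(num, den):
    """Element-wise num / den, returning 0 where den == 0."""
    out = np.zeros_like(num, dtype=float)
    np.divide(num, den, out=out, where=den > 0)
    return out


def macro_metrics(y_true, y_pred, num_classes):
    """Macro-averaged classification metrics (averaging scheme is an assumption)."""
    cm = confusion_matrix(y_true, y_pred, num_classes)
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as class c, but true class differs
    fn = cm.sum(axis=1) - tp  # true class c, but predicted as another class
    precision = safe_divide(tp, tp + fp)
    recall = safe_divide(tp, tp + fn)
    f1 = safe_divide(2 * precision * recall, precision + recall)
    iou = safe_divide(tp, tp + fp + fn)  # per-class Jaccard index
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
        "iou": iou.mean(),
        "per_class_f1": f1,
        "per_class_iou": iou,
    }


# Toy example with 3 classes and 6 samples (hypothetical labels).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
m = macro_metrics(y_true, y_pred, num_classes=3)
print({k: v for k, v in m.items() if not k.startswith("per_class")})
```

For each class, F1 and IoU are linked by F1 = 2·IoU / (1 + IoU), so they rank models identically per class; after macro averaging this identity holds only approximately.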
In summary, the key novel aspects of our approach are as follows. Lightweight and efficient architecture: the proposed CNN design balances performance against computational cost, which should make it suitable for real-time or resource-constrained environments without compromising accuracy. Integrated interpretability: by incorporating both SHAP (Shapley Additive Explanations) and CAM (Class Activation Mapping), the framework complements its strong predictive performance with explanations of which image regions and features drive each prediction. Generalizability across datasets: the model's robustness and generalizability are demonstrated on two challenging and diverse datasets, confirming its adaptability to different domain scenarios.
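In its original formulation, CAM applies to networks whose last convolutional layer is followed by global average pooling (GAP) and a dense classification layer; the abstract does not describe the proposed model's classification head, so this structure is an assumption. The sketch computes the map for class c as a weighted sum of the final feature maps, CAM_c(x, y) = Σ_k w_{k,c} f_k(x, y), using random arrays in place of real activations. The shapes (64 channels, 7 × 7 maps, 224 × 224 input) and the function names `class_activation_map` and `to_heatmap` are hypothetical.

```python
import numpy as np


def class_activation_map(features, fc_weights, class_idx):
    """Raw CAM for one image.

    features:   (K, H, W) activations of the last conv layer.
    fc_weights: (K, C) weights of the dense layer that follows GAP.
    Returns an (H, W) map: sum over k of fc_weights[k, class_idx] * features[k].
    """
    return np.tensordot(fc_weights[:, class_idx], features, axes=([0], [0]))


def to_heatmap(cam, upscale):
    """Min-max normalise to [0, 1] and upsample by nearest-neighbour repetition."""
    cam = cam - cam.min()
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    return np.repeat(np.repeat(cam, upscale, axis=0), upscale, axis=1)


# Random stand-ins for real activations and trained weights (hypothetical shapes).
rng = np.random.default_rng(0)
features = rng.random((64, 7, 7))
fc_weights = rng.normal(size=(64, 10))
bias = rng.normal(size=10)

# GAP followed by the dense layer gives the class logits.
logits = features.mean(axis=(1, 2)) @ fc_weights + bias
c = int(np.argmax(logits))

cam = class_activation_map(features, fc_weights, c)
heat = to_heatmap(cam, upscale=32)  # 7 x 7 -> 224 x 224 overlay
print(c, heat.shape)
```

Because GAP is linear, the spatial mean of the raw map plus the class bias recovers the class logit, which is why CAM highlights the regions that contribute to a prediction. Nearest-neighbour upsampling keeps the sketch short; bilinear interpolation is the more common choice for visual overlays.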