Accurate classification of computed tomography (CT) images is essential for diagnosis and treatment planning, but existing methods often struggle with the subtle and spatially diverse nature of pathological features. Current approaches typically process images uniformly, limiting their ability to detect localized abnormalities that require focused analysis. We introduce UGPL, an uncertainty-guided progressive learning framework that performs a global-to-local analysis by first identifying regions of diagnostic ambiguity and then conducting detailed examination of these critical areas. Our approach employs evidential deep learning to quantify predictive uncertainty, guiding the extraction of informative patches through a non-maximum suppression mechanism that maintains spatial diversity. This progressive refinement strategy, combined with an adaptive fusion mechanism, enables UGPL to integrate both contextual information and fine-grained details. Experiments across three CT datasets demonstrate that UGPL consistently outperforms state-of-the-art methods, achieving improvements of 3.29%, 2.46%, and 8.08% in accuracy for kidney abnormality, lung cancer, and COVID-19 detection, respectively. Our analysis shows that the uncertainty-guided component provides substantial benefits, with performance dramatically increasing when the full progressive learning pipeline is implemented.
We introduce Uncertainty-Guided Progressive Learning (UGPL), a novel framework that mimics diagnostic behavior by performing global analysis followed by focused examination of uncertain regions. UGPL addresses limitations of uniform processing by dynamically allocating computational resources where needed. Our framework first employs a global uncertainty estimator to perform initial classification and generate pixel-wise uncertainty maps, then selects high-uncertainty regions for detailed analysis through a local refinement network. These multi-resolution analyses are combined via an adaptive fusion module that weights predictions based on confidence. Unlike existing methods that treat uncertainty merely as an output signal, UGPL explicitly uses it to guide computational focus, maintaining efficiency while improving performance on diagnostically challenging regions. UGPL processes the input CT image to produce both classification probabilities and an uncertainty map that guides the extraction of high-uncertainty patches using non-maximum suppression. Each patch undergoes high-resolution analysis through a local refinement network, producing patch-specific classification scores and confidence estimates. The adaptive fusion module then integrates global and local predictions using learned weights based on their estimated reliability. Multiple specialized loss functions are jointly optimized, guiding components to work in tandem, adapt according to diagnostic difficulty, and improve performance over uniform processing.
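The pipeline above can be sketched in a few lines. This is our illustrative reconstruction, not the authors' code: the Dirichlet-based uncertainty form, the greedy NMS loop, and the softmax-over-confidences fusion are assumptions standing in for UGPL's learned components.

```python
import numpy as np

def evidential_uncertainty(evidence):
    """Dirichlet-based uncertainty u = K / sum(alpha), with alpha = evidence + 1.
    Zero evidence for all K classes gives maximal uncertainty u = 1."""
    alpha = evidence + 1.0
    k = evidence.shape[-1]
    return k / alpha.sum(axis=-1)

def select_patches_nms(unc_map, num_patches=3, min_dist=32):
    """Greedy non-maximum suppression over a pixel-wise uncertainty map:
    repeatedly take the most uncertain location, then suppress its
    neighbourhood so the selected patch centres stay spatially diverse."""
    u = unc_map.astype(float).copy()
    h, w = u.shape
    centres = []
    for _ in range(num_patches):
        y, x = np.unravel_index(np.argmax(u), u.shape)
        centres.append((int(y), int(x)))
        u[max(0, y - min_dist):y + min_dist + 1,
          max(0, x - min_dist):x + min_dist + 1] = -np.inf  # suppress region
    return centres

def fuse_predictions(global_probs, patch_probs, global_conf, patch_confs):
    """Confidence-weighted fusion of global and per-patch predictions
    (a softmax over confidences stands in for UGPL's learned weights)."""
    preds = np.vstack([global_probs] + list(patch_probs))
    conf = np.array([global_conf] + list(patch_confs))
    w = np.exp(conf - conf.max())
    w /= w.sum()
    return w @ preds
```

Note how the suppression radius in `select_patches_nms` is what enforces the spatial diversity of patches mentioned above: without it, all selected centres would cluster around the single most uncertain pixel.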
The local-only model (LM) performs poorly across all tasks, particularly for kidney abnormalities (40.57%) and lung cancer classification (51.22%), since local patches alone lack sufficient context and, without global guidance, focus on irrelevant regions. The accompanying figures show performance trends across tasks, with COVID-19 detection gaining the most in moving from the LM to the full model (FM).
The following table and ROC curves show that for COVID-19, the global-only model (GM) and FM achieve nearly identical AUC scores (0.901 vs. 0.900). For lung cancer, FM achieves slight per-class improvements, most notably for benign cases (0.991 vs. 0.992), and for kidney cases it improves most classes, including kidney stones (0.984 vs. 0.986).
| Model | Kidney Abnormalities Acc. | Kidney Abnormalities F1 | Lung Cancer Acc. | Lung Cancer F1 | COVID Acc. | COVID F1 |
|---|---|---|---|---|---|---|
| ShuffleNetV2 | 0.96 ± 0.0085 | 0.95 ± 0.0092 | 0.94 ± 0.0127 | 0.91 ± 0.0143 | 0.69 ± 0.0234 | 0.67 ± 0.0251 |
| VGG16 | 0.89 ± 0.0156 | 0.88 ± 0.0173 | 0.95 ± 0.0098 | 0.91 ± 0.0165 | 0.48 ± 0.0287 | 0.47 ± 0.0306 |
| ConvNeXt | 0.81 ± 0.0189 | 0.80 ± 0.0195 | 0.95 ± 0.0076 | 0.95 ± 0.0084 | 0.61 ± 0.0267 | 0.59 ± 0.0278 |
| DenseNet121 | 0.94 ± 0.0102 | 0.93 ± 0.0118 | 0.90 ± 0.0171 | 0.89 ± 0.0176 | 0.78 ± 0.0198 | 0.76 ± 0.0213 |
| DenseNet201 | 0.95 ± 0.0093 | 0.94 ± 0.0106 | 0.84 ± 0.0203 | 0.83 ± 0.0218 | 0.76 ± 0.0206 | 0.74 ± 0.0229 |
| EfficientNetB0 | 0.95 ± 0.0078 | 0.94 ± 0.0089 | 0.95 ± 0.0081 | 0.95 ± 0.0073 | 0.73 ± 0.0221 | 0.71 ± 0.0238 |
| MobileNetV2 | 0.87 ± 0.0179 | 0.85 ± 0.0195 | 0.70 ± 0.0267 | 0.69 ± 0.0283 | 0.70 ± 0.0241 | 0.68 ± 0.0256 |
| ViT | 0.94 ± 0.0154 | 0.92 ± 0.0167 | 0.51 ± 0.0389 | 0.22 ± 0.0456 | 0.56 ± 0.0312 | 0.55 ± 0.0318 |
| Swin | 0.68 ± 0.0298 | 0.40 ± 0.0421 | 0.60 ± 0.0334 | 0.41 ± 0.0398 | 0.53 ± 0.0331 | 0.53 ± 0.0329 |
| DeiT | 0.92 ± 0.0162 | 0.90 ± 0.0178 | 0.66 ± 0.0312 | 0.46 ± 0.0387 | 0.44 ± 0.0356 | 0.35 ± 0.0412 |
| CoaT | 0.98 ± 0.0067 | 0.98 ± 0.0072 | 0.95 ± 0.0089 | 0.93 ± 0.0112 | 0.68 ± 0.0254 | 0.66 ± 0.0267 |
| CrossViT | 0.97 ± 0.0087 | 0.97 ± 0.0094 | 0.58 ± 0.0356 | 0.39 ± 0.0423 | 0.62 ± 0.0289 | 0.48 ± 0.0378 |
| CRNet | - | - | - | - | 0.73 ± 0.0218 | 0.76 ± 0.0203 |
| UGPL (Ours) | 0.99 ± 0.0023 | 0.99 ± 0.0031 | 0.98 ± 0.0047 | 0.97 ± 0.0052 | 0.81 ± 0.0134 | 0.79 ± 0.0147 |
To analyze the contribution of each component in our progressive learning framework, we compare four configurations: (1) a global-only setup that uses the global uncertainty estimator without local refinement; (2) a no-uncertainty-guidance (No UG) variant, in which patches are selected randomly instead of via uncertainty maps; (3) a fixed-patches configuration that uses predefined patch locations rather than adaptive selection; and (4) the full model, which includes all components of the UGPL framework. The full model consistently outperforms all reduced variants by substantial F1 margins. On the COVID dataset, every ablation causes a dramatic performance drop, with the global-only variant achieving only 14.95% F1. For lung cancer detection, the full model obtains 97.64% F1, while the global-only setup drops to 34.19%. The kidney dataset likewise shows a large gap, with the full model reaching 99.6% F1 versus 58.7% for the best ablated configuration (fixed patches). Interestingly, No UG and fixed patches sometimes perform worse than the global-only model, showing that naively adding local components without proper guidance can be detrimental and underscoring the importance of uncertainty-guided patch selection.
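The two patch-selection ablations are easy to state in code. A minimal sketch follows; the function names and the diagonal grid used for the fixed-patch layout are our illustrative assumptions, not the paper's exact placement scheme:

```python
import numpy as np

def random_patch_centres(shape, num_patches, seed=0):
    """'No UG' ablation: patch centres drawn uniformly at random,
    ignoring the uncertainty map entirely."""
    rng = np.random.default_rng(seed)
    h, w = shape
    ys = rng.integers(0, h, num_patches)
    xs = rng.integers(0, w, num_patches)
    return [(int(y), int(x)) for y, x in zip(ys, xs)]

def fixed_patch_centres(shape, num_patches):
    """'Fixed patches' ablation: centres at predefined locations
    (here, evenly spaced along the image diagonal)."""
    h, w = shape
    return [((i + 1) * h // (num_patches + 1), (i + 1) * w // (num_patches + 1))
            for i in range(num_patches)]
```

Both variants bypass the uncertainty map entirely, which is exactly why they can underperform even the global-only model: the local branch then spends its capacity on regions with no diagnostic ambiguity.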
| Configuration | λf | λg | λl | λu | λc | λco | λd | COVID Acc. | COVID F1 | Lung Acc. | Lung F1 | Kidney Acc. | Kidney F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C1: Baseline | 1.0 | 0.5 | 0.5 | 0.3 | 0.2 | 0.1 | 0.1 | 0.8108 | 0.7903 | 0.9817 | 0.9764 | 0.9971 | 0.9945 |
| C2: Local Emphasis | 1.0 | 0.3 | 0.7 | 0.3 | 0.2 | 0.1 | 0.1 | 0.7946 | 0.7758 | 0.9695 | 0.9641 | 0.9928 | 0.9903 |
| C3: Global-Centric | 1.0 | 0.7 | 0.3 | 0.3 | 0.2 | 0.1 | 0.1 | 0.7568 | 0.7402 | 0.9634 | 0.9576 | 0.9876 | 0.9832 |
| C4: Uncertainty Focus | 1.0 | 0.5 | 0.5 | 0.6 | 0.2 | 0.1 | 0.1 | 0.8243 | 0.8057 | 0.9756 | 0.9687 | 0.9953 | 0.9931 |
| C5: Consistency-Driven | 1.0 | 0.5 | 0.5 | 0.3 | 0.5 | 0.1 | 0.1 | 0.7892 | 0.7689 | 0.9786 | 0.9723 | 0.9913 | 0.9889 |
| C6: Balanced High | 1.0 | 0.5 | 0.5 | 0.4 | 0.4 | 0.2 | 0.2 | 0.8051 | 0.7836 | 0.9801 | 0.9739 | 0.9942 | 0.9918 |
| C7: Diversity-Enhanced | 1.0 | 0.5 | 0.5 | 0.3 | 0.2 | 0.1 | 0.4 | 0.7784 | 0.7569 | 0.9667 | 0.9602 | 0.9895 | 0.9856 |
| C8: Confidence-Calibrated | 1.0 | 0.5 | 0.5 | 0.3 | 0.2 | 0.4 | 0.1 | 0.7973 | 0.7798 | 0.9753 | 0.9695 | 0.9923 | 0.9891 |
| C9: Conservative | 0.5 | 0.25 | 0.25 | 0.15 | 0.1 | 0.05 | 0.05 | 0.7486 | 0.7312 | 0.9581 | 0.9524 | 0.9837 | 0.9803 |
| C10: Aggressive | 2.0 | 1.0 | 1.0 | 0.6 | 0.4 | 0.2 | 0.2 | 0.8023 | 0.7827 | 0.9728 | 0.9674 | 0.9932 | 0.9907 |
The table above compares ten loss weight configurations across datasets. The baseline configuration (C1), with balanced weights (fused: 1.0; global/local: 0.5 each; uncertainty: 0.3; consistency: 0.2; confidence/diversity: 0.1 each), performs best overall. Configurations emphasizing either the global or the local branch underperform, confirming the necessity of combining global context with local detail. Increased uncertainty weighting (C4) improves COVID detection (82.43% accuracy, 80.57% F1) but slightly reduces performance on the Lung and Kidney datasets, where target features are more prominent. C5 (Consistency-Driven) performs strongly on the Lung dataset (97.86% accuracy), where structural patterns are clearer, while uniform scaling of all components (C9 and C10) yields no improvement, indicating that the relative balance of the loss terms matters more than their absolute values.
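Assuming the seven terms are combined as a plain weighted sum (the symbols match the table; the helper below is our sketch, not the authors' code), the training objective amounts to:

```python
def total_loss(losses, weights):
    """Weighted sum of UGPL's component losses: fused (f), global (g),
    local (l), uncertainty (u), consistency (c), confidence (co),
    and diversity (d)."""
    return sum(weights[k] * losses[k] for k in ("f", "g", "l", "u", "c", "co", "d"))

# C1 (baseline) weights from the table above
C1 = {"f": 1.0, "g": 0.5, "l": 0.5, "u": 0.3, "c": 0.2, "co": 0.1, "d": 0.1}
```

Under this reading, C9 and C10 rescale every weight by a constant factor, which only rescales the gradient magnitude; the C9/C10 results therefore support the claim that relative balance, not absolute scale, drives performance.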
| Patch Size | Patches | Kidney | Lung | COVID |
|---|---|---|---|---|
| 32 | 2 | 0.9586 | 0.8869 | 0.7161 |
| 32 | 3 | 0.9673 | 0.9195 | 0.7368 |
| 32 | 4 | 0.9541 | 0.8756 | 0.7454 |
| 64 | 2 | 0.9824 | 0.9764 | 0.7521 |
| 64 | 3 | 0.9945 | 0.8671 | 0.7368 |
| 64 | 4 | 0.9765 | 0.9343 | 0.7903 |
| 96 | 2 | 0.9622 | 0.8712 | 0.7372 |
| 96 | 3 | 0.9701 | 0.9099 | 0.7262 |
| 96 | 4 | 0.9418 | 0.8717 | 0.6505 |
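Selecting the operating point from such a sweep is a one-liner; the helper below is purely illustrative, shown here with the COVID column of the table above:

```python
def best_config(results):
    """Return the (patch_size, num_patches) pair with the highest accuracy."""
    return max(results, key=results.get)

# COVID accuracy column of the sweep above
covid = {(32, 2): 0.7161, (32, 3): 0.7368, (32, 4): 0.7454,
         (64, 2): 0.7521, (64, 3): 0.7368, (64, 4): 0.7903,
         (96, 2): 0.7521 - 0.0149, (96, 3): 0.7262, (96, 4): 0.6505}
```

Note that the best setting differs per dataset (64-pixel patches throughout, but a different patch count for each task), so the patch count is a hyperparameter worth tuning per target domain.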
@InProceedings{UGPL2025,
author = {Venkatraman, Shravan and Kumar S, Pavan and Raj, Rakesh and S, Chandrakala},
title = {UGPL: Uncertainty-Guided Progressive Learning for Evidence-Based Classification in Computed Tomography},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
month = {October},
year = {2025}
}