CasCAM: Cascaded Class Activation Mapping

Supplementary Materials for Experimental Results

Performance Metrics

We evaluate CasCAM on the Oxford-IIIT Pet dataset, which contains 7,349 images across 37 pet categories with pixel-level segmentation annotations. This dataset provides an ideal benchmark for evaluating localization accuracy due to its clean single-object images with precise ground truth masks.
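The tables below report Pointing Game and IoU against these ground-truth masks. As a rough illustration of what the two metrics measure, here is a minimal sketch using their standard definitions. It is not the evaluation code behind the reported numbers: the helper names, the nested-list heatmap representation, and the 0.5 binarization threshold for IoU are all assumptions made for the example.

```python
from typing import Sequence

Grid = Sequence[Sequence[float]]  # 2D heatmap or 0/1 mask, stored as rows


def pointing_game_hit(cam: Grid, mask: Grid) -> bool:
    """Return True if the heatmap's maximum lies on a foreground mask pixel."""
    best_value, best_pos = float("-inf"), (0, 0)
    for r, row in enumerate(cam):
        for c, value in enumerate(row):
            if value > best_value:
                best_value, best_pos = value, (r, c)
    r, c = best_pos
    return mask[r][c] > 0


def iou(cam: Grid, mask: Grid, threshold: float = 0.5) -> float:
    """IoU between the heatmap binarized at `threshold` (assumed) and the mask."""
    inter = union = 0
    for cam_row, mask_row in zip(cam, mask):
        for value, m in zip(cam_row, mask_row):
            pred, gt = value >= threshold, m > 0
            inter += pred and gt  # pixel is in both prediction and mask
            union += pred or gt   # pixel is in at least one of them
    return inter / union if union else 0.0


# Toy 3x3 example (hypothetical values, not taken from the dataset).
cam = [[0.1, 0.2, 0.0],
       [0.3, 0.9, 0.6],
       [0.0, 0.4, 0.1]]
mask = [[0, 0, 0],
        [0, 1, 1],
        [0, 1, 0]]

hit = pointing_game_hit(cam, mask)  # maximum 0.9 sits on a mask pixel
overlap = iou(cam, mask)            # 2 shared pixels out of 3 in the union
```

Pointing Game accuracy over a dataset is then the fraction of images for which the hit test succeeds, which is why it is reported as a percentage while IoU is reported as a ratio.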

Oxford-IIIT Pet Dataset Results

Best Configuration: num_iter=10, θ=0.3, top_k=10, λ=0.1

| Method | AP | IoU | Pointing Game | Top-15% |
|---|---|---|---|---|
| CasCAM (Ours) | 0.6529 | 0.2366 | 89.24% | 0.6499 |
| CAM | 0.5190 | 0.1432 | 42.29% | 0.5001 |
| GradCAM | 0.5191 | 0.1325 | 42.29% | 0.5001 |
| HiResCAM | 0.5191 | 0.1325 | 42.29% | 0.5001 |
| GradCAM++ | 0.5207 | 0.1287 | 42.02% | 0.5054 |
| ScoreCAM | 0.5243 | 0.1524 | 42.22% | 0.5169 |
| AblationCAM | 0.5213 | 0.1314 | 41.81% | 0.5050 |
| XGradCAM | 0.5191 | 0.1325 | 42.29% | 0.5001 |
| FullGrad | 0.4940 | 0.2350 | 59.61% | 0.5142 |
| EigenGradCAM | 0.5007 | 0.0762 | 41.81% | 0.4913 |
| LayerCAM | 0.5225 | 0.1316 | 42.22% | 0.5100 |

Key Finding: CasCAM achieves 89.24% Pointing Game accuracy, 29.63 percentage points above the best baseline, FullGrad (59.61%). On IoU, CasCAM reaches 0.2366, comparable to FullGrad (0.2350) and well above every other baseline (at most 0.1524, for ScoreCAM).

Generalization to MS-COCO Dataset

To test generalization, we also evaluate CasCAM on 1,728 images from the more challenging MS-COCO dataset, which contain multiple objects and complex backgrounds.

Configuration: num_iter=15, θ=0.1, top_k=10, λ=0.0

| Method | AP | IoU | Pointing Game | Top-15% |
|---|---|---|---|---|
| CasCAM (Ours) | 0.5417 | 0.2050 | 75.98% | 0.3817 |
| CAM | 0.2788 | 0.1143 | 20.43% | 0.2225 |
| GradCAM | 0.2792 | 0.1078 | 20.43% | 0.2225 |
| HiResCAM | 0.2792 | 0.1078 | 20.43% | 0.2225 |
| GradCAM++ | 0.2844 | 0.1038 | 21.41% | 0.2250 |
| ScoreCAM | 0.2842 | 0.1164 | 20.43% | 0.2261 |
| AblationCAM | 0.2827 | 0.1049 | 20.83% | 0.2247 |
| XGradCAM | 0.2792 | 0.1078 | 20.43% | 0.2225 |
| FullGrad | 0.4223 | 0.2368 | 44.44% | 0.3757 |
| EigenGradCAM | 0.2658 | 0.0756 | 20.78% | 0.2230 |
| LayerCAM | 0.2852 | 0.1085 | 21.01% | 0.2253 |

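Key Finding: CasCAM achieves 75.98% Pointing Game accuracy on MS-COCO, 31.54 percentage points above FullGrad (44.44%), along with the highest AP (0.5417 vs. 0.4223). FullGrad retains the higher IoU (0.2368 vs. 0.2050). These results suggest that CasCAM's iterative refinement carries over to complex multi-object scenes.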

Summary Comparison

| Dataset | Method | AP | Pointing Game | IoU |
|---|---|---|---|---|
| Oxford-IIIT Pet | CasCAM | 0.6529 | 89.24% | 0.2366 |
| Oxford-IIIT Pet | FullGrad | 0.4940 | 59.61% | 0.2350 |
| MS-COCO | CasCAM | 0.5417 | 75.98% | 0.2050 |
| MS-COCO | FullGrad | 0.4223 | 44.44% | 0.2368 |

Oxford-IIIT Pet: CasCAM Hyperparameter Analysis

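We ran 23 experiments on the Oxford-IIIT Pet dataset with different hyperparameter configurations to identify the best settings: 12 with the Top-K threshold, 5 with EBayesThresh, and 6 without thresholding. The best Top-K configurations outperform the best EBayesThresh and no-threshold configurations on every metric, although at the weakest setting (num_iter=3, θ=0.1) Top-K does not beat the unthresholded variant on AP.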

CasCAM with Top-K Threshold

| num_iter | \(\theta\) | top_k | AP | IoU | Pointing Game | Top-15% |
|---|---|---|---|---|---|---|
| 3 | 0.1 | 10 | 0.5032 | 0.0581 | 42.83% | 0.4823 |
| 3 | 0.1 | 20 | 0.5165 | 0.1735 | 44.38% | 0.5524 |
| 3 | 0.3 | 10 | 0.5447 | 0.1661 | 65.02% | 0.5756 |
| 3 | 0.3 | 20 | 0.5602 | 0.2312 | 55.68% | 0.5806 |
| 5 | 0.1 | 10 | 0.5517 | 0.1669 | 63.40% | 0.5913 |
| 5 | 0.1 | 20 | 0.5472 | 0.2236 | 48.17% | 0.5803 |
| 5 | 0.3 | 10 | 0.5820 | 0.2090 | 78.82% | 0.5926 |
| 5 | 0.3 | 20 | 0.6078 | 0.2755 | 71.85% | 0.6193 |
| 10 | 0.1 | 10 | 0.5926 | 0.2067 | 81.33% | 0.6095 |
| 10 | 0.1 | 20 | 0.5826 | 0.2608 | 61.43% | 0.6094 |
| 10 | 0.3 | 10 | 0.6529 | 0.2366 | 89.24% | 0.6499 |
| 10 | 0.3 | 20 | 0.6393 | 0.3049 | 78.48% | 0.6545 |

The configuration with \(\text{top\_k}=10\) achieves the best Pointing Game accuracy (89.24%), while \(\text{top\_k}=20\) achieves the best IoU (0.3049). The choice between these configurations depends on the specific optimization target.
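As a worked check of this trade-off, the sketch below re-selects the best Top-K configuration for each metric directly from the rows of the table above. The tuple layout and the helper name `best_config` are illustrative choices for this example, not part of the CasCAM code.

```python
# Rows copied from the Top-K table:
# (num_iter, theta, top_k, AP, IoU, pointing_game_pct, top15)
TOP_K_RESULTS = [
    (3, 0.1, 10, 0.5032, 0.0581, 42.83, 0.4823),
    (3, 0.1, 20, 0.5165, 0.1735, 44.38, 0.5524),
    (3, 0.3, 10, 0.5447, 0.1661, 65.02, 0.5756),
    (3, 0.3, 20, 0.5602, 0.2312, 55.68, 0.5806),
    (5, 0.1, 10, 0.5517, 0.1669, 63.40, 0.5913),
    (5, 0.1, 20, 0.5472, 0.2236, 48.17, 0.5803),
    (5, 0.3, 10, 0.5820, 0.2090, 78.82, 0.5926),
    (5, 0.3, 20, 0.6078, 0.2755, 71.85, 0.6193),
    (10, 0.1, 10, 0.5926, 0.2067, 81.33, 0.6095),
    (10, 0.1, 20, 0.5826, 0.2608, 61.43, 0.6094),
    (10, 0.3, 10, 0.6529, 0.2366, 89.24, 0.6499),
    (10, 0.3, 20, 0.6393, 0.3049, 78.48, 0.6545),
]

# Column index of each metric inside a result tuple.
METRIC_COLUMN = {"AP": 3, "IoU": 4, "Pointing Game": 5, "Top-15%": 6}


def best_config(metric: str) -> tuple:
    """Return (num_iter, theta, top_k) of the row that maximizes `metric`."""
    col = METRIC_COLUMN[metric]
    return max(TOP_K_RESULTS, key=lambda row: row[col])[:3]


for name in METRIC_COLUMN:
    print(f"{name}: {best_config(name)}")
```

Pointing Game and AP both select (10, 0.3, 10), while IoU and Top-15% select (10, 0.3, 20), so the two candidate configurations differ only in top_k.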

CasCAM with EBayesThresh

Empirical Bayes thresholding provides adaptive, data-driven threshold selection.

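| num_iter | \(\theta\) | AP | IoU | Pointing Game | Top-15% |
|---|---|---|---|---|---|
| 3 | 0.1 | 0.4449 | 0.0497 | 41.27% | 0.3471 |
| 3 | 0.3 | 0.5163 | 0.0685 | 45.53% | 0.5459 |
| 5 | 0.3 | 0.5646 | 0.1089 | 59.61% | 0.6109 |
| 10 | 0.1 | 0.5505 | 0.0922 | 54.87% | 0.5964 |
| 10 | 0.3 | 0.5951 | 0.1260 | 69.01% | 0.6262 |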

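EBayesThresh underperforms the Top-K threshold, most clearly on IoU (at most 0.1260, versus 0.3049 for Top-K). Its best configuration (num_iter=10, θ=0.3) still reaches 69.01% Pointing Game accuracy, which is above every baseline in the same run, including FullGrad (58.53%).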

CasCAM without Threshold

For comparison, we also evaluate CasCAM without explicit thresholding, using raw CAM values.

| num_iter | \(\theta\) | AP | IoU | Pointing Game | Top-15% |
|---|---|---|---|---|---|
| 3 | 0.1 | 0.5312 | 0.1715 | 43.03% | 0.5419 |
| 3 | 0.3 | 0.5050 | 0.1409 | 42.42% | 0.5131 |
| 5 | 0.1 | 0.5266 | 0.1616 | 42.29% | 0.5415 |
| 5 | 0.3 | 0.5232 | 0.1976 | 42.22% | 0.5274 |
| 10 | 0.1 | 0.5417 | 0.2136 | 46.14% | 0.5756 |
| 10 | 0.3 | 0.4973 | 0.2179 | 40.66% | 0.4966 |

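Without thresholding, CasCAM's Pointing Game accuracy stays between 40.66% and 46.14%, roughly level with the CAM-family baselines (about 42%) and below FullGrad in every one of these runs (54.80% to 63.19%). Thresholding is therefore essential to CasCAM's effectiveness.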

Best Configurations Summary

| Optimization Target | Best Configuration | Value | Improvement vs. Best Baseline |
|---|---|---|---|
| Pointing Game | num_iter=10, \(\theta\)=0.3, top_k=10 | 89.24% | +29.63 percentage points vs FullGrad (59.61%) |
| IoU | num_iter=10, \(\theta\)=0.3, top_k=20 | 0.3049 | +29.7% (relative) vs FullGrad (0.2350) |
| AP | num_iter=10, \(\theta\)=0.3, top_k=10 | 0.6529 | +24.5% (relative) vs ScoreCAM (0.5243) |
| Top-15% Energy | num_iter=10, \(\theta\)=0.3, top_k=20 | 0.6545 | +26.6% (relative) vs ScoreCAM (0.5169) |
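
The improvement column mixes two kinds of gain: a percentage-point difference for Pointing Game and relative gains for the other three metrics. The short sketch below recomputes each entry from the values in the table. The two helper names are chosen for this example only.

```python
def percentage_point_gain(ours_pct: float, baseline_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return round(ours_pct - baseline_pct, 2)


def relative_gain_pct(ours: float, baseline: float) -> float:
    """Relative improvement over the baseline, expressed in percent."""
    return round((ours - baseline) / baseline * 100, 1)


# Values taken from the Best Configurations Summary table.
pointing_game = percentage_point_gain(89.24, 59.61)  # vs FullGrad
iou_gain = relative_gain_pct(0.3049, 0.2350)         # vs FullGrad
ap_gain = relative_gain_pct(0.6529, 0.5243)          # vs ScoreCAM
top15_gain = relative_gain_pct(0.6545, 0.5169)       # vs ScoreCAM

print(pointing_game, iou_gain, ap_gain, top15_gain)
```

The two kinds of gain are not interchangeable. Expressed as a relative gain, the Pointing Game improvement would be (89.24 - 59.61) / 59.61, or about 49.7%, so the percentage-point figure is the more conservative way to report it.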

All Oxford-IIIT Pet Experiment Results (Reference)

The following tables present complete results for all 23 experimental configurations on the Oxford-IIIT Pet dataset. Each table shows CasCAM performance compared against all baseline methods under a specific hyperparameter setting. These results serve as a comprehensive reference for reproducibility and detailed analysis.

CasCAM with Top-K Threshold (12 configurations)

Configuration: num_iter=3, \(\theta\)=0.1, top_k=10, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.5032 0.0581 42.83% 0.4823
CAM 0.5168 0.1370 41.88% 0.4961
GradCAM 0.5170 0.1277 41.88% 0.4961
HiResCAM 0.5170 0.1277 41.88% 0.4961
GradCAM++ 0.5186 0.1267 42.29% 0.5010
ScoreCAM 0.5223 0.1487 41.95% 0.5125
AblationCAM 0.5189 0.1293 41.54% 0.5007
XGradCAM 0.5170 0.1277 41.88% 0.4961
FullGrad 0.4932 0.2350 58.39% 0.5147
EigenGradCAM 0.4975 0.0757 41.88% 0.4879
LayerCAM 0.5203 0.1310 41.61% 0.5056
Configuration: num_iter=3, \(\theta\)=0.1, top_k=20, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.5165 0.1735 44.38% 0.5524
CAM 0.5196 0.1486 42.15% 0.5020
GradCAM 0.5197 0.1382 42.15% 0.5020
HiResCAM 0.5197 0.1382 42.15% 0.5020
GradCAM++ 0.5221 0.1369 42.49% 0.5082
ScoreCAM 0.5272 0.1667 42.63% 0.5197
AblationCAM 0.5227 0.1404 41.75% 0.5075
XGradCAM 0.5197 0.1382 42.15% 0.5020
FullGrad 0.5104 0.2476 62.18% 0.5324
EigenGradCAM 0.4992 0.0849 42.15% 0.4926
LayerCAM 0.5240 0.1425 42.69% 0.5124
Configuration: num_iter=3, \(\theta\)=0.3, top_k=10, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.5447 0.1661 65.02% 0.5756
CAM 0.5151 0.1366 42.02% 0.4937
GradCAM 0.5153 0.1299 42.02% 0.4937
HiResCAM 0.5153 0.1299 42.02% 0.4937
GradCAM++ 0.5177 0.1287 41.61% 0.5008
ScoreCAM 0.5220 0.1477 42.08% 0.5112
AblationCAM 0.5181 0.1322 42.02% 0.4999
XGradCAM 0.5153 0.1299 42.02% 0.4937
FullGrad 0.5265 0.2548 58.39% 0.5576
EigenGradCAM 0.4951 0.0856 41.47% 0.4884
LayerCAM 0.5196 0.1352 41.75% 0.5057
Configuration: num_iter=3, \(\theta\)=0.3, top_k=20, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.5602 0.2312 55.68% 0.5806
CAM 0.5252 0.1563 42.29% 0.5149
GradCAM 0.5254 0.1428 42.29% 0.5149
HiResCAM 0.5254 0.1428 42.29% 0.5149
GradCAM++ 0.5276 0.1436 42.02% 0.5206
ScoreCAM 0.5311 0.1672 42.35% 0.5297
AblationCAM 0.5280 0.1441 41.88% 0.5203
XGradCAM 0.5254 0.1428 42.29% 0.5149
FullGrad 0.5150 0.2464 60.69% 0.5402
EigenGradCAM 0.5063 0.0830 42.35% 0.5052
LayerCAM 0.5295 0.1484 42.29% 0.5248
Configuration: num_iter=5, \(\theta\)=0.1, top_k=10, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.5517 0.1669 63.40% 0.5913
CAM 0.5200 0.1448 42.22% 0.5038
GradCAM 0.5202 0.1331 42.22% 0.5038
HiResCAM 0.5202 0.1331 42.22% 0.5038
GradCAM++ 0.5222 0.1317 42.35% 0.5090
ScoreCAM 0.5263 0.1631 42.22% 0.5202
AblationCAM 0.5227 0.1344 41.75% 0.5090
XGradCAM 0.5202 0.1331 42.22% 0.5038
FullGrad 0.5022 0.2483 64.55% 0.5208
EigenGradCAM 0.5019 0.0785 42.08% 0.4943
LayerCAM 0.5242 0.1364 42.29% 0.5137
Configuration: num_iter=5, \(\theta\)=0.1, top_k=20, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.5472 0.2236 48.17% 0.5803
CAM 0.5222 0.1330 42.08% 0.5117
GradCAM 0.5222 0.1206 42.08% 0.5117
HiResCAM 0.5222 0.1206 42.08% 0.5117
GradCAM++ 0.5242 0.1177 42.29% 0.5176
ScoreCAM 0.5298 0.1543 42.15% 0.5311
AblationCAM 0.5250 0.1228 41.81% 0.5178
XGradCAM 0.5222 0.1206 42.08% 0.5117
FullGrad 0.4892 0.2342 63.60% 0.5027
EigenGradCAM 0.4861 0.0696 39.72% 0.4792
LayerCAM 0.5260 0.1209 42.29% 0.5225
Configuration: num_iter=5, \(\theta\)=0.3, top_k=10, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.5820 0.2090 78.82% 0.5926
CAM 0.5207 0.1425 42.22% 0.5049
GradCAM 0.5208 0.1317 42.22% 0.5049
HiResCAM 0.5208 0.1317 42.22% 0.5049
GradCAM++ 0.5226 0.1308 42.35% 0.5102
ScoreCAM 0.5278 0.1663 42.29% 0.5243
AblationCAM 0.5230 0.1338 41.75% 0.5099
XGradCAM 0.5208 0.1317 42.22% 0.5049
FullGrad 0.5004 0.2425 63.33% 0.5198
EigenGradCAM 0.5030 0.0787 42.22% 0.4965
LayerCAM 0.5246 0.1353 42.29% 0.5152
Configuration: num_iter=5, \(\theta\)=0.3, top_k=20, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.6078 0.2755 71.85% 0.6193
CAM 0.5228 0.1519 42.42% 0.5060
GradCAM 0.5230 0.1397 42.42% 0.5060
HiResCAM 0.5230 0.1397 42.42% 0.5060
GradCAM++ 0.5256 0.1385 42.29% 0.5125
ScoreCAM 0.5296 0.1689 42.15% 0.5239
AblationCAM 0.5258 0.1411 41.81% 0.5116
XGradCAM 0.5230 0.1397 42.42% 0.5060
FullGrad 0.4992 0.2441 60.28% 0.5172
EigenGradCAM 0.5060 0.0799 41.88% 0.4985
LayerCAM 0.5274 0.1434 42.29% 0.5170
Configuration: num_iter=10, \(\theta\)=0.1, top_k=10, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.5926 0.2067 81.33% 0.6095
CAM 0.5189 0.1497 42.02% 0.4983
GradCAM 0.5190 0.1400 42.02% 0.4983
HiResCAM 0.5190 0.1400 42.02% 0.4983
GradCAM++ 0.5210 0.1353 42.49% 0.5041
ScoreCAM 0.5253 0.1553 42.02% 0.5154
AblationCAM 0.5214 0.1403 41.81% 0.5032
XGradCAM 0.5190 0.1400 42.02% 0.4983
FullGrad 0.4973 0.2371 58.59% 0.5193
EigenGradCAM 0.4966 0.0830 41.47% 0.4854
LayerCAM 0.5225 0.1397 42.22% 0.5082
Configuration: num_iter=10, \(\theta\)=0.1, top_k=20, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.5826 0.2608 61.43% 0.6094
CAM 0.5229 0.1607 42.02% 0.5060
GradCAM 0.5230 0.1479 42.02% 0.5060
HiResCAM 0.5230 0.1479 42.02% 0.5060
GradCAM++ 0.5252 0.1462 42.08% 0.5119
ScoreCAM 0.5289 0.1784 42.42% 0.5230
AblationCAM 0.5254 0.1491 41.81% 0.5110
XGradCAM 0.5230 0.1479 42.02% 0.5060
FullGrad 0.4919 0.2431 61.57% 0.5093
EigenGradCAM 0.5078 0.0818 42.35% 0.4998
LayerCAM 0.5270 0.1524 42.15% 0.5167
Configuration: num_iter=10, \(\theta\)=0.3, top_k=10, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.6529 0.2366 89.24% 0.6499
CAM 0.5190 0.1432 42.29% 0.5001
GradCAM 0.5191 0.1325 42.29% 0.5001
HiResCAM 0.5191 0.1325 42.29% 0.5001
GradCAM++ 0.5207 0.1287 42.02% 0.5054
ScoreCAM 0.5243 0.1524 42.22% 0.5169
AblationCAM 0.5213 0.1314 41.81% 0.5050
XGradCAM 0.5191 0.1325 42.29% 0.5001
FullGrad 0.4940 0.2350 59.61% 0.5142
EigenGradCAM 0.5007 0.0762 41.81% 0.4913
LayerCAM 0.5225 0.1316 42.22% 0.5100
Configuration: num_iter=10, \(\theta\)=0.3, top_k=20, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.6393 0.3049 78.48% 0.6545
CAM 0.5248 0.1442 41.88% 0.5118
GradCAM 0.5248 0.1337 41.88% 0.5118
HiResCAM 0.5248 0.1337 41.88% 0.5118
GradCAM++ 0.5273 0.1310 42.02% 0.5181
ScoreCAM 0.5323 0.1709 41.27% 0.5320
AblationCAM 0.5277 0.1367 41.88% 0.5179
XGradCAM 0.5248 0.1337 41.88% 0.5118
FullGrad 0.4963 0.2409 60.62% 0.5128
EigenGradCAM 0.5062 0.0787 41.75% 0.5031
LayerCAM 0.5293 0.1366 41.68% 0.5235

CasCAM with EBayesThresh (5 configurations)

Configuration: num_iter=3, \(\theta\)=0.1, threshold=EBayesThresh, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.4449 0.0497 41.27% 0.3471
CAM 0.5247 0.1293 42.42% 0.5129
GradCAM 0.5248 0.1210 42.42% 0.5129
HiResCAM 0.5248 0.1210 42.42% 0.5129
GradCAM++ 0.5271 0.1173 41.75% 0.5198
ScoreCAM 0.5326 0.1491 42.22% 0.5330
AblationCAM 0.5281 0.1238 42.15% 0.5198
XGradCAM 0.5248 0.1210 42.42% 0.5129
FullGrad 0.5187 0.2679 64.95% 0.5426
EigenGradCAM 0.5011 0.0765 41.07% 0.4995
LayerCAM 0.5290 0.1214 42.22% 0.5245
Configuration: num_iter=3, \(\theta\)=0.3, threshold=EBayesThresh, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.5163 0.0685 45.53% 0.5459
CAM 0.5217 0.1372 41.88% 0.5078
GradCAM 0.5218 0.1270 41.88% 0.5078
HiResCAM 0.5218 0.1270 41.88% 0.5078
GradCAM++ 0.5238 0.1246 42.15% 0.5136
ScoreCAM 0.5286 0.1586 41.95% 0.5267
AblationCAM 0.5242 0.1285 41.27% 0.5132
XGradCAM 0.5218 0.1270 41.88% 0.5078
FullGrad 0.4974 0.2291 58.25% 0.5201
EigenGradCAM 0.5022 0.0747 41.75% 0.4975
LayerCAM 0.5257 0.1282 41.75% 0.5182
Configuration: num_iter=5, \(\theta\)=0.3, threshold=EBayesThresh, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.5646 0.1089 59.61% 0.6109
CAM 0.5144 0.1433 41.88% 0.4896
GradCAM 0.5145 0.1337 41.88% 0.4896
HiResCAM 0.5145 0.1337 41.88% 0.4896
GradCAM++ 0.5158 0.1344 42.08% 0.4945
ScoreCAM 0.5192 0.1539 42.49% 0.5065
AblationCAM 0.5161 0.1330 41.54% 0.4942
XGradCAM 0.5145 0.1337 41.88% 0.4896
FullGrad 0.4962 0.2513 60.62% 0.5156
EigenGradCAM 0.4954 0.0820 42.42% 0.4829
LayerCAM 0.5174 0.1354 42.08% 0.4994
Configuration: num_iter=10, \(\theta\)=0.1, threshold=EBayesThresh, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.5505 0.0922 54.87% 0.5964
CAM 0.5223 0.1339 41.81% 0.5040
GradCAM 0.5224 0.1255 41.81% 0.5040
HiResCAM 0.5224 0.1255 41.81% 0.5040
GradCAM++ 0.5251 0.1235 41.88% 0.5110
ScoreCAM 0.5302 0.1485 42.08% 0.5230
AblationCAM 0.5253 0.1270 41.68% 0.5102
XGradCAM 0.5224 0.1255 41.81% 0.5040
FullGrad 0.5092 0.2306 56.97% 0.5314
EigenGradCAM 0.4962 0.0770 41.14% 0.4879
LayerCAM 0.5270 0.1275 41.68% 0.5158
Configuration: num_iter=10, \(\theta\)=0.3, threshold=EBayesThresh, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.5951 0.1260 69.01% 0.6262
CAM 0.5226 0.1418 41.81% 0.5096
GradCAM 0.5227 0.1299 41.81% 0.5096
HiResCAM 0.5227 0.1299 41.81% 0.5096
GradCAM++ 0.5243 0.1286 42.02% 0.5157
ScoreCAM 0.5298 0.1541 41.61% 0.5289
AblationCAM 0.5249 0.1288 41.54% 0.5151
XGradCAM 0.5227 0.1299 41.81% 0.5096
FullGrad 0.4834 0.2301 58.53% 0.5009
EigenGradCAM 0.4932 0.0760 41.00% 0.4872
LayerCAM 0.5262 0.1297 41.88% 0.5207

CasCAM without Threshold (6 configurations)

Configuration: num_iter=3, \(\theta\)=0.1, threshold=None, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.5312 0.1715 43.03% 0.5419
CAM 0.5222 0.1435 42.76% 0.5083
GradCAM 0.5223 0.1314 42.76% 0.5083
HiResCAM 0.5223 0.1314 42.76% 0.5083
GradCAM++ 0.5247 0.1309 42.42% 0.5147
ScoreCAM 0.5282 0.1563 42.49% 0.5247
AblationCAM 0.5246 0.1331 42.29% 0.5135
XGradCAM 0.5223 0.1314 42.76% 0.5083
FullGrad 0.4878 0.2348 62.52% 0.5056
EigenGradCAM 0.5004 0.0773 41.95% 0.4950
LayerCAM 0.5263 0.1343 42.35% 0.5189
Configuration: num_iter=3, \(\theta\)=0.3, threshold=None, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.5050 0.1409 42.42% 0.5131
CAM 0.5031 0.1002 42.22% 0.4843
GradCAM 0.5031 0.0948 42.22% 0.4843
HiResCAM 0.5031 0.0948 42.22% 0.4843
GradCAM++ 0.5036 0.0946 42.29% 0.4904
ScoreCAM 0.5071 0.1003 41.88% 0.5021
AblationCAM 0.5041 0.0962 41.88% 0.4896
XGradCAM 0.5031 0.0948 42.22% 0.4843
FullGrad 0.4947 0.2115 55.35% 0.5219
EigenGradCAM 0.4705 0.0712 40.87% 0.4653
LayerCAM 0.5048 0.0945 42.22% 0.4954
Configuration: num_iter=5, \(\theta\)=0.1, threshold=None, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.5266 0.1616 42.29% 0.5415
CAM 0.5204 0.1240 42.02% 0.5038
GradCAM 0.5205 0.1151 42.02% 0.5038
HiResCAM 0.5205 0.1151 42.02% 0.5038
GradCAM++ 0.5223 0.1152 42.02% 0.5097
ScoreCAM 0.5261 0.1389 42.35% 0.5212
AblationCAM 0.5227 0.1173 41.54% 0.5088
XGradCAM 0.5205 0.1151 42.02% 0.5038
FullGrad 0.5156 0.2542 63.19% 0.5407
EigenGradCAM 0.5003 0.0729 41.68% 0.4941
LayerCAM 0.5240 0.1174 42.02% 0.5143
Configuration: num_iter=5, \(\theta\)=0.3, threshold=None, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.5232 0.1976 42.22% 0.5274
CAM 0.5226 0.1241 41.88% 0.5057
GradCAM 0.5227 0.1170 41.88% 0.5057
HiResCAM 0.5227 0.1170 41.88% 0.5057
GradCAM++ 0.5251 0.1129 42.35% 0.5126
ScoreCAM 0.5302 0.1403 42.08% 0.5253
AblationCAM 0.5252 0.1194 41.75% 0.5114
XGradCAM 0.5227 0.1170 41.88% 0.5057
FullGrad 0.4953 0.2342 62.25% 0.5161
EigenGradCAM 0.5016 0.0729 41.54% 0.4957
LayerCAM 0.5269 0.1174 41.95% 0.5170
Configuration: num_iter=10, \(\theta\)=0.1, threshold=None, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.5417 0.2136 46.14% 0.5756
CAM 0.5324 0.1586 43.10% 0.5335
GradCAM 0.5327 0.1481 43.10% 0.5335
HiResCAM 0.5327 0.1481 43.10% 0.5335
GradCAM++ 0.5341 0.1441 41.88% 0.5390
ScoreCAM 0.5402 0.1506 42.76% 0.5493
AblationCAM 0.5345 0.1426 42.29% 0.5380
XGradCAM 0.5327 0.1481 43.10% 0.5335
FullGrad 0.5275 0.2391 54.80% 0.5573
EigenGradCAM 0.4892 0.0904 40.32% 0.4992
LayerCAM 0.5354 0.1404 41.81% 0.5416
Configuration: num_iter=10, \(\theta\)=0.3, threshold=None, \(\lambda\)=0.1
Method AP IoU Pointing Game Top-15%
CasCAM (Ours) 0.4973 0.2179 40.66% 0.4966
CAM 0.5257 0.1449 41.95% 0.5243
GradCAM 0.5257 0.1321 41.95% 0.5243
HiResCAM 0.5257 0.1321 41.95% 0.5243
GradCAM++ 0.5278 0.1295 42.35% 0.5311
ScoreCAM 0.5354 0.1641 42.42% 0.5448
AblationCAM 0.5281 0.1338 42.08% 0.5303
XGradCAM 0.5257 0.1321 41.95% 0.5243
FullGrad 0.5002 0.2402 61.43% 0.5188
EigenGradCAM 0.4774 0.0757 38.50% 0.4729
LayerCAM 0.5298 0.1327 42.22% 0.5357

Comparison Figures

Visual comparison of CasCAM against baseline methods. Each figure shows the original image with its artifact text label ("Cat" or "Dog"), followed by CAM visualizations from 10 different methods.

CasCAM Configuration: num_iter=10, θ=0.3, top_k=10, λ=0.1 (Oxford-IIIT Pet) / num_iter=15, θ=0.1, top_k=10, λ=0.0 (MS-COCO)