Performance Metrics
We evaluate CasCAM on the Oxford-IIIT Pet dataset, which contains 7,349 images across 37 pet categories with pixel-level segmentation annotations. The dataset is well suited to evaluating localization accuracy because its images are clean, contain a single object, and come with precise ground-truth masks.
Oxford-IIIT Pet Dataset Results
Best Configuration: num_iter=10, θ=0.3, top_k=10, λ=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.6529 | 0.2366 | 89.24% | 0.6499 |
| CAM | 0.5190 | 0.1432 | 42.29% | 0.5001 |
| GradCAM | 0.5191 | 0.1325 | 42.29% | 0.5001 |
| HiResCAM | 0.5191 | 0.1325 | 42.29% | 0.5001 |
| GradCAM++ | 0.5207 | 0.1287 | 42.02% | 0.5054 |
| ScoreCAM | 0.5243 | 0.1524 | 42.22% | 0.5169 |
| AblationCAM | 0.5213 | 0.1314 | 41.81% | 0.5050 |
| XGradCAM | 0.5191 | 0.1325 | 42.29% | 0.5001 |
| FullGrad | 0.4940 | 0.2350 | 59.61% | 0.5142 |
| EigenGradCAM | 0.5007 | 0.0762 | 41.81% | 0.4913 |
| LayerCAM | 0.5225 | 0.1316 | 42.22% | 0.5100 |
Key Finding: CasCAM achieves 89.24% Pointing Game accuracy, 29.63 percentage points above the best baseline, FullGrad (59.61%). Its IoU of 0.2366 is comparable to FullGrad's (0.2350) and well above every other method (the next best is ScoreCAM at 0.1524).
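For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how Pointing Game and IoU are commonly computed from a saliency map and a binary ground-truth mask. It is not the evaluation code behind the table: the function names and the 0.5 binarization threshold are assumptions made for illustration.

```python
import numpy as np

def pointing_game_hit(saliency: np.ndarray, mask: np.ndarray) -> bool:
    """True if the maximum-saliency pixel lies inside the ground-truth mask."""
    row, col = np.unravel_index(np.argmax(saliency), saliency.shape)
    return bool(mask[row, col])

def binarized_iou(saliency: np.ndarray, mask: np.ndarray, threshold: float = 0.5) -> float:
    """IoU between the thresholded saliency map and the mask.

    The threshold is an assumed value; the source does not state one.
    """
    pred = saliency >= threshold
    gt = mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)

# Toy 4x4 example: the object occupies the central 2x2 block.
saliency = np.array([
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.6, 0.1],
    [0.1, 0.7, 0.4, 0.0],
    [0.0, 0.1, 0.0, 0.0],
])
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True

hit = pointing_game_hit(saliency, mask)   # peak at (1, 1) is inside the mask
iou = binarized_iou(saliency, mask)       # 3 predicted pixels, all inside 4 mask pixels
print(hit, iou)
```

Dataset-level Pointing Game accuracy is then the fraction of images for which `pointing_game_hit` returns True, and dataset-level IoU is typically the mean over images.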
Generalization to MS-COCO Dataset
To validate generalization, we also evaluate CasCAM on 1,728 images from the more challenging MS-COCO dataset, which contain multiple objects and complex backgrounds.
Configuration: num_iter=15, θ=0.1, top_k=10, λ=0.0
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.5417 | 0.2050 | 75.98% | 0.3817 |
| CAM | 0.2788 | 0.1143 | 20.43% | 0.2225 |
| GradCAM | 0.2792 | 0.1078 | 20.43% | 0.2225 |
| HiResCAM | 0.2792 | 0.1078 | 20.43% | 0.2225 |
| GradCAM++ | 0.2844 | 0.1038 | 21.41% | 0.2250 |
| ScoreCAM | 0.2842 | 0.1164 | 20.43% | 0.2261 |
| AblationCAM | 0.2827 | 0.1049 | 20.83% | 0.2247 |
| XGradCAM | 0.2792 | 0.1078 | 20.43% | 0.2225 |
| FullGrad | 0.4223 | 0.2368 | 44.44% | 0.3757 |
| EigenGradCAM | 0.2658 | 0.0756 | 20.78% | 0.2230 |
| LayerCAM | 0.2852 | 0.1085 | 21.01% | 0.2253 |
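Key Finding: CasCAM achieves 75.98% Pointing Game accuracy on MS-COCO, 31.54 percentage points above FullGrad (44.44%), and the highest AP (0.5417 vs. 0.4223). FullGrad retains the higher IoU (0.2368 vs. 0.2050). These results suggest that CasCAM's iterative refinement handles complex multi-object scenes effectively.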
Summary Comparison
| Dataset | Method | AP | Pointing Game | IoU |
| --- | --- | --- | --- | --- |
| Oxford-IIIT Pet | CasCAM | 0.6529 | 89.24% | 0.2366 |
| Oxford-IIIT Pet | FullGrad | 0.4940 | 59.61% | 0.2350 |
| MS-COCO | CasCAM | 0.5417 | 75.98% | 0.2050 |
| MS-COCO | FullGrad | 0.4223 | 44.44% | 0.2368 |
Oxford-IIIT Pet: CasCAM Hyperparameter Analysis
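We conducted 23 experiments on the Oxford-IIIT Pet dataset with different hyperparameter configurations to identify optimal settings. At its best settings, the Top-K threshold method outperforms both EBayesThresh and no thresholding on all four metrics, although at num_iter=3 and \(\theta\)=0.1 the unthresholded variant scores higher than Top-K with top_k=10.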
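The count of 23 follows from the configurations reported in the tables of this section. The sketch below enumerates them; the tuple layout and labels are illustrative only, and the EBayesThresh list simply mirrors the five (num_iter, θ) pairs that appear in the results.

```python
from itertools import product

num_iters = [3, 5, 10]
thetas = [0.1, 0.3]
top_ks = [10, 20]

# Top-K: full 3 x 2 x 2 grid.
top_k_runs = [("top_k", n, t, k) for n, t, k in product(num_iters, thetas, top_ks)]

# No threshold: full 3 x 2 grid.
no_threshold_runs = [("none", n, t, None) for n, t in product(num_iters, thetas)]

# EBayesThresh: results are reported for five of the six (num_iter, theta) pairs;
# (5, 0.1) does not appear in the tables.
ebayes_runs = [
    ("ebayes", n, t, None)
    for n, t in product(num_iters, thetas)
    if (n, t) != (5, 0.1)
]

all_runs = top_k_runs + ebayes_runs + no_threshold_runs
counts = (len(top_k_runs), len(ebayes_runs), len(no_threshold_runs), len(all_runs))
print(counts)
```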
CasCAM with Top-K Threshold
| num_iter | \(\theta\) | top_k | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- | --- | --- |
| 3 | 0.1 | 10 | 0.5032 | 0.0581 | 42.83% | 0.4823 |
| 3 | 0.1 | 20 | 0.5165 | 0.1735 | 44.38% | 0.5524 |
| 3 | 0.3 | 10 | 0.5447 | 0.1661 | 65.02% | 0.5756 |
| 3 | 0.3 | 20 | 0.5602 | 0.2312 | 55.68% | 0.5806 |
| 5 | 0.1 | 10 | 0.5517 | 0.1669 | 63.40% | 0.5913 |
| 5 | 0.1 | 20 | 0.5472 | 0.2236 | 48.17% | 0.5803 |
| 5 | 0.3 | 10 | 0.5820 | 0.2090 | 78.82% | 0.5926 |
| 5 | 0.3 | 20 | 0.6078 | 0.2755 | 71.85% | 0.6193 |
| 10 | 0.1 | 10 | 0.5926 | 0.2067 | 81.33% | 0.6095 |
| 10 | 0.1 | 20 | 0.5826 | 0.2608 | 61.43% | 0.6094 |
| 10 | 0.3 | 10 | 0.6529 | 0.2366 | 89.24% | 0.6499 |
| 10 | 0.3 | 20 | 0.6393 | 0.3049 | 78.48% | 0.6545 |
With num_iter=10 and \(\theta\)=0.3, \(\text{top\_k}=10\) achieves the best Pointing Game accuracy (89.24%) and AP (0.6529), while \(\text{top\_k}=20\) achieves the best IoU (0.3049) and Top-15% (0.6545). The choice between these two configurations depends on the target metric.
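The per-metric winners can be read off the Top-K results programmatically. The sketch below copies the twelve rows reported for the Top-K threshold and picks the best configuration for each metric; the `Run` record and `best_by` helper are hypothetical names used only for this illustration.

```python
from collections import namedtuple

Run = namedtuple("Run", "num_iter theta top_k ap iou pointing_game top15")

# Rows copied from the Top-K threshold results (Pointing Game in percent).
runs = [
    Run(3, 0.1, 10, 0.5032, 0.0581, 42.83, 0.4823),
    Run(3, 0.1, 20, 0.5165, 0.1735, 44.38, 0.5524),
    Run(3, 0.3, 10, 0.5447, 0.1661, 65.02, 0.5756),
    Run(3, 0.3, 20, 0.5602, 0.2312, 55.68, 0.5806),
    Run(5, 0.1, 10, 0.5517, 0.1669, 63.40, 0.5913),
    Run(5, 0.1, 20, 0.5472, 0.2236, 48.17, 0.5803),
    Run(5, 0.3, 10, 0.5820, 0.2090, 78.82, 0.5926),
    Run(5, 0.3, 20, 0.6078, 0.2755, 71.85, 0.6193),
    Run(10, 0.1, 10, 0.5926, 0.2067, 81.33, 0.6095),
    Run(10, 0.1, 20, 0.5826, 0.2608, 61.43, 0.6094),
    Run(10, 0.3, 10, 0.6529, 0.2366, 89.24, 0.6499),
    Run(10, 0.3, 20, 0.6393, 0.3049, 78.48, 0.6545),
]

def best_by(metric: str) -> Run:
    """Return the run with the highest value of the given metric field."""
    return max(runs, key=lambda r: getattr(r, metric))

best = {m: best_by(m) for m in ("ap", "iou", "pointing_game", "top15")}
for metric, run in best.items():
    print(metric, (run.num_iter, run.theta, run.top_k), getattr(run, metric))
```

All four winners share num_iter=10 and θ=0.3; only top_k differs, which is why the trade-off reduces to choosing between top_k=10 and top_k=20.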
CasCAM with EBayesThresh
Empirical Bayes thresholding provides adaptive, data-driven threshold selection.
| num_iter | \(\theta\) | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- | --- |
| 3 | 0.1 | 0.4449 | 0.0497 | 41.27% | 0.3471 |
| 3 | 0.3 | 0.5163 | 0.0685 | 45.53% | 0.5459 |
| 5 | 0.3 | 0.5646 | 0.1089 | 59.61% | 0.6109 |
| 10 | 0.1 | 0.5505 | 0.0922 | 54.87% | 0.5964 |
| 10 | 0.3 | 0.5951 | 0.1260 | 69.01% | 0.6262 |
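EBayesThresh underperforms the Top-K threshold, but its best configuration (num_iter=10, \(\theta\)=0.3) still achieves 69.01% Pointing Game accuracy, above every baseline in that run (the best being FullGrad at 58.53%). Its IoU, however, stays at or below 0.1260, lower than most baselines.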
CasCAM without Threshold
For comparison, we also evaluate CasCAM without explicit thresholding, using raw CAM values.
| num_iter | \(\theta\) | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- | --- |
| 3 | 0.1 | 0.5312 | 0.1715 | 43.03% | 0.5419 |
| 3 | 0.3 | 0.5050 | 0.1409 | 42.42% | 0.5131 |
| 5 | 0.1 | 0.5266 | 0.1616 | 42.29% | 0.5415 |
| 5 | 0.3 | 0.5232 | 0.1976 | 42.22% | 0.5274 |
| 10 | 0.1 | 0.5417 | 0.2136 | 46.14% | 0.5756 |
| 10 | 0.3 | 0.4973 | 0.2179 | 40.66% | 0.4966 |
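Without thresholding, CasCAM reaches at most 46.14% Pointing Game accuracy, close to the CAM-family baselines and well below FullGrad (54.80% to 63.19% across these runs). This indicates that the threshold step is essential to CasCAM's effectiveness.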
Best Configurations Summary
| Optimization Target | Best Configuration | Value | Improvement vs. Best Baseline |
| --- | --- | --- | --- |
| Pointing Game | iter=10, \(\theta\)=0.3, top_k=10 | 89.24% | +29.63%p vs FullGrad (59.61%) |
| IoU | iter=10, \(\theta\)=0.3, top_k=20 | 0.3049 | +29.7% vs FullGrad (0.2350) |
| AP | iter=10, \(\theta\)=0.3, top_k=10 | 0.6529 | +24.5% vs ScoreCAM (0.5243) |
| Top-15% Energy | iter=10, \(\theta\)=0.3, top_k=20 | 0.6545 | +26.6% vs ScoreCAM (0.5169) |
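Note that the improvement column mixes two units: the Pointing Game gain is an absolute difference in percentage points (%p), while the IoU, AP, and Top-15% gains are relative percentages. The short sketch below reproduces the four figures from the values listed in the table; the helper names are illustrative.

```python
def percentage_point_gain(ours_pct: float, baseline_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return ours_pct - baseline_pct

def relative_gain_pct(ours: float, baseline: float) -> float:
    """Relative improvement over the baseline, in percent."""
    return (ours - baseline) / baseline * 100.0

# Values taken from the Best Configurations Summary table.
pg_gain = round(percentage_point_gain(89.24, 59.61), 2)    # vs FullGrad
iou_gain = round(relative_gain_pct(0.3049, 0.2350), 1)     # vs FullGrad
ap_gain = round(relative_gain_pct(0.6529, 0.5243), 1)      # vs ScoreCAM
top15_gain = round(relative_gain_pct(0.6545, 0.5169), 1)   # vs ScoreCAM

print(pg_gain, iou_gain, ap_gain, top15_gain)
```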
All Oxford-IIIT Pet Experiment Results (Reference)
The following tables present complete results for all 23 experimental configurations on the Oxford-IIIT Pet dataset. Each table shows CasCAM performance compared against all baseline methods under a specific hyperparameter setting. These results serve as a comprehensive reference for reproducibility and detailed analysis.
CasCAM with Top-K Threshold (12 configurations)
Configuration: num_iter=3, \(\theta\)=0.1, top_k=10, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.5032 | 0.0581 | 42.83% | 0.4823 |
| CAM | 0.5168 | 0.1370 | 41.88% | 0.4961 |
| GradCAM | 0.5170 | 0.1277 | 41.88% | 0.4961 |
| HiResCAM | 0.5170 | 0.1277 | 41.88% | 0.4961 |
| GradCAM++ | 0.5186 | 0.1267 | 42.29% | 0.5010 |
| ScoreCAM | 0.5223 | 0.1487 | 41.95% | 0.5125 |
| AblationCAM | 0.5189 | 0.1293 | 41.54% | 0.5007 |
| XGradCAM | 0.5170 | 0.1277 | 41.88% | 0.4961 |
| FullGrad | 0.4932 | 0.2350 | 58.39% | 0.5147 |
| EigenGradCAM | 0.4975 | 0.0757 | 41.88% | 0.4879 |
| LayerCAM | 0.5203 | 0.1310 | 41.61% | 0.5056 |
Configuration: num_iter=3, \(\theta\)=0.1, top_k=20, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.5165 | 0.1735 | 44.38% | 0.5524 |
| CAM | 0.5196 | 0.1486 | 42.15% | 0.5020 |
| GradCAM | 0.5197 | 0.1382 | 42.15% | 0.5020 |
| HiResCAM | 0.5197 | 0.1382 | 42.15% | 0.5020 |
| GradCAM++ | 0.5221 | 0.1369 | 42.49% | 0.5082 |
| ScoreCAM | 0.5272 | 0.1667 | 42.63% | 0.5197 |
| AblationCAM | 0.5227 | 0.1404 | 41.75% | 0.5075 |
| XGradCAM | 0.5197 | 0.1382 | 42.15% | 0.5020 |
| FullGrad | 0.5104 | 0.2476 | 62.18% | 0.5324 |
| EigenGradCAM | 0.4992 | 0.0849 | 42.15% | 0.4926 |
| LayerCAM | 0.5240 | 0.1425 | 42.69% | 0.5124 |
Configuration: num_iter=3, \(\theta\)=0.3, top_k=10, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.5447 | 0.1661 | 65.02% | 0.5756 |
| CAM | 0.5151 | 0.1366 | 42.02% | 0.4937 |
| GradCAM | 0.5153 | 0.1299 | 42.02% | 0.4937 |
| HiResCAM | 0.5153 | 0.1299 | 42.02% | 0.4937 |
| GradCAM++ | 0.5177 | 0.1287 | 41.61% | 0.5008 |
| ScoreCAM | 0.5220 | 0.1477 | 42.08% | 0.5112 |
| AblationCAM | 0.5181 | 0.1322 | 42.02% | 0.4999 |
| XGradCAM | 0.5153 | 0.1299 | 42.02% | 0.4937 |
| FullGrad | 0.5265 | 0.2548 | 58.39% | 0.5576 |
| EigenGradCAM | 0.4951 | 0.0856 | 41.47% | 0.4884 |
| LayerCAM | 0.5196 | 0.1352 | 41.75% | 0.5057 |
Configuration: num_iter=3, \(\theta\)=0.3, top_k=20, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.5602 | 0.2312 | 55.68% | 0.5806 |
| CAM | 0.5252 | 0.1563 | 42.29% | 0.5149 |
| GradCAM | 0.5254 | 0.1428 | 42.29% | 0.5149 |
| HiResCAM | 0.5254 | 0.1428 | 42.29% | 0.5149 |
| GradCAM++ | 0.5276 | 0.1436 | 42.02% | 0.5206 |
| ScoreCAM | 0.5311 | 0.1672 | 42.35% | 0.5297 |
| AblationCAM | 0.5280 | 0.1441 | 41.88% | 0.5203 |
| XGradCAM | 0.5254 | 0.1428 | 42.29% | 0.5149 |
| FullGrad | 0.5150 | 0.2464 | 60.69% | 0.5402 |
| EigenGradCAM | 0.5063 | 0.0830 | 42.35% | 0.5052 |
| LayerCAM | 0.5295 | 0.1484 | 42.29% | 0.5248 |
Configuration: num_iter=5, \(\theta\)=0.1, top_k=10, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.5517 | 0.1669 | 63.40% | 0.5913 |
| CAM | 0.5200 | 0.1448 | 42.22% | 0.5038 |
| GradCAM | 0.5202 | 0.1331 | 42.22% | 0.5038 |
| HiResCAM | 0.5202 | 0.1331 | 42.22% | 0.5038 |
| GradCAM++ | 0.5222 | 0.1317 | 42.35% | 0.5090 |
| ScoreCAM | 0.5263 | 0.1631 | 42.22% | 0.5202 |
| AblationCAM | 0.5227 | 0.1344 | 41.75% | 0.5090 |
| XGradCAM | 0.5202 | 0.1331 | 42.22% | 0.5038 |
| FullGrad | 0.5022 | 0.2483 | 64.55% | 0.5208 |
| EigenGradCAM | 0.5019 | 0.0785 | 42.08% | 0.4943 |
| LayerCAM | 0.5242 | 0.1364 | 42.29% | 0.5137 |
Configuration: num_iter=5, \(\theta\)=0.1, top_k=20, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.5472 | 0.2236 | 48.17% | 0.5803 |
| CAM | 0.5222 | 0.1330 | 42.08% | 0.5117 |
| GradCAM | 0.5222 | 0.1206 | 42.08% | 0.5117 |
| HiResCAM | 0.5222 | 0.1206 | 42.08% | 0.5117 |
| GradCAM++ | 0.5242 | 0.1177 | 42.29% | 0.5176 |
| ScoreCAM | 0.5298 | 0.1543 | 42.15% | 0.5311 |
| AblationCAM | 0.5250 | 0.1228 | 41.81% | 0.5178 |
| XGradCAM | 0.5222 | 0.1206 | 42.08% | 0.5117 |
| FullGrad | 0.4892 | 0.2342 | 63.60% | 0.5027 |
| EigenGradCAM | 0.4861 | 0.0696 | 39.72% | 0.4792 |
| LayerCAM | 0.5260 | 0.1209 | 42.29% | 0.5225 |
Configuration: num_iter=5, \(\theta\)=0.3, top_k=10, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.5820 | 0.2090 | 78.82% | 0.5926 |
| CAM | 0.5207 | 0.1425 | 42.22% | 0.5049 |
| GradCAM | 0.5208 | 0.1317 | 42.22% | 0.5049 |
| HiResCAM | 0.5208 | 0.1317 | 42.22% | 0.5049 |
| GradCAM++ | 0.5226 | 0.1308 | 42.35% | 0.5102 |
| ScoreCAM | 0.5278 | 0.1663 | 42.29% | 0.5243 |
| AblationCAM | 0.5230 | 0.1338 | 41.75% | 0.5099 |
| XGradCAM | 0.5208 | 0.1317 | 42.22% | 0.5049 |
| FullGrad | 0.5004 | 0.2425 | 63.33% | 0.5198 |
| EigenGradCAM | 0.5030 | 0.0787 | 42.22% | 0.4965 |
| LayerCAM | 0.5246 | 0.1353 | 42.29% | 0.5152 |
Configuration: num_iter=5, \(\theta\)=0.3, top_k=20, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.6078 | 0.2755 | 71.85% | 0.6193 |
| CAM | 0.5228 | 0.1519 | 42.42% | 0.5060 |
| GradCAM | 0.5230 | 0.1397 | 42.42% | 0.5060 |
| HiResCAM | 0.5230 | 0.1397 | 42.42% | 0.5060 |
| GradCAM++ | 0.5256 | 0.1385 | 42.29% | 0.5125 |
| ScoreCAM | 0.5296 | 0.1689 | 42.15% | 0.5239 |
| AblationCAM | 0.5258 | 0.1411 | 41.81% | 0.5116 |
| XGradCAM | 0.5230 | 0.1397 | 42.42% | 0.5060 |
| FullGrad | 0.4992 | 0.2441 | 60.28% | 0.5172 |
| EigenGradCAM | 0.5060 | 0.0799 | 41.88% | 0.4985 |
| LayerCAM | 0.5274 | 0.1434 | 42.29% | 0.5170 |
Configuration: num_iter=10, \(\theta\)=0.1, top_k=10, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.5926 | 0.2067 | 81.33% | 0.6095 |
| CAM | 0.5189 | 0.1497 | 42.02% | 0.4983 |
| GradCAM | 0.5190 | 0.1400 | 42.02% | 0.4983 |
| HiResCAM | 0.5190 | 0.1400 | 42.02% | 0.4983 |
| GradCAM++ | 0.5210 | 0.1353 | 42.49% | 0.5041 |
| ScoreCAM | 0.5253 | 0.1553 | 42.02% | 0.5154 |
| AblationCAM | 0.5214 | 0.1403 | 41.81% | 0.5032 |
| XGradCAM | 0.5190 | 0.1400 | 42.02% | 0.4983 |
| FullGrad | 0.4973 | 0.2371 | 58.59% | 0.5193 |
| EigenGradCAM | 0.4966 | 0.0830 | 41.47% | 0.4854 |
| LayerCAM | 0.5225 | 0.1397 | 42.22% | 0.5082 |
Configuration: num_iter=10, \(\theta\)=0.1, top_k=20, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.5826 | 0.2608 | 61.43% | 0.6094 |
| CAM | 0.5229 | 0.1607 | 42.02% | 0.5060 |
| GradCAM | 0.5230 | 0.1479 | 42.02% | 0.5060 |
| HiResCAM | 0.5230 | 0.1479 | 42.02% | 0.5060 |
| GradCAM++ | 0.5252 | 0.1462 | 42.08% | 0.5119 |
| ScoreCAM | 0.5289 | 0.1784 | 42.42% | 0.5230 |
| AblationCAM | 0.5254 | 0.1491 | 41.81% | 0.5110 |
| XGradCAM | 0.5230 | 0.1479 | 42.02% | 0.5060 |
| FullGrad | 0.4919 | 0.2431 | 61.57% | 0.5093 |
| EigenGradCAM | 0.5078 | 0.0818 | 42.35% | 0.4998 |
| LayerCAM | 0.5270 | 0.1524 | 42.15% | 0.5167 |
Configuration: num_iter=10, \(\theta\)=0.3, top_k=10, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.6529 | 0.2366 | 89.24% | 0.6499 |
| CAM | 0.5190 | 0.1432 | 42.29% | 0.5001 |
| GradCAM | 0.5191 | 0.1325 | 42.29% | 0.5001 |
| HiResCAM | 0.5191 | 0.1325 | 42.29% | 0.5001 |
| GradCAM++ | 0.5207 | 0.1287 | 42.02% | 0.5054 |
| ScoreCAM | 0.5243 | 0.1524 | 42.22% | 0.5169 |
| AblationCAM | 0.5213 | 0.1314 | 41.81% | 0.5050 |
| XGradCAM | 0.5191 | 0.1325 | 42.29% | 0.5001 |
| FullGrad | 0.4940 | 0.2350 | 59.61% | 0.5142 |
| EigenGradCAM | 0.5007 | 0.0762 | 41.81% | 0.4913 |
| LayerCAM | 0.5225 | 0.1316 | 42.22% | 0.5100 |
Configuration: num_iter=10, \(\theta\)=0.3, top_k=20, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.6393 | 0.3049 | 78.48% | 0.6545 |
| CAM | 0.5248 | 0.1442 | 41.88% | 0.5118 |
| GradCAM | 0.5248 | 0.1337 | 41.88% | 0.5118 |
| HiResCAM | 0.5248 | 0.1337 | 41.88% | 0.5118 |
| GradCAM++ | 0.5273 | 0.1310 | 42.02% | 0.5181 |
| ScoreCAM | 0.5323 | 0.1709 | 41.27% | 0.5320 |
| AblationCAM | 0.5277 | 0.1367 | 41.88% | 0.5179 |
| XGradCAM | 0.5248 | 0.1337 | 41.88% | 0.5118 |
| FullGrad | 0.4963 | 0.2409 | 60.62% | 0.5128 |
| EigenGradCAM | 0.5062 | 0.0787 | 41.75% | 0.5031 |
| LayerCAM | 0.5293 | 0.1366 | 41.68% | 0.5235 |
CasCAM with EBayesThresh (5 configurations)
Configuration: num_iter=3, \(\theta\)=0.1, threshold=EBayesThresh, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.4449 | 0.0497 | 41.27% | 0.3471 |
| CAM | 0.5247 | 0.1293 | 42.42% | 0.5129 |
| GradCAM | 0.5248 | 0.1210 | 42.42% | 0.5129 |
| HiResCAM | 0.5248 | 0.1210 | 42.42% | 0.5129 |
| GradCAM++ | 0.5271 | 0.1173 | 41.75% | 0.5198 |
| ScoreCAM | 0.5326 | 0.1491 | 42.22% | 0.5330 |
| AblationCAM | 0.5281 | 0.1238 | 42.15% | 0.5198 |
| XGradCAM | 0.5248 | 0.1210 | 42.42% | 0.5129 |
| FullGrad | 0.5187 | 0.2679 | 64.95% | 0.5426 |
| EigenGradCAM | 0.5011 | 0.0765 | 41.07% | 0.4995 |
| LayerCAM | 0.5290 | 0.1214 | 42.22% | 0.5245 |
Configuration: num_iter=3, \(\theta\)=0.3, threshold=EBayesThresh, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.5163 | 0.0685 | 45.53% | 0.5459 |
| CAM | 0.5217 | 0.1372 | 41.88% | 0.5078 |
| GradCAM | 0.5218 | 0.1270 | 41.88% | 0.5078 |
| HiResCAM | 0.5218 | 0.1270 | 41.88% | 0.5078 |
| GradCAM++ | 0.5238 | 0.1246 | 42.15% | 0.5136 |
| ScoreCAM | 0.5286 | 0.1586 | 41.95% | 0.5267 |
| AblationCAM | 0.5242 | 0.1285 | 41.27% | 0.5132 |
| XGradCAM | 0.5218 | 0.1270 | 41.88% | 0.5078 |
| FullGrad | 0.4974 | 0.2291 | 58.25% | 0.5201 |
| EigenGradCAM | 0.5022 | 0.0747 | 41.75% | 0.4975 |
| LayerCAM | 0.5257 | 0.1282 | 41.75% | 0.5182 |
Configuration: num_iter=5, \(\theta\)=0.3, threshold=EBayesThresh, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.5646 | 0.1089 | 59.61% | 0.6109 |
| CAM | 0.5144 | 0.1433 | 41.88% | 0.4896 |
| GradCAM | 0.5145 | 0.1337 | 41.88% | 0.4896 |
| HiResCAM | 0.5145 | 0.1337 | 41.88% | 0.4896 |
| GradCAM++ | 0.5158 | 0.1344 | 42.08% | 0.4945 |
| ScoreCAM | 0.5192 | 0.1539 | 42.49% | 0.5065 |
| AblationCAM | 0.5161 | 0.1330 | 41.54% | 0.4942 |
| XGradCAM | 0.5145 | 0.1337 | 41.88% | 0.4896 |
| FullGrad | 0.4962 | 0.2513 | 60.62% | 0.5156 |
| EigenGradCAM | 0.4954 | 0.0820 | 42.42% | 0.4829 |
| LayerCAM | 0.5174 | 0.1354 | 42.08% | 0.4994 |
Configuration: num_iter=10, \(\theta\)=0.1, threshold=EBayesThresh, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.5505 | 0.0922 | 54.87% | 0.5964 |
| CAM | 0.5223 | 0.1339 | 41.81% | 0.5040 |
| GradCAM | 0.5224 | 0.1255 | 41.81% | 0.5040 |
| HiResCAM | 0.5224 | 0.1255 | 41.81% | 0.5040 |
| GradCAM++ | 0.5251 | 0.1235 | 41.88% | 0.5110 |
| ScoreCAM | 0.5302 | 0.1485 | 42.08% | 0.5230 |
| AblationCAM | 0.5253 | 0.1270 | 41.68% | 0.5102 |
| XGradCAM | 0.5224 | 0.1255 | 41.81% | 0.5040 |
| FullGrad | 0.5092 | 0.2306 | 56.97% | 0.5314 |
| EigenGradCAM | 0.4962 | 0.0770 | 41.14% | 0.4879 |
| LayerCAM | 0.5270 | 0.1275 | 41.68% | 0.5158 |
Configuration: num_iter=10, \(\theta\)=0.3, threshold=EBayesThresh, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.5951 | 0.1260 | 69.01% | 0.6262 |
| CAM | 0.5226 | 0.1418 | 41.81% | 0.5096 |
| GradCAM | 0.5227 | 0.1299 | 41.81% | 0.5096 |
| HiResCAM | 0.5227 | 0.1299 | 41.81% | 0.5096 |
| GradCAM++ | 0.5243 | 0.1286 | 42.02% | 0.5157 |
| ScoreCAM | 0.5298 | 0.1541 | 41.61% | 0.5289 |
| AblationCAM | 0.5249 | 0.1288 | 41.54% | 0.5151 |
| XGradCAM | 0.5227 | 0.1299 | 41.81% | 0.5096 |
| FullGrad | 0.4834 | 0.2301 | 58.53% | 0.5009 |
| EigenGradCAM | 0.4932 | 0.0760 | 41.00% | 0.4872 |
| LayerCAM | 0.5262 | 0.1297 | 41.88% | 0.5207 |
CasCAM without Threshold (6 configurations)
Configuration: num_iter=3, \(\theta\)=0.1, threshold=None, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.5312 | 0.1715 | 43.03% | 0.5419 |
| CAM | 0.5222 | 0.1435 | 42.76% | 0.5083 |
| GradCAM | 0.5223 | 0.1314 | 42.76% | 0.5083 |
| HiResCAM | 0.5223 | 0.1314 | 42.76% | 0.5083 |
| GradCAM++ | 0.5247 | 0.1309 | 42.42% | 0.5147 |
| ScoreCAM | 0.5282 | 0.1563 | 42.49% | 0.5247 |
| AblationCAM | 0.5246 | 0.1331 | 42.29% | 0.5135 |
| XGradCAM | 0.5223 | 0.1314 | 42.76% | 0.5083 |
| FullGrad | 0.4878 | 0.2348 | 62.52% | 0.5056 |
| EigenGradCAM | 0.5004 | 0.0773 | 41.95% | 0.4950 |
| LayerCAM | 0.5263 | 0.1343 | 42.35% | 0.5189 |
Configuration: num_iter=3, \(\theta\)=0.3, threshold=None, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.5050 | 0.1409 | 42.42% | 0.5131 |
| CAM | 0.5031 | 0.1002 | 42.22% | 0.4843 |
| GradCAM | 0.5031 | 0.0948 | 42.22% | 0.4843 |
| HiResCAM | 0.5031 | 0.0948 | 42.22% | 0.4843 |
| GradCAM++ | 0.5036 | 0.0946 | 42.29% | 0.4904 |
| ScoreCAM | 0.5071 | 0.1003 | 41.88% | 0.5021 |
| AblationCAM | 0.5041 | 0.0962 | 41.88% | 0.4896 |
| XGradCAM | 0.5031 | 0.0948 | 42.22% | 0.4843 |
| FullGrad | 0.4947 | 0.2115 | 55.35% | 0.5219 |
| EigenGradCAM | 0.4705 | 0.0712 | 40.87% | 0.4653 |
| LayerCAM | 0.5048 | 0.0945 | 42.22% | 0.4954 |
Configuration: num_iter=5, \(\theta\)=0.1, threshold=None, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.5266 | 0.1616 | 42.29% | 0.5415 |
| CAM | 0.5204 | 0.1240 | 42.02% | 0.5038 |
| GradCAM | 0.5205 | 0.1151 | 42.02% | 0.5038 |
| HiResCAM | 0.5205 | 0.1151 | 42.02% | 0.5038 |
| GradCAM++ | 0.5223 | 0.1152 | 42.02% | 0.5097 |
| ScoreCAM | 0.5261 | 0.1389 | 42.35% | 0.5212 |
| AblationCAM | 0.5227 | 0.1173 | 41.54% | 0.5088 |
| XGradCAM | 0.5205 | 0.1151 | 42.02% | 0.5038 |
| FullGrad | 0.5156 | 0.2542 | 63.19% | 0.5407 |
| EigenGradCAM | 0.5003 | 0.0729 | 41.68% | 0.4941 |
| LayerCAM | 0.5240 | 0.1174 | 42.02% | 0.5143 |
Configuration: num_iter=5, \(\theta\)=0.3, threshold=None, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.5232 | 0.1976 | 42.22% | 0.5274 |
| CAM | 0.5226 | 0.1241 | 41.88% | 0.5057 |
| GradCAM | 0.5227 | 0.1170 | 41.88% | 0.5057 |
| HiResCAM | 0.5227 | 0.1170 | 41.88% | 0.5057 |
| GradCAM++ | 0.5251 | 0.1129 | 42.35% | 0.5126 |
| ScoreCAM | 0.5302 | 0.1403 | 42.08% | 0.5253 |
| AblationCAM | 0.5252 | 0.1194 | 41.75% | 0.5114 |
| XGradCAM | 0.5227 | 0.1170 | 41.88% | 0.5057 |
| FullGrad | 0.4953 | 0.2342 | 62.25% | 0.5161 |
| EigenGradCAM | 0.5016 | 0.0729 | 41.54% | 0.4957 |
| LayerCAM | 0.5269 | 0.1174 | 41.95% | 0.5170 |
Configuration: num_iter=10, \(\theta\)=0.1, threshold=None, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.5417 | 0.2136 | 46.14% | 0.5756 |
| CAM | 0.5324 | 0.1586 | 43.10% | 0.5335 |
| GradCAM | 0.5327 | 0.1481 | 43.10% | 0.5335 |
| HiResCAM | 0.5327 | 0.1481 | 43.10% | 0.5335 |
| GradCAM++ | 0.5341 | 0.1441 | 41.88% | 0.5390 |
| ScoreCAM | 0.5402 | 0.1506 | 42.76% | 0.5493 |
| AblationCAM | 0.5345 | 0.1426 | 42.29% | 0.5380 |
| XGradCAM | 0.5327 | 0.1481 | 43.10% | 0.5335 |
| FullGrad | 0.5275 | 0.2391 | 54.80% | 0.5573 |
| EigenGradCAM | 0.4892 | 0.0904 | 40.32% | 0.4992 |
| LayerCAM | 0.5354 | 0.1404 | 41.81% | 0.5416 |
Configuration: num_iter=10, \(\theta\)=0.3, threshold=None, \(\lambda\)=0.1
| Method | AP | IoU | Pointing Game | Top-15% |
| --- | --- | --- | --- | --- |
| CasCAM (Ours) | 0.4973 | 0.2179 | 40.66% | 0.4966 |
| CAM | 0.5257 | 0.1449 | 41.95% | 0.5243 |
| GradCAM | 0.5257 | 0.1321 | 41.95% | 0.5243 |
| HiResCAM | 0.5257 | 0.1321 | 41.95% | 0.5243 |
| GradCAM++ | 0.5278 | 0.1295 | 42.35% | 0.5311 |
| ScoreCAM | 0.5354 | 0.1641 | 42.42% | 0.5448 |
| AblationCAM | 0.5281 | 0.1338 | 42.08% | 0.5303 |
| XGradCAM | 0.5257 | 0.1321 | 41.95% | 0.5243 |
| FullGrad | 0.5002 | 0.2402 | 61.43% | 0.5188 |
| EigenGradCAM | 0.4774 | 0.0757 | 38.50% | 0.4729 |
| LayerCAM | 0.5298 | 0.1327 | 42.22% | 0.5357 |