Counterfactual Causal Attention Learning: Enhancing Fine-Grained Visual Recognition via Indirect Effect Optimization
DOI: https://doi.org/10.61173/h9ah2p88
Keywords: fine-grained visual recognition, attention mechanism, counterfactual attention learning, causal inference, indirect effect
Abstract
Fine-grained visual recognition (FGVR) aims to distinguish subtle differences among visually similar categories. However, conventional attention mechanisms offer no quantitative way to evaluate the quality of the learned attention during training, which limits their effectiveness. To address this limitation, we propose a novel Counterfactual Causal Attention Learning (CCAL) framework for fine-grained image classification and person re-identification. In our approach, the attention map is modeled as a confounding variable within a causal graph, and counterfactual interventions are used to assess its impact on model predictions. By optimizing the indirect effect (IE), CCAL makes the learned attention more reliable and improves overall recognition performance. Extensive experiments on multiple FGVR benchmarks demonstrate consistent improvements, including a 1.3% Top-1 accuracy gain on the CUB-200-2011 dataset.
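The counterfactual intervention on attention can be illustrated with a small sketch. This is not the CCAL implementation. The abstract does not specify the architecture or the loss, so the sketch assumes a setup in the style of the earlier Counterfactual Attention Learning (CAL) formulation of Rao et al. (ICCV 2021). Under that setup, features are pooled with attention weights and passed to a linear classifier. The counterfactual intervention replaces the learned attention maps with random maps of the same shape. The difference between factual and counterfactual logits is supervised with cross-entropy alongside the usual classification loss. Treating this logit difference as the "indirect effect" is an assumption; how CCAL actually defines and estimates IE may differ. The names `attention_pool`, `softmax_cross_entropy`, `counterfactual_effect_loss`, and all tensor shapes are hypothetical.

```python
import numpy as np


def attention_pool(features, attention):
    """Attention-weighted pooling (assumed design).

    features: (C, N) feature map flattened over N spatial positions.
    attention: (M, N) attention maps over the same positions.
    Returns a (M * C,) vector: one pooled C-dim descriptor per attention map.
    """
    pooled = attention @ features.T / features.shape[1]  # (M, C), averaged over positions
    return pooled.reshape(-1)


def softmax_cross_entropy(logits, label):
    """Numerically stable cross-entropy for a single example."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]


def counterfactual_effect_loss(features, attention, weights, label, rng):
    """Hypothetical CAL-style objective: CE on factual logits plus CE on the effect.

    effect = Y(A = learned attention) - Y(do(A = random attention))
    """
    # Factual prediction using the learned attention maps.
    y_fact = weights @ attention_pool(features, attention)
    # Counterfactual intervention: swap in random maps of the same shape.
    cf_attention = rng.uniform(0.0, 1.0, size=attention.shape)
    y_cf = weights @ attention_pool(features, cf_attention)
    # Effect of the learned attention relative to the intervention.
    effect = y_fact - y_cf
    loss = softmax_cross_entropy(y_fact, label) + softmax_cross_entropy(effect, label)
    return loss, effect


# Usage with toy shapes: C channels, N positions, M attention maps, K classes.
rng = np.random.default_rng(0)
C, N, M, K = 8, 16, 4, 5
features = rng.standard_normal((C, N))
attention = rng.uniform(size=(M, N))
weights = 0.1 * rng.standard_normal((K, M * C))
loss, effect = counterfactual_effect_loss(features, attention, weights, label=2, rng=rng)
print(float(loss), effect.shape)
```

Under these assumptions, the effect term rewards attention maps whose predictions beat those of uninformative random maps. Attention that merely reproduces what any weighting of the backbone features would give earns little from this term. In a real training loop the loss would be minimized by gradient descent on the backbone, attention module, and classifier. Here it is only evaluated.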