Attention set representation for multiscale measurement of few-shot image classification
2024, Vol. 29, No. 11, pages 3371-3382
Print publication date: 2024-11-16
DOI: 10.11834/jig.230763
Wang Xuesong, Lyu Lixiang, Cheng Yuhu, Wang Haoyu. 2024. Attention set representation for multiscale measurement of few-shot image classification. Journal of Image and Graphics, 29(11):3371-3382
Objective
Few-shot image classification aims to train a machine learning model that can effectively classify target images when only a limited number of labeled training samples is available. The main challenge is the lack of sufficient data: only a small amount of labeled data can be used for model training. Among the many approaches proposed to tackle this challenge, a common and efficient strategy is to use a deep network as the feature extractor. Through multilayer convolution and pooling operations, such a network automatically extracts feature vectors from the input image; these vectors are then used to determine the image's category, and during training the extractor gradually learns to encode the information relevant to that category. Even when trained on limited labeled data, models built this way can achieve high accuracy by leveraging the power of deep learning. However, compressing an image into a feature vector risks losing valuable information, including information strongly associated with the specific category, and discarding such information can substantially reduce classification accuracy. The extracted feature vectors should therefore encompass as much category-specific information as possible. To achieve an extensive and comprehensive image representation, this paper introduces a novel rich representation feature extractor (RireFeat) trained on the base classes.
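For readers unfamiliar with metric-based few-shot classification, the following is a minimal sketch, not the authors' code, of the nearest-prototype decision rule this line of work builds on: support-image embeddings define one prototype per class, and each query image is assigned to the closest prototype. The function name and tensor shapes are illustrative assumptions.

```python
import torch

def prototype_classify(support, support_labels, query, n_way):
    """Nearest-prototype classification over extracted feature vectors.

    support:        (n_way * k_shot, d) embeddings of labeled support images
    support_labels: (n_way * k_shot,)   integer class ids in [0, n_way)
    query:          (n_query, d)        embeddings of unlabeled query images
    Returns a (n_query,) tensor of predicted class ids.
    """
    # Each class prototype is the mean embedding of its support samples.
    prototypes = torch.stack(
        [support[support_labels == c].mean(dim=0) for c in range(n_way)]
    )                                          # (n_way, d)
    # Assign every query to its nearest prototype (Euclidean distance).
    dists = torch.cdist(query, prototypes)     # (n_query, n_way)
    return dists.argmin(dim=1)
```

In an N-way K-shot episode, `support` holds the N × K labeled embeddings; better feature extractors make these prototypes more class-discriminative, which is exactly where the information loss discussed above hurts.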
Method
This paper proposes a feature extractor called RireFeat to achieve highly comprehensive and class-specific feature extraction. RireFeat strengthens the exchange and flow of information inside the feature extractor, facilitating the extraction of class-related features, and it attends to the multilevel feature vectors before and after each stage of the extractor so that information useful for classification is retained throughout feature extraction. RireFeat employs a pyramid-like design that divides the feature extractor into multiple levels: each level receives the image encoding from the level above and, after several convolution and pooling operations at that level, passes the result to the level below. This hierarchical structure facilitates the transfer and fusion of information between levels, maximizing the use of the image information inside the extractor, deepening the category relevance of the resulting feature vectors, and thereby improving classification accuracy; it also generalizes readily to novel image classification tasks. Specifically, as image information traverses the multilevel hierarchy, category-related local features are extracted while category-irrelevant information is ignored, but some category-specific information may be discarded along the way. To address this, RireFeat integrates small attention-based shaping modules that bridge distant levels, so image information can still flow and merge across the hierarchy. This design enables the network to pay additional attention to how features change before and after each level, extracting local features effectively while disregarding category-irrelevant information, which notably enhances classification accuracy (a sketch of one possible realization of such a cross-level module follows). Meanwhile, this paper introduces contrastive learning into few-shot image classification and combines it with deep Brownian distance covariance to build a contrastive loss function that measures image features at multiple scales, bringing embeddings of the same distribution closer while pushing those of different distributions farther apart. In the experiments, the SetFeat method is used to extract a feature set for each image. For training, as in other few-shot learning methods, the whole network is first pre-trained and then fine-tuned in the meta-training stage, where classification is performed by computing the distance between the query (test) and support (training) sample sets.
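The paper describes, but does not list code for, the attention-based shaping module that lets information flow across distant levels. The sketch below is one plausible PyTorch realization under stated assumptions: positions of the current level's feature map attend into a 1 × 1-projected earlier-level feature map through multi-head cross-attention, so class-relevant detail discarded by the intervening convolution and pooling stages can re-enter the representation. The class name `CrossLevelAttention` and all parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class CrossLevelAttention(nn.Module):
    """Hypothetical shaping module: features from an earlier level re-enter
    the current level through multi-head cross-attention."""

    def __init__(self, low_dim, high_dim, n_heads=4):
        super().__init__()
        # A 1x1 conv aligns the earlier level's channels with the current level.
        self.proj = nn.Conv2d(low_dim, high_dim, kernel_size=1)
        # high_dim must be divisible by n_heads.
        self.attn = nn.MultiheadAttention(high_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(high_dim)

    def forward(self, low_feat, high_feat):
        # low_feat:  (B, low_dim,  H1, W1) feature map from an earlier level
        # high_feat: (B, high_dim, H2, W2) feature map from the current level
        b, c, h, w = high_feat.shape
        q = high_feat.flatten(2).transpose(1, 2)             # (B, H2*W2, C)
        kv = self.proj(low_feat).flatten(2).transpose(1, 2)  # (B, H1*W1, C)
        # Each current-level position queries the earlier level for
        # category-relevant information lost between the two levels.
        fused, _ = self.attn(q, kv, kv)
        fused = self.norm(q + fused)                         # residual + norm
        return fused.transpose(1, 2).reshape(b, c, h, w)

# Usage sketch: fuse a 64-channel early map into a 128-channel later one.
low = torch.randn(2, 64, 20, 20)
high = torch.randn(2, 128, 5, 5)
out = CrossLevelAttention(low_dim=64, high_dim=128)(low, high)
print(out.shape)  # torch.Size([2, 128, 5, 5])
```

The residual connection plus layer normalization keeps the current level's features intact when the earlier level contributes nothing useful, matching the stated goal of attending to how features change before and after each level.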
Result
1-shot and 5-shot classification experiments are conducted on the standard few-shot datasets MiniImageNet, TieredImageNet, and CUB (Caltech-UCSD Birds-200-2011) to verify the validity of the proposed feature extraction structure. Experimental results show that on MiniImageNet, RireFeat achieves 0.64% and 1.10% higher accuracy than SetFeat with a convolution-based backbone in the 1-shot and 5-shot settings, respectively, and 1.51% and 1.46% higher accuracy with a ResNet12-based backbone. On CUB, RireFeat provides gains of 0.03% and 0.61% over SetFeat at 1-shot and 5-shot with the convolution-based backbone, and improvements of 0.66% and 0.75% with the ResNet12-based backbone. On TieredImageNet, the convolution-based backbone achieves 0.21% and 0.38% improvements over SetFeat under the 1-shot and 5-shot settings, respectively.
Conclusion
This paper proposes a rich representation feature extractor (RireFeat) to obtain rich, comprehensive, and accurate feature representations for few-shot image classification. Unlike traditional feature extractors, RireFeat increases the flow of information within the feature extraction network by attending to how features change before and after each stage of the network, effectively reintegrating the category information lost during feature extraction back into the feature representation. In addition, contrastive learning combined with deep Brownian distance covariance is introduced into few-shot image classification to learn a more categorical representation for each image, allowing the extractor to capture subtle differences between images of different categories and thus improving classification performance. A set of feature vectors, rather than a single vector, is extracted from each image, providing strong support for the subsequent classification task. The proposed method achieves high classification accuracy on the MiniImageNet, TieredImageNet, and CUB datasets, and its universality is verified on popular deep learning backbones, both convolutional and residual, highlighting its applicability to current state-of-the-art models.
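To make the loss design concrete, below is a simplified sketch, under stated assumptions, of the two ingredients named above: a Brownian-distance-covariance-style similarity between two images' feature sets, written in the spirit of the DeepBDC formulation of Xie et al. (2022), and a supervised contrastive term that pulls same-class embeddings together and pushes different-class embeddings apart. Function names, shapes, and any weighting between terms are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def bdc_matrix(feat):
    """Double-centered pairwise-distance matrix of one image's feature set.

    feat: (n, d) tensor of n local feature vectors; returns an (n, n) matrix.
    """
    a = torch.cdist(feat, feat)                # pairwise Euclidean distances
    row = a.mean(dim=1, keepdim=True)
    col = a.mean(dim=0, keepdim=True)
    return a - row - col + a.mean()            # double centering

def bdc_similarity(f1, f2):
    """Similarity of two feature sets via their centered distance matrices.

    Assumes both images yield the same number n of local features.
    """
    return (bdc_matrix(f1) * bdc_matrix(f2)).mean()

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Pull same-class embeddings together; push different-class ones apart."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature              # (B, B) scaled cosine similarity
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(eye, float("-inf"))  # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    # Mean log-probability of each anchor's same-class positives.
    per_anchor = log_prob.masked_fill(~pos, 0.0).sum(dim=1) / pos.sum(dim=1).clamp(min=1)
    return -per_anchor.mean()
```

In training, such terms would be combined with the episodic classification loss; how the multi-scale measurements are weighted against each other is a design choice of the paper that this sketch does not reproduce.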
few-shot image classification; attention mechanism; multi-scale measurement; feature representation; contrastive learning; deep Brownian distance covariance
Afrasiyabi A, Lalonde J F and Gagné C. 2021. Mixture-based feature space learning for few-shot image classification//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 9021-9031 [DOI: 10.1109/ICCV48922.2021.00891]
Afrasiyabi A, Larochelle H, Lalonde J F and Gagné C. 2022. Matching feature sets for few-shot image classification//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 9004-9014 [DOI: 10.1109/CVPR52688.2022.00881]
Allen K R, Shelhamer E, Shin H and Tenenbaum J B. 2019. Infinite mixture prototypes for few-shot learning//Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: PMLR: 232-241
Bateni P, Goyal R, Masrani V, Wood F and Sigal L. 2020. Improved few-shot visual classification//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 14481-14490 [DOI: 10.1109/CVPR42600.2020.01450]
Cai Q, Pan Y W, Yao T, Yan C G and Mei T. 2018. Memory matching networks for one-shot image recognition//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4080-4088 [DOI: 10.1109/CVPR.2018.00429]
Chen W Y, Liu Y C, Kira Z, Wang Y C F and Huang J B. 2019. A closer look at few-shot classification//Proceedings of the 7th International Conference on Learning Representations. New Orleans, USA: OpenReview.net
Chen Y B, Liu Z, Xu H J, Darrell T and Wang X L. 2021. Meta-baseline: exploring simple meta-learning for few-shot learning//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 9042-9051 [DOI: 10.1109/ICCV48922.2021.00893]
Doersch C, Gupta A and Zisserman A. 2020. CrossTransformers: spatially-aware few-shot transfer//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: #1844
Dong Y Y, Song B B and Sun W F. 2023. Local feature fusion network-based few-shot image classification. Journal of Image and Graphics, 28(7): 2093-2104 [DOI: 10.11834/jig.220079]
Dvornik N, Mairal J and Schmid C. 2019. Diversity with cooperation: ensemble methods for few-shot classification//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 3722-3730 [DOI: 10.1109/ICCV.2019.00382]
Fei N Y, Lu Z W, Xiang T and Huang S F. 2021. MELR: meta-learning via modeling episode-level relationships for few-shot learning//Proceedings of the 9th International Conference on Learning Representations. Virtual Event, Austria: OpenReview.net: 1-20
Finn C, Abbeel P and Levine S. 2017. Model-agnostic meta-learning for fast adaptation of deep networks//Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: PMLR: 1126-1135
Gidaris S and Komodakis N. 2018. Dynamic few-shot visual learning without forgetting//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4367-4375 [DOI: 10.1109/CVPR.2018.00459]
He X J and Lin J F. 2022. Weakly-supervised object localization based fine-grained few-shot learning. Journal of Image and Graphics, 27(7): 2226-2239 [DOI: 10.11834/jig.200849]
Hou R B, Chang H, Ma B P, Shan S G and Chen X L. 2019. Cross attention network for few-shot classification//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: #360
Lee K, Maji S, Ravichandran A and Soatto S. 2019. Meta-learning with differentiable convex optimization//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 10649-10657 [DOI: 10.1109/CVPR.2019.01091]
Li J N, Zhou P, Xiong C M and Hoi S C H. 2021. Prototypical contrastive learning of unsupervised representations//Proceedings of the 9th International Conference on Learning Representations. Virtual Event, Austria: OpenReview.net: 1-16
Lin T Y, Dollár P, Girshick R, He K M, Hariharan B and Belongie S. 2017. Feature pyramid networks for object detection//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 936-944 [DOI: 10.1109/CVPR.2017.106]
Liu B, Cao Y, Lin Y T, Li Q, Zhang Z, Long M S and Hu H. 2020. Negative margin matters: understanding margin in few-shot classification//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 438-455 [DOI: 10.1007/978-3-030-58548-8_26]
Munkhdalai T, Yuan X D, Mehri S and Trischler A. 2018. Rapid adaptation with conditionally shifted neurons//Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: PMLR: 3661-3670
Nichol A, Achiam J and Schulman J. 2018. On first-order meta-learning algorithms [EB/OL]. [2023-11-06]. https://arxiv.org/pdf/1803.02999.pdf
Oh J, Yoo H, Kim C and Yun S Y. 2021. BOIL: towards representation change for few-shot learning//Proceedings of the 9th International Conference on Learning Representations. Virtual Event, Austria: OpenReview.net
Oreshkin B N, Rodríguez P and Lacoste A. 2018. TADAM: task dependent adaptive metric for improved few-shot learning//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal, Canada: Curran Associates Inc.: 719-729
Qi G D, Yu H M, Lu Z H and Li S Z. 2021. Transductive few-shot classification on the oblique manifold//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 8392-8402 [DOI: 10.1109/ICCV48922.2021.00830]
Satorras V G and Estrach J B. 2018. Few-shot learning with graph neural networks//Proceedings of the 6th International Conference on Learning Representations. Vancouver, Canada: OpenReview.net: 1-13
Snell J, Swersky K and Zemel R. 2017. Prototypical networks for few-shot learning//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 4080-4090
Sung F, Yang Y X, Zhang L, Xiang T, Torr P H S and Hospedales T M. 2018. Learning to compare: relation network for few-shot learning//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1199-1208 [DOI: 10.1109/CVPR.2018.00131]
Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K and Wierstra D. 2016. Matching networks for one shot learning//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: Curran Associates Inc.: 3637-3645
Wu Z R, Xiong Y J, Yu S X and Lin D H. 2018. Unsupervised feature learning via non-parametric instance discrimination//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 3733-3742 [DOI: 10.1109/CVPR.2018.00393]
Xie J T, Long F, Lv J M, Wang Q L and Li P H. 2022. Joint distribution matters: deep Brownian distance covariance for few-shot classification//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 7962-7971 [DOI: 10.1109/CVPR52688.2022.00781]
Xu C M, Fu Y W, Liu C, Wang C J, Li J L, Huang F Y, Zhang L and Xue X Y. 2021a. Learning dynamic alignment via meta-filter for few-shot learning//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 5178-5187 [DOI: 10.1109/CVPR46437.2021.00514]
Xu W J, Xu Y F, Wang H J and Tu Z W. 2021b. Attentional constellation nets for few-shot learning//Proceedings of the 9th International Conference on Learning Representations. Virtual Event, Austria: OpenReview.net: 1-16
Ye H J, Hu H X, Zhan D C and Sha F. 2020. Few-shot learning via embedding adaptation with set-to-set functions//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 8805-8814 [DOI: 10.1109/CVPR42600.2020.00883]
Zaheer M, Kottur S, Ravanbhakhsh S, Póczos B, Salakhutdinov R and Smola A J. 2017. Deep sets//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 3394-3404
Zhang C, Cai Y J, Lin G S and Shen C H. 2023. DeepEMD: differentiable earth mover's distance for few-shot learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(5): 5632-5648 [DOI: 10.1109/TPAMI.2022.3217373]
Zhang C S, Chen J, Li Q L, Deng B Q, Wang J and Chen C G. 2023. Deep contrastive learning: a survey. Acta Automatica Sinica, 49(1): 15-39 [DOI: 10.16383/j.aas.c220421]
Zhao K L, Jin X L and Wang Y Z. 2021. Survey on few-shot learning. Journal of Software, 32(2): 349-369 [DOI: 10.13328/j.cnki.jos.006138]