Meta-cosine loss for few-shot image classification

Tao Peng; Feng Lin; Du Yandong; Gong Xun; Wang Jun

doi:10.11834/jig.230127

Image Understanding and Computer Vision


Meta-cosine loss for few-shot image classification
2024, Vol. 29, No. 2, pp. 506-519
Received: 2023-03-24

Revised: 2023-07-10

Published in print: 2024-02-16

Tao Peng, Feng Lin, Du Yandong, Gong Xun, Wang Jun. 2024. Meta-cosine loss for few-shot image classification. Journal of Image and Graphics, 29(02): 0506-0519. DOI: 10.11834/jig.230127.

Summary

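Objective

Metric learning is a simple and effective approach to few-shot learning, and learning a rich, discriminative embedding space that generalizes well is the key to strong classification performance with metric-based methods. Starting from the features of the samples themselves and the distribution of those features in the embedding space, and combining global and local data augmentation, this paper implements a meta-cosine loss method for few-shot image classification (AMCL-FSIC).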

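Method

First, starting from the features of the data themselves, global and local data augmentation methods are combined, so that local information supplies more discriminative and transferable cues and the trained model pays more attention to the foreground of the image. At the same time, an attention mechanism combines global and local features to obtain richer and more discriminative features. Second, starting from the distribution of sample features in the embedding space, a meta-cosine loss (MCL) function is proposed to optimize the few-shot image classification model. The difference in similarity between a sample and the class prototypes is used to adjust the prototypes of different classes and enlarge the inter-class distance, so that classes are more clearly separated when the model is tested on new tasks, which improves the generalization ability of the model.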
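The combination of a global view with randomly cropped local views can be illustrated with a minimal sketch. It is not the authors' augmentation pipeline: the function names, the number of local crops (`num_local`), and the crop size (`local_frac`) are all assumptions chosen for illustration, and the image is a plain nested list rather than a real tensor.

```python
import random


def random_crop(image, crop_h, crop_w, rng):
    """Return a crop_h x crop_w patch cut at a random position."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_h)    # randint is inclusive at both ends
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def global_and_local_views(image, num_local=4, local_frac=0.5, seed=0):
    """Return the full image (global view) plus num_local random patches.

    num_local and local_frac are hypothetical defaults, not values from the paper.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    crop_h = max(1, int(height * local_frac))
    crop_w = max(1, int(width * local_frac))
    local_views = [random_crop(image, crop_h, crop_w, rng) for _ in range(num_local)]
    return image, local_views


# Toy 8 x 8 single-channel "image" whose pixel value encodes its position.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
global_view, local_views = global_and_local_views(image)
```

In a real pipeline each view would be passed through the feature extractor, and the resulting global and local features would then be fused, for example by the attention mechanism the text describes.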

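Result

Comparative experiments were conducted on five classical few-shot datasets. On FC100 (Few-shot Cifar100) and CUB (Caltech-UCSD Birds-200-2011), the proposed method achieves the best classification results to date; on MiniImageNet, TieredImageNet, and Cifar100, its results are comparable to those of the compared models. In addition, comparative experiments on MiniImageNet, CUB, and Cifar100 were conducted to verify the effectiveness of MCL, and the results show that the proposed MCL improves the classification performance of the cosine classifier.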

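Conclusion

The proposed method can fully extract image features in few-shot image classification tasks and effectively improves the accuracy of metric learning in few-shot image classification.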

Abstract

Objective

Few-shot learning (FSL) is a popular and difficult problem in computer vision. It aims to achieve effective classification with only a few labeled samples. Recent few-shot learning methods can be divided into three major categories: metric-, transfer-, and gradient-based methods. Among them, metric-based methods have received considerable attention because of their simplicity and strong performance. A metric-based method consists of a feature extractor built on a convolutional neural network (CNN) and a classifier based on distance in the embedding space. After the samples are mapped into the embedding space, a simple metric function computes the similarity between each sample and the class prototypes, so that samples from novel classes can be identified quickly. Because classification relies on the metric function, it avoids the optimization difficulties that arise in the few-shot setting when a classifier is learned by a network. A richer, more discriminative embedding space that generalizes better is therefore the key to metric-based methods. Starting from the features themselves and their distribution in the embedding space, and combining the global and local features of each sample, we propose a meta-cosine loss method for few-shot image classification, called AMCL-FSIC, to improve the accuracy of metric-based methods.
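The metric-based pipeline described above (embed the samples, form class prototypes, compare with a simple metric) can be sketched as follows. This is an illustrative sketch, not the AMCL-FSIC implementation: it assumes each prototype is the mean of the support embeddings of its class, as in prototypical networks, and it uses hand-written two-dimensional vectors in place of CNN features. All function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def class_prototypes(support_embeddings, support_labels):
    """Assumption: a prototype is the mean embedding of its class's support samples."""
    grouped = {}
    for emb, label in zip(support_embeddings, support_labels):
        grouped.setdefault(label, []).append(emb)
    return {label: mean_vector(embs) for label, embs in grouped.items()}


def classify(query_embedding, prototypes):
    """Assign the query to the class whose prototype is most cosine-similar."""
    return max(
        prototypes,
        key=lambda label: cosine_similarity(query_embedding, prototypes[label]),
    )


# Toy 2-way 2-shot task with stand-in embeddings.
support = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.2, 0.8]]
labels = ["cat", "cat", "dog", "dog"]
prototypes = class_prototypes(support, labels)
prediction = classify([0.8, 0.2], prototypes)
```

No classifier weights are trained for the novel classes here; only the support embeddings are averaged, which is why the quality of the embedding space dominates the result.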

Method

First, our primary objective is to obtain suitable features. An image is composed of foreground and background. The foreground is beneficial for few-shot classification, whereas the background is detrimental. Forcing the model to focus only on the foreground during training and evaluation and to disregard the background would help classification, but this is not easy to achieve because it requires prior knowledge of the foreground object. Following previous work, image features are roughly divided into global features and local features, where local features come from randomly cropped portions of each image. Local features contain discriminative information that transfers across categories, which is of considerable significance for few-shot image classification. We therefore combine global and local data augmentation strategies. The local information of an image lets the model pay more attention to the distinctive and transferable characteristics of the sample and reduces the effect of background information. A self-attention mechanism then combines global and local features to obtain richer and more discriminative features.

Second, from the perspective of the feature distribution in the embedding space, we meta-train a cosine classifier and minimize the loss computed from the cosine similarity between each sample and the class prototypes. In the embedding space, features of the same category are gathered together, whereas features of different categories lie far from one another. However, previous cosine classifiers focus only on pulling samples of the same class together during training and do not fully push samples of different classes apart. As a direct consequence, the generalization ability of the model decreases when it faces new test tasks with similar categories. We therefore propose the meta-cosine loss (MCL) on the basis of the cosine classifier. During meta-training, the difference in cosine similarity between a sample and the class prototypes is used to adjust the class prototypes in accordance with the parallelogram rule. MCL pushes the feature clusters of different classes in a task as far apart as possible, so that classes are more separable when the model faces a new test task, which improves the generalization ability of the model.
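To make the motivation concrete, the sketch below computes the loss of a plain cosine classifier (softmax cross-entropy over scaled cosine similarities) and a simple measure of how close the class prototypes are to one another. It is the baseline that MCL builds on, not MCL itself: the abstract does not give the prototype-adjustment formula, so it is not reproduced here. The `scale` factor and both function names are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_cross_entropy(query, prototypes, true_index, scale=10.0):
    """Cross-entropy of a softmax over scaled cosine similarities.

    scale is a hypothetical temperature, not a value taken from the paper.
    """
    logits = [scale * cosine_similarity(query, p) for p in prototypes]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[true_index]


def mean_inter_prototype_similarity(prototypes):
    """Average pairwise cosine similarity; lower means better separated classes."""
    pairs = [
        (i, j)
        for i in range(len(prototypes))
        for j in range(i + 1, len(prototypes))
    ]
    return sum(cosine_similarity(prototypes[i], prototypes[j]) for i, j in pairs) / len(pairs)


# The same query scored against two nearby prototypes and two orthogonal ones.
close_prototypes = [[1.0, 0.0], [0.9, 0.436]]
separated_prototypes = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.1]
loss_close = cosine_cross_entropy(query, close_prototypes, true_index=0)
loss_separated = cosine_cross_entropy(query, separated_prototypes, true_index=0)
```

With the query fixed, the loss is lower when the prototypes are well separated, which is the property that a loss encouraging larger inter-class distance is meant to promote.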

Result

We conduct extensive experiments to verify the effectiveness of the model. Experiments are performed on five classical few-shot datasets: MiniImageNet, TieredImageNet, Cifar100, Few-shot Cifar100 (FC100), and Caltech-UCSD Birds-200-2011 (CUB). The input images are resized to 84 × 84 pixels for training, the momentum is set to 0.95, the learning rate to 0.0002, and the weight decay to 0.0001. Training is accelerated with an NVIDIA GeForce RTX 3090 GPU. To ensure a fair comparison, we adopt the 5-way 1-shot and 5-way 5-shot settings during both training and testing. The experimental results show that the classification accuracy (5-way 1-shot/5-way 5-shot) on MiniImageNet, TieredImageNet, Cifar100, FC100, and CUB is 68.92%/84.45%, 72.41%/87.36%, 76.79%/88.52%, 50.86%/67.19%, and 81.12%/91.43%, respectively. Our model compares favorably with recent few-shot image classification methods. In addition, we perform comparative experiments on the MiniImageNet, CUB, and Cifar100 datasets to verify the effectiveness of MCL. These experiments show that introducing the MCL classifier improves image classification accuracy by nearly 4% and 2% under the 1-shot and 5-shot settings, respectively. MCL thus considerably improves the classification ability of the cosine classifier.
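The 5-way 1-shot and 5-way 5-shot settings mentioned above refer to episodic tasks: each task draws N classes, K labeled support samples per class, and a set of query samples to classify. The following is a minimal sketch of such an episode sampler. The function name, the number of query samples per class (`n_query=15`), and the toy label list are assumptions, not details from the paper.

```python
import random


def sample_episode(labels, n_way=5, k_shot=1, n_query=15, seed=0):
    """Sample one N-way K-shot episode; returns class ids and sample indices.

    n_query (queries per class) is a hypothetical default, not a paper value.
    """
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    # Only classes with enough samples for both support and query are usable.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    classes = rng.sample(sorted(eligible), n_way)
    support, query = [], []
    for cls in classes:
        chosen = rng.sample(by_class[cls], k_shot + n_query)
        support.extend(chosen[:k_shot])   # labeled examples shown to the model
        query.extend(chosen[k_shot:])     # examples the model must classify
    return classes, support, query


# Toy dataset: 10 classes with 20 samples each.
labels = [i // 20 for i in range(200)]
classes, support, query = sample_episode(labels, n_way=5, k_shot=1, n_query=15)
```

Reported few-shot accuracies are normally averaged over many such episodes drawn from classes that were not seen during training.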

Conclusion

Our work proposes MCL and combines global and local data augmentation methods to improve the generalization ability of the model. This approach is suitable for any metric-based method.


