Co-history: a label-noise-robust learning method that considers historical information in co-teaching
Co-history: learning with noisy labels by co-teaching with history losses
2024, Vol. 29, No. 12, Pages: 3684-3698
Print publication date: 2024-12-16
DOI: 10.11834/jig.230541
Dong Yongfeng, Li Jiawei, Wang Zhen, Jia Wenyu. 2024. Co-history: learning with noisy labels by co-teaching with history losses. Journal of Image and Graphics, 29(12):3684-3698
Objective
Deep neural networks achieve excellent performance on computer vision classification tasks; however, deep learning models face a severe challenge in the presence of label noise. Learning algorithms based on co-teaching can effectively alleviate the problem of training neural networks on noisily labeled data, but they still have many shortcomings. To this end, this paper proposes Co-history, a label-noise-robust learning method that considers historical information in co-teaching.
Method
First, to address the overfitting that arises when the cross-entropy (CE) loss is used in a noisy-label environment, a correction loss is proposed by analyzing the historical pattern of sample losses, which weakens the influence of CE-induced overfitting during training. Second, to address the premature convergence of the two networks in the co-teaching algorithm, a difference loss is proposed to preserve the discrepancy between the two networks throughout training. Finally, following the small-loss selection strategy and incorporating the historical losses of samples, a new sample selection method is proposed that selects clean samples more accurately.
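To make the selection step concrete, the following is a minimal NumPy sketch of a small-loss selection rule that also considers how much each sample's loss has fluctuated over past epochs. The function name select_clean_samples, the combination weight alpha, and the min-max normalization are illustrative assumptions; the paper's exact scoring rule is not reproduced here.

```python
import numpy as np

def select_clean_samples(ce_loss, loss_history, keep_ratio, alpha=0.5):
    """Small-loss selection that also uses historical loss fluctuation (sketch).

    ce_loss:      (N,) current-epoch cross-entropy loss per sample
    loss_history: (N, T) per-sample CE losses over the previous T epochs
    keep_ratio:   fraction of samples treated as clean (e.g. 1 - noise rate)
    alpha:        hypothetical weight balancing current loss vs. fluctuation
    """
    # Noisy samples tend to fluctuate more, because the network alternately
    # fits and forgets them; clean samples keep a consistently small loss.
    fluctuation = loss_history.std(axis=1)

    # Min-max normalise both criteria so they can be combined on one scale.
    def norm(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-12)

    score = alpha * norm(ce_loss) + (1.0 - alpha) * norm(fluctuation)

    # Keep the samples with the smallest combined score as "clean".
    num_keep = int(keep_ratio * len(ce_loss))
    return np.argsort(score)[:num_keep]
```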
Result
Comparison experiments are conducted on four simulated-noise datasets, namely F-MNIST (Fashion-MNIST), SVHN (street view house numbers), CIFAR-10 (Canadian Institute for Advanced Research-10), and CIFAR-100, and on one real-world dataset, Clothing1M. Under 40% symmetric noise, the proposed method outperforms the co-teaching algorithm by 3.52%, 4.77%, 6.16%, and 6.96% on F-MNIST, SVHN, CIFAR-10, and CIFAR-100, respectively. On the real-world Clothing1M dataset, the best and final accuracies of the proposed method improve over co-teaching by 0.94% and 1.2%, respectively.
Conclusion
Extensive experiments show that the proposed label-noise-robust classification algorithm, which considers historical losses under co-teaching, effectively reduces the influence of noisy labels and improves classification accuracy.
Objective
Deep neural networks (DNNs) have been successfully applied in many fields, especially in computer vision, and this success relies heavily on large-scale labeled datasets. However, collecting large-scale datasets with accurate labels is difficult in practice, especially in professional fields where labeling requires the involvement of domain experts, which increases labor and financial costs. To cut costs, researchers have started using datasets built through crowdsourced annotation, search engine queries, and web crawling, among other means. However, these datasets inevitably contain noisy labels that seriously harm the generalization of DNNs because DNNs memorize the noisy labels during training. Learning algorithms based on co-teaching, including Co-teaching+, JoCoR, and CoDis, can effectively alleviate the problem of training neural networks on noisily labeled data, and scholars have proposed different strategies for how two networks should cooperate to handle noisy labels. However, in a noisy-label environment, a deep learning model trained with the cross-entropy (CE) loss is highly sensitive to noisy labels, so the model easily fits the noisy samples and fails to learn the true patterns of the data. As training progresses, the parameters of the two networks in Co-teaching gradually become consistent, and the networks prematurely converge to the same model, which stalls learning. Moreover, as iterations proceed, each network inevitably memorizes some noisily labeled samples, so noisy and clean samples can no longer be distinguished accurately from the CE loss value alone; relying solely on the CE loss in the small-loss selection strategy is therefore unreliable. To solve these problems, this paper proposes Co-history, a method for learning with noisy labels by co-teaching with history losses, which considers historical information in collaborative learning.
Method
First, to address the overfitting caused by the cross-entropy (CE) loss in a noisy-label environment, a correction loss is proposed based on an analysis of each sample's loss history. The correction loss adjusts the weight of the CE loss in the current iteration so that the CE loss of a sample remains stable over the historical iterations as a whole, which is consistent with the principle that the classifier should be kept stable once the noisy samples have been separated from the clean ones, thereby reducing the overfitting induced by the CE loss. Second, a difference loss is proposed to address the premature convergence of the two networks in the co-teaching algorithm. Inspired by contrastive loss, the difference loss keeps the feature representations that the two networks produce for the same sample a certain distance apart, which preserves the difference between the networks during training and prevents them from degenerating into a single network. Because the two networks have different parameters, they form different decision boundaries and filter different types of errors, so maintaining this difference benefits collaborative learning. Finally, owing to overfitting, samples with noisy labels tend to show larger loss fluctuations than samples with clean labels. Following the small-loss selection strategy and incorporating the historical loss information of the samples, a new sample selection method is proposed that selects clean samples more accurately: samples with both low classification losses and low fluctuations in their historical losses are selected as clean samples for training.
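As an illustration of the two proposed losses, the PyTorch sketch below shows one plausible way to keep the two networks' features apart and to reweight the CE loss of samples whose current loss drifts away from their own historical average. The function names difference_loss and corrected_ce, the cosine-similarity threshold, and the exponential reweighting scheme are assumptions for illustration; this is a sketch under those assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def difference_loss(feat_a, feat_b, sim_threshold=0.5):
    """Contrastive-style difference loss (illustrative sketch).

    feat_a, feat_b: (B, D) features of the SAME mini-batch produced by the
    two peer networks. The loss penalises a pair only when its cosine
    similarity exceeds `sim_threshold`, which keeps a controlled level of
    disagreement and prevents the two networks from collapsing into one.
    """
    sim = F.cosine_similarity(feat_a, feat_b, dim=1)          # (B,)
    return torch.clamp(sim - sim_threshold, min=0.0).mean()

def corrected_ce(logits, targets, history_mean, beta=1.0):
    """Correction-loss sketch: per-sample CE reweighted by loss history.

    history_mean: (B,) mean CE loss of each sample over earlier epochs.
    Samples whose current loss deviates strongly from their own historical
    average (a symptom of the network starting to memorise a noisy label)
    are down-weighted; `beta` is a hypothetical temperature.
    """
    ce = F.cross_entropy(logits, targets, reduction="none")   # (B,)
    weight = torch.exp(-beta * torch.abs(ce.detach() - history_mean))
    return (weight * ce).mean()
```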
Result
Several experiments are conducted to demonstrate the effectiveness of the Co-history algorithm, including comparison experiments on four standard datasets (F-MNIST, SVHN, CIFAR-10, and CIFAR-100) and one real dataset (Clothing1M). Four types of artificially simulated noise are added to the standard datasets, namely symmetric, asymmetric, pairflip, and tridiagonal noise, each with 20% and 40% noise rates. In the real dataset, the labels are generated from the text surrounding each image and therefore already contain label noise, so no additional label noise is injected. Under symmetric noise with a 20% noise rate, the Co-history algorithm improves over the co-teaching algorithm by 2.05%, 2.19%, 3.06%, and 2.58% on the F-MNIST, SVHN, CIFAR-10, and CIFAR-100 datasets, respectively. With a 40% noise rate, the corresponding improvements are 3.52%, 4.77%, 6.16%, and 6.96%. On the real Clothing1M dataset, the best and final accuracies of Co-history improve by 0.94% and 1.2%, respectively, compared with the co-teaching algorithm. The effectiveness of the proposed losses is further verified by ablation experiments.
Conclusion
In this paper, a correction loss built on the historical pattern of sample losses is proposed to address the overfitting caused by training with the CE loss, and a difference loss is introduced to solve the premature convergence of the two networks in Co-teaching. Building on the traditional small-loss sample selection strategy, the historical pattern of sample losses is fully exploited to develop a more accurate sample selection strategy. The proposed Co-history algorithm demonstrates its superiority over existing co-teaching strategies in a large number of experiments, shows strong robustness on datasets with noisy labels, and is particularly suitable for noisy-label scenarios. The contribution of each improvement is also clearly demonstrated in ablation experiments. Because the algorithm needs to analyze the historical loss of each sample, the historical loss values of all samples must be stored; as the number of training samples grows, this occupies more memory and increases computing and storage costs. In addition, when the number of classes is large, the performance of the proposed algorithm becomes suboptimal under some noise settings (e.g., asymmetric noise with a 40% noise rate and the CIFAR-100 dataset with a 20% noise rate). Future work will focus on developing efficient solutions that preserve accuracy and on designing more robust classification algorithms for learning with noisy labels.
deep neural network (DNN); classification; noisy labels; co-teaching; historical loss
Arpit D, Jastrzębski S, Ballas N, Krueger D, Bengio E, Kanwal M S, Maharaj T, Fischer A, Courville A, Bengio Y and Lacoste-Julien S. 2017. A closer look at memorization in deep networks//Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: JMLR.org: 233-242
Blum A, Kalai A and Wasserman H. 2003. Noise-tolerant learning, the parity problem, and the statistical query model. Journal of the ACM, 50(4): 506-519 [DOI: 10.1145/792538.792543]
Diaz F. 2009. Integration of news content into web results//Proceedings of the 2nd ACM International Conference on Web Search and Data Mining. Barcelona, Spain: ACM: 182-191 [DOI: 10.1145/1498759.1498825]
Ding Y F, Wang L Q, Fan D L and Gong B Q. 2018. A semi-supervised two-stage approach to learning from noisy labels//Proceedings of 2018 IEEE Winter Conference on Applications of Computer Vision. Lake Tahoe, USA: IEEE: 1215-1224 [DOI: 10.1109/wacv.2018.00138]
Ghosh A, Kumar H and Sastry P S. 2017. Robust loss functions under label noise for deep neural networks//Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI: 1919-1925
Ghosh A, Manwani N and Sastry P S. 2015. Making risk minimization tolerant to label noise. Neurocomputing, 160: 93-107 [DOI: 10.1016/j.neucom.2014.09.081]
Han B, Yao Q M, Yu X R, Niu G, Xu M, Hu W H, Tsang I W and Sugiyama M. 2018. Co-teaching: robust training of deep neural networks with extremely noisy labels//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: Curran Associates Inc.: 8536-8546
Huang Z, Zhang J and Shan H. 2023. Twin contrastive learning with noisy labels//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 11661-11670 [DOI: 10.1109/cvpr52729.2023.01122]
Jiang L, Zhou Z Y, Leung T, Li L J and Li F F. 2018. MentorNet: learning data-driven curriculum for very deep neural networks on corrupted labels//Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: PMLR: 2304-2313
Kim Y, Yun J, Shon H and Kim J. 2021. Joint negative and positive learning for noisy labels//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 9437-9446 [DOI: 10.1109/cvpr46437.2021.00932]
Krizhevsky A. 2009. Learning multiple layers of features from tiny images [EB/OL]. [2023-07-02]. http://www.cs.utoronto.ca/~kriz/learning-features-2009-TR.pdf
Krizhevsky A, Sutskever I and Hinton G E. 2017. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6): 84-90 [DOI: 10.1145/3065386]
Li J N, Socher R and Hoi S C H. 2020. DivideMix: learning with noisy labels as semi-supervised learning//Proceedings of the 8th International Conference on Learning Representations. Addis Ababa, Ethiopia: OpenReview.net: 1-14
Li Y X, Shen J R and Xu Q. 2023. A summary of image recognition-relevant multi-layer spiking neural networks learning algorithms. Journal of Image and Graphics, 28(2): 385-400 [DOI: 10.11834/jig.220452]
Liu J R, Jiang D G, Yang Y K and Li R R. 2022. Agreement or disagreement in noise-tolerant mutual learning?//Proceedings of the 26th International Conference on Pattern Recognition. Montréal, Canada: IEEE: 4801-4807 [DOI: 10.1109/icpr56361.2022.9956595]
Liu T L and Tao D C. 2016. Classification with noisy labels by importance reweighting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(3): 447-461 [DOI: 10.1109/tpami.2015.2456899]
Lukasik M, Bhojanapalli S, Menon A K and Kumar S. 2020. Does label smoothing mitigate label noise?//Proceedings of the 37th International Conference on Machine Learning. Vienna, Austria: PMLR: 6448-6458
Ma X J, Huang H X, Wang Y S, Romano S, Erfani S and Bailey J. 2020. Normalized loss functions for deep learning with noisy labels//Proceedings of the 37th International Conference on Machine Learning. Vienna, Austria: JMLR.org: 6543-6553
Mahajan D, Girshick R, Ramanathan V, He K M, Paluri M, Li Y X, Bharambe A and van der Maaten L. 2018. Exploring the limits of weakly supervised pretraining//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 185-201 [DOI: 10.1007/978-3-030-01216-8_12]
Netzer Y, Wang T, Coates A, Bissacco A, Wu B and Ng A Y. 2011. Reading digits in natural images with unsupervised feature learning//Proceedings of NIPS Workshop on Deep Learning and Unsupervised Feature Learning. Granada, Spain: NIPS: 462-471
Patrini G, Rozza A, Menon A K, Nock R and Qu L Z. 2017. Making deep neural networks robust to label noise: a loss correction approach//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2233-2241 [DOI: 10.1109/cvpr.2017.240]
Pham H, Dai Z H, Xie Q Z and Le Q V. 2021. Meta pseudo labels//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 11552-11563 [DOI: 10.1109/CVPR46437.2021.01139]
Song H, Kim M, Park D, Shin Y and Lee J G. 2023. Learning from noisy labels with deep neural networks: a survey. IEEE Transactions on Neural Networks and Learning Systems, 34(11): 8135-8153 [DOI: 10.1109/tnnls.2022.3152527]
Wang Y S, Ma X J, Chen Z Y, Luo Y, Yi J F and Bailey J. 2019. Symmetric cross entropy for robust learning with noisy labels//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 322-330 [DOI: 10.1109/iccv.2019.00041]
Wei H X, Feng L, Chen X Y and An B. 2020. Combating noisy labels by agreement: a joint training method with co-regularization//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 13723-13732 [DOI: 10.1109/cvpr42600.2020.01374]
Wei Q, Sun H L, Lu X K and Yin Y L. 2022. Self-filtering: a noise-aware sample selection for label noise with confidence penalization//Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer: 516-532 [DOI: 10.1007/978-3-031-20056-4_30]
Xia X B, Han B, Zhan Y B, Yu J, Gong M M, Gong C and Liu T L. 2023. Combating noisy labels with sample selection by mining high-discrepancy examples//Proceedings of 2023 IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE: 1833-1843 [DOI: 10.1109/iccv51070.2023.00176]
Xia X B, Liu T L, Han B, Gong M M, Yu J, Niu G and Sugiyama M. 2022. Sample selection with uncertainty of losses for learning with noisy labels//Proceedings of the 10th International Conference on Learning Representations. Virtual Event: OpenReview.net: 1-23
Xiao H, Rasul K and Vollgraf R. 2017. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms [EB/OL]. [2023-07-02]. https://arxiv.org/pdf/1708.07747.pdf
Xiao T, Xia T, Yang Y, Huang C and Wang X G. 2015. Learning from massive noisy labeled data for image classification//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 2691-2699 [DOI: 10.1109/cvpr.2015.7298885]
Yan Y, Rosales R, Fung G, Subramanian R and Dy J. 2014. Learning from multiple annotators with varying expertise. Machine Learning, 95(3): 291-327 [DOI: 10.1007/s10994-013-5412-1]
Yi K and Wu J X. 2019. Probabilistic end-to-end noise correction for learning with noisy labels//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 7010-7018 [DOI: 10.1109/cvpr.2019.00718]
Yu X R, Han B, Yao J C, Niu G, Tsang I and Sugiyama M. 2019. How does disagreement help generalization against label corruption?//Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: PMLR: 7164-7173
Yu X Y, Liu T L, Gong M M, Batmanghelich K and Tao D C. 2018. An efficient and provable approach for mixture proportion estimation using linear independence assumption//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 4480-4489 [DOI: 10.1109/cvpr.2018.00471]
Zhang K, Feng X H, Guo Y R, Su Y K, Zhao K, Zhao Z B, Ma Z Y and Ding Q L. 2021. Overview of deep convolutional neural networks for image classification. Journal of Image and Graphics, 26(10): 2305-2325 [DOI: 10.11834/jig.200302]
Zhang Y, Niu G and Sugiyama M. 2021. Learning noise transition matrix from only noisy labels via total variation regularization//Proceedings of the 38th International Conference on Machine Learning. [s.l.]: PMLR: 12501-12512
Zhang Z L and Sabuncu M R. 2018. Generalized cross entropy loss for training deep neural networks with noisy labels//Proceedings of the 32nd Conference on Neural Information Processing Systems. Montréal, Canada: NeurIPS: 8792-8802