BSGAN-GP: a semi-supervised image recognition model driven by class balancing
2025, Vol. 30, No. 1, pp. 95-109
Print publication date: 2025-01-16
DOI: 10.11834/jig.230881
Hu Jing, Zhang Rumin, Lian Bingquan. BSGAN-GP: a semi-supervised image recognition model driven by class balancing[J]. Journal of Image and Graphics, 2025, 30(1): 95-109.
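Objective
Existing deep learning image recognition models rely heavily on large amounts of data labeled manually by specialists; such expert labels are hard to obtain, and manual annotation is expensive. Most real-world datasets are imbalanced, and a severe skew between positive and negative samples biases the fitted model toward the majority classes, leaving recognition accuracy on minority classes inadequate. These problems seriously hinder the wide application of deep learning to practical image recognition.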
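Method
Building on the semi-supervised generative adversarial network (GAN), we propose a new balanced model architecture, BSGAN-GP (balancing semi-supervised generative adversarial network-gradient penalty), which enables the discriminator of a semi-supervised GAN to judge every class fairly. The proposed class balancing random selection (CBRS) algorithm addresses the low recognition accuracy of minority classes caused by imbalanced image classes. Labeled real data are randomly selected by class so that every class contributes the same number of labeled inputs. The generator NetG, with its trained parameters fixed, then generates the same number of fake samples for each class as input to the discriminator, and the discriminator NetD is updated, which ensures that it judges all classes fairly. BSGAN-GP also adds a gradient penalty term to the discriminator loss function to make training more stable.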
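The abstract does not state the exact form of the gradient penalty term. As a hedged illustration only, the block below writes out the widely used WGAN-GP form and adds it to a generic discriminator loss. The symbols $L_D$ (the semi-supervised discriminator loss), $\lambda$ (penalty weight), $x$ (real sample), $\tilde{x}$ (sample from NetG), and $\hat{x}$ (interpolated sample) are assumptions introduced here, not notation taken from the paper.

```latex
\begin{align}
\hat{x} &= \epsilon\, x + (1-\epsilon)\, \tilde{x}, \qquad \epsilon \sim U[0,1] \\
L_{\mathrm{GP}} &= \lambda\, \mathbb{E}_{\hat{x}}\!\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right] \\
L_D^{\mathrm{total}} &= L_D + L_{\mathrm{GP}}
\end{align}
```

Under this assumed form, the penalty pushes the norm of the discriminator's gradient toward 1 along lines between real and generated samples, which is the usual argument for why such a term stabilizes adversarial training.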
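Result
BSGAN-GP was compared with nine image recognition methods (six semi-supervised and three fully supervised) on three mainstream datasets. To demonstrate the improvement in minority-class recognition accuracy, imbalanced versions of the three datasets were constructed. On Fashion-MNIST, overall accuracy increased by 3.281% and the minority-class recognition rate by 7.14% relative to the baseline model; on MNIST, the recognition rates of the four minority classes increased by 2.68% to 7.40% relative to the baseline model; on SVHN (street view house number), overall accuracy increased by 3.515% relative to the baseline model. Synthetic image quality was also compared on the three datasets to verify the effectiveness of the CBRS algorithm, and the improved quality and quantity of minority-class synthetic images confirm its effect. Ablation experiments evaluated the importance of the proposed CBRS module and the introduced module in the network.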
Conclusion
The proposed BSGAN-GP model achieves fairer image recognition and higher-quality synthetic images. The source code for the experiments is available at https://github.com/zrm0616/BSGAN-GP.git.
Objective
With improvements in algorithm performance and advances in computer hardware, image classification technology now achieves high-precision automatic classification and screening of digital images. The technology uses a computer to analyze an image quantitatively, assigning the image, or each region within it, to one of several categories in place of human visual interpretation. In practice, however, obtaining high-accuracy classification results requires a large number of training samples with high-quality annotations. For large-scale image datasets, existing annotation methods, such as polygon annotation and key point annotation, must be performed manually by industry experts. Because expert annotation is costly and high-quality annotation is difficult, little image data is labeled, which seriously hinders the development of deep learning in computer vision. The semi-supervised generative adversarial network (GAN) paradigm was proposed to address this problem: it uses a large amount of unlabeled data to capture the distribution of real samples in the feature space and thus determines classification boundaries more accurately. Generative models such as DCGAN and semi-supervised GAN can create new samples and increase sample diversity, and they are therefore widely used in various fields. However, such models are often unstable in adversarial training; on an unbalanced dataset in particular, training is easily trapped into predicting the majority classes. Image datasets in real-world industrial applications are often class-unbalanced, and this imbalance lowers classifier accuracy. Several recent studies have shown that GANs such as DAGAN, BSSGAN, BAGAN, and improve-BAGAN are effective in alleviating the imbalance problem.
Among them, BAGAN serves as an augmentation method that restores balance in unbalanced datasets: it learns useful features from the majority classes and uses them to generate images for the minority classes. However, experimental results show that its encoder loses many details during image reconstruction, so similar categories are hard to distinguish in the reconstructed images. Improve-BAGAN improves on BAGAN by adding a gradient penalty, which makes model training more stable. Improve-BAGAN is the state of the art among supervised methods for the imbalance problem, but reaching its expected results requires manually labeling a sufficient number of samples, which greatly increases labor and time costs.
Method
In this study, a new balanced image recognition model based on semi-supervised GAN is established, enabling the discriminator of the semi-supervised GAN to identify every class of an unbalanced dataset fairly. The proposed model, BSGAN-GP, consists of two components: the category balancing random selection (CBRS) algorithm and a discriminator with an added gradient penalty. In the new CBRS algorithm, the labeled data in the real data are randomly selected by category so that each class contributes the same number of labeled samples to the model, which keeps the real samples balanced with the samples synthesized by the generator. Adversarial training then follows: the generator NetG, with its parameters fixed, generates the same number of fake samples for each class as input to the discriminator, and the discriminator NetD is updated so that it judges all classes fairly, which improves the recognition accuracy of the minority classes. BSGAN-GP adds a gradient penalty term to the discriminator loss function to stabilize model training. The optimizer used in the experiments was Adam, with the learning rate set to 0.000 2 and the momentum parameters set to (0.5, 0.9). The batch size for all three datasets was 100. The number of labeled samples was set to 1 000 (100 per class) for MNIST and Fashion-MNIST and to 5 000 (500 per class) for SVHN. The experiments ran on an RTX 4090 GPU with 24 GB of memory, and most of them finished within 4 500 s. MNIST and Fashion-MNIST were trained for 25 epochs, each of which took 85 s and 108 s, respectively, on our device. SVHN was trained for 30 epochs, each of which took 110 s on our device.
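The class-balanced selection step described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `cbrs_select` and `balanced_fake_labels`, the toy label list, and the choice to sample with replacement when a class has fewer labeled samples than requested are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def cbrs_select(labels, per_class, rng):
    """Pick `per_class` labeled sample indices from every class at random."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    chosen = []
    for label in sorted(by_class):
        pool = by_class[label]
        if len(pool) >= per_class:
            # Enough labeled samples: draw without replacement.
            chosen.extend(rng.sample(pool, per_class))
        else:
            # Assumption: a minority class that is too small is drawn
            # with replacement so every class still reaches `per_class`.
            chosen.extend(rng.choices(pool, k=per_class))
    rng.shuffle(chosen)
    return chosen


def balanced_fake_labels(num_classes, per_class):
    """Class labels to request from the generator: equal count per class."""
    return [c for c in range(num_classes) for _ in range(per_class)]


# Toy unbalanced label set (made up): class 1 is the minority class.
labels = [0] * 50 + [1] * 5 + [2] * 20
rng = random.Random(0)

batch_idx = cbrs_select(labels, per_class=10, rng=rng)
real_counts = Counter(labels[i] for i in batch_idx)
fake_counts = Counter(balanced_fake_labels(num_classes=3, per_class=10))
print(sorted(real_counts.items()), sorted(fake_counts.items()))
```

In this sketch the discriminator would see the same number of real labeled samples and generated samples for every class in each update, which is the balance property the text attributes to CBRS.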
Result
BSGAN-GP is compared with six semi-supervised methods and three fully supervised methods on three mainstream datasets. Unbalanced versions of the three datasets are constructed to demonstrate the improved recognition accuracy on minority classes. The evaluation covers overall accuracy, per-category recognition rate, confusion matrices, and synthesized images. On the unbalanced Fashion-MNIST, overall accuracy increased by 3.281% and the minority-class recognition rate increased by 7.14% compared with the semi-supervised GAN. On the unbalanced MNIST, the recognition rates of the four minority classes increased by 2.68% to 7.40% compared with the semi-supervised GAN. On SVHN, overall accuracy increased by 3.515% compared with the semi-supervised GAN. The quality of synthetic images was also compared on the three datasets to verify the effectiveness of the CBRS algorithm, and the improved quantity and quality of synthetic images for the minority classes confirmed its effect. Ablation experiments evaluated the importance of the proposed CBRS module and the introduced gradient penalty (GP) module in the network. The CBRS module improved the overall accuracy of the model by 2% to 3%, and the GP module improved it by 0.8% to 1.8%.
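To make the reported indicators concrete, the sketch below derives overall accuracy and a per-class recognition rate from a confusion matrix. It assumes that "recognition rate" means per-class recall and that rows are true classes and columns are predicted classes; the matrix values are made up for illustration and are not results from the paper.

```python
def overall_accuracy(cm):
    """Fraction of all samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_recognition_rate(cm):
    """Per-class recall: correct predictions divided by true class size."""
    rates = []
    for i, row in enumerate(cm):
        row_total = sum(row)
        rates.append(row[i] / row_total if row_total else 0.0)
    return rates


# Illustrative 3-class confusion matrix (rows: true, columns: predicted).
# Class 1 is a small minority class with only 20 test samples.
cm = [
    [90, 10, 0],
    [5, 15, 0],
    [2, 3, 45],
]

accuracy = overall_accuracy(cm)
rates = per_class_recognition_rate(cm)
print(f"overall accuracy: {accuracy:.4f}")
print("per-class recognition rate:", [round(r, 4) for r in rates])
```

Because the minority class contributes few samples, its recognition rate can be low while overall accuracy stays high, which is why the two indicators are reported separately.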
Conclusion
In this study, we propose a new algorithm, CBRS, to achieve fair recognition of all classes in unbalanced datasets, and we introduce a gradient penalty into the discriminator of the semi-supervised GAN for more stable training. Experimental results indicate that CBRS achieves fairer image recognition and higher-quality synthesized images.
deep learning; semi-supervised learning (SSL); generative adversarial network (GAN); unbalanced image recognition; gradient penalty