Multi-domain feature mixup boosting adversarial examples transferability method
2024, Vol. 29, No. 12, Pages: 3670-3683
Print publication date: 2024-12-16
DOI: 10.11834/jig.230895
Wan Peng, Hu Cong, Wu Xiaojun. 2024. Multi-domain feature mixup boosting adversarial examples transferability method. Journal of Image and Graphics, 29(12):3670-3683
Objective
Adversarial examples pose a significant security threat to deep neural networks (DNNs), a problem that has attracted wide attention. Many current black-box adversarial attack methods share a common limitation: they mount the attack in only a single domain, either the spatial domain or the frequency domain, so the generated adversarial examples cannot fully exploit the target model's latent vulnerabilities in the other domain, which leads to poor transferability. To address this, we propose a multi-domain feature mixup (MDFM) method that improves the attack success rate of adversarial examples in black-box scenarios.
Method
The discrete cosine transform (DCT) first maps each image from the spatial domain to the frequency domain, and the clean frequency-domain features of the original image are stored. The inverse DCT then maps the image back to the spatial domain, after which the surrogate model extracts its clean spatial-domain features. During the generation of adversarial examples, features are mixed in both the frequency domain and the spatial domain, which ultimately yields adversarial examples with better transferability.
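The following is a minimal sketch of this first step, written in Python with NumPy/SciPy; it is not the authors' released code, and function names such as to_frequency are ours. It shows the per-channel 2D DCT that stores the clean frequency-domain features and the inverse DCT that recovers the spatial image.

```python
import numpy as np
from scipy.fft import dctn, idctn

def to_frequency(image: np.ndarray) -> np.ndarray:
    """Per-channel 2D DCT: (C, H, W) spatial -> (C, H, W) frequency."""
    return dctn(image, axes=(-2, -1), norm="ortho")

def to_spatial(freq: np.ndarray) -> np.ndarray:
    """Per-channel inverse 2D DCT: frequency -> spatial."""
    return idctn(freq, axes=(-2, -1), norm="ortho")

# First iteration: cache the clean frequency-domain features, then map the
# image back to the spatial domain for the surrogate model's forward pass.
clean_image = np.random.rand(3, 224, 224).astype(np.float32)  # stand-in input
clean_freq = to_frequency(clean_image)  # stored for mixing in later iterations
recovered = to_spatial(clean_freq)      # round trip is exact up to float error
assert np.allclose(recovered, clean_image, atol=1e-5)
```

With norm="ortho", the DCT/IDCT pair is orthonormal, so the round trip loses no information and the stored coefficients faithfully represent the clean image.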
Result
Extensive experiments are conducted on the CIFAR-10 and ImageNet datasets, comparing MDFM with a variety of attack methods. On CIFAR-10, the average attack success rate against different models reaches 89.8%. On ImageNet, with ResNet-50 and Inception-v3 as surrogate models, the average attack success rates on different DNN models reach 75.9% and 40.6%, respectively; with ResNet-50 and adv-ResNet-50 as surrogate models tested on Transformer-based models, the average attack success rates are 32.3% and 59.4%, surpassing current state-of-the-art black-box adversarial attack methods.
Conclusion
By mixing features in both the spatial domain and the frequency domain, MDFM drives adversarial examples to exploit a broad range of multi-domain features and overcome the interference caused by clean features, thereby improving their transferability. Our code is available at https://github.com/linghuchong111da/MDFM.
Objective
Deep neural networks (DNNs) have been widely applied across diverse domains and have demonstrated remarkable performance, particularly in computer vision. However, adversarial examples pose a significant security threat to DNNs. Adversarial attacks are categorized into white-box and black-box attacks according to their access to the target model's architecture and parameters. White-box attacks leverage knowledge about the target model, such as gradients obtained through backpropagation, to attain high attack success rates. Black-box attacks, in contrast, generate adversarial examples on a surrogate model before launching them against the target model. Although black-box attacks match real-world scenarios, they generally exhibit low success rates because of the limited knowledge available about the target model. Existing methods typically focus either on perturbations in the spatial domain or on the influence of frequency information in images, neglecting the importance of the other domain. Both the spatial and the frequency domain information of an image are crucial for model recognition, so considering only one domain leads to insufficient generalization of the generated adversarial examples. This paper addresses this gap by introducing a novel black-box adversarial attack method called multi-domain feature mixup (MDFM), which aims to enhance the transferability of adversarial examples by considering both domains.
Method
In the first iteration, the discrete cosine transform (DCT) converts the original images from the spatial domain to the frequency domain, and the clean frequency-domain features of the original images are stored. The inverse DCT then transforms the images back to the spatial domain, and the surrogate model extracts their clean spatial-domain features. In subsequent iterations, the perturbed images are again transformed into the frequency domain, where their features are mixed with the stored clean frequency-domain features. The stored clean features can be shuffled across the batch, so each image is mixed either with its own clean features or with those of another image. Random mixing ratios are drawn independently for the corresponding channels of each image, so the interference introduced by the clean frequency-domain features varies arbitrarily and produces diverse effects. The mixed features are converted back to the spatial domain, where they are further mixed with the clean spatial-domain features during the surrogate model's forward pass, again with shuffling and random channel-wise mixing ratios. Adversarial examples are ultimately generated from this process.
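As a hedged illustration of this mixing rule, the PyTorch sketch below blends adversarial features with stored clean features using batch shuffling and an independent random ratio per image and per channel. The function name, the ratio bound max_ratio, and the layers at which mixing would be applied are our assumptions, not the paper's settings.

```python
import torch

def mix_clean_features(adv_feat: torch.Tensor,
                       clean_feat: torch.Tensor,
                       shuffle: bool = True,
                       max_ratio: float = 0.75) -> torch.Tensor:
    """Blend adversarial features (N, C, ...) with stored clean features
    using an independent random ratio per image and per channel."""
    if shuffle:
        # Pair each image with its own or another image's clean features.
        clean_feat = clean_feat[torch.randperm(clean_feat.size(0))]
    # One ratio per (image, channel), broadcast over the remaining dims.
    # max_ratio is an illustrative bound, not the paper's reported setting.
    shape = (adv_feat.size(0), adv_feat.size(1)) + (1,) * (adv_feat.dim() - 2)
    alpha = torch.rand(shape, device=adv_feat.device) * max_ratio
    return (1.0 - alpha) * adv_feat + alpha * clean_feat
```

The same routine can serve both mixing stages described above: applied once to the DCT coefficients in the frequency domain and again to intermediate feature maps inside the surrogate model.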
Result
Extensive experiments are conducted on the CIFAR-10 and ImageNet datasets. On CIFAR-10, ResNet-50 serves as the surrogate model, and MDFM is evaluated on ensembles of VGG-16, ResNet-18, MobileNet-v2, Inception-v3, and DenseNet-121 trained under different defense configurations, in comparison with advanced black-box adversarial attack methods such as VT, Admix, and clean feature mixup (CFM). The experimental results demonstrate that MDFM achieves the highest attack success rates across these models, reaching 89.8% on average, a 0.5% improvement over the state-of-the-art CFM method. On ImageNet, ResNet-50 and Inception-v3 are employed as surrogate models, and MDFM is tested on the VGG-16, ResNet-18, ResNet-50, DenseNet-121, Xception, MobileNet-v2, EfficientNet-B0, Inception-ResNet-v2, Inception-v3, and Inception-v4 target models. With ResNet-50 as the surrogate model, MDFM attains the highest attack success rates on all target models, surpassing the other attack methods; its average attack success rate is 1.6% higher than that of CFM, and the improvement reaches 3.6% on MobileNet-v2. With Inception-v3 as the surrogate model, MDFM again achieves the highest attack success rates on all nine target models, outperforming CFM on every model with a maximum improvement of 2.5%; its average attack success rate reaches 40.6%, which is 1.4% higher than that of CFM. To further validate its effectiveness, MDFM is tested on adv-ResNet-50 and five Transformer-based models, with ResNet-50 and adv-ResNet-50 serving as surrogate models. With ResNet-50 as the surrogate model, MDFM achieves the highest attack success rates on all five models, with an average improvement of 1.5% over CFM; the most significant gain, 2.8%, is observed on the PiT model. With adv-ResNet-50 as the surrogate model, MDFM achieves an average attack success rate of 59.4%, surpassing the other methods; on ConViT, it improves over CFM by 1.9%, and its average attack success rate exceeds that of CFM by 0.8%.
Conclusion
This paper introduces MDFM, a novel method designed for adversarial attacks in black-box scenarios. MDFM mixes clean features across multiple domains, prompting adversarial examples to leverage a diverse set of features and overcome the interference caused by clean features. As a result, highly diverse adversarial examples are generated, and their transferability is enhanced.
Keywords: adversarial example; frequency domain; feature mixup; black-box adversarial attack; deep neural network (DNN)
Bahmei B, Birmingham E and Arzanpour S. 2022. CNN-RNN and data augmentation using deep convolutional generative adversarial network for environmental sound classification. IEEE Signal Processing Letters, 29: 682-686 [DOI: 10.1109/LSP.2022.3150258]
Byun J, Cho S, Kwon M J, Kim H S and Kim C. 2022. Improving the transferability of targeted adversarial examples through object-based diverse input//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 15223-15232 [DOI: 10.1109/CVPR52688.2022.01481]
Byun J, Kwon M J, Cho S, Kim Y and Kim C. 2023. Introducing competition to boost the transferability of targeted adversarial examples through clean feature mixup//Proceedings of 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver, Canada: IEEE: 24648-24657 [DOI: 10.1109/CVPR52729.2023.02361]
Carlini N and Wagner D. 2017. Towards evaluating the robustness of neural networks//Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP). San Jose, USA: IEEE: 39-57 [DOI: 10.1109/SP.2017.49]
Chollet F. 2017. Xception: deep learning with depthwise separable convolutions//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 1800-1807 [DOI: 10.1109/CVPR.2017.195]
Dong Y P, Liao F Z, Pang T Y, Su H, Zhu J, Hu X L and Li J G. 2018. Boosting adversarial attacks with momentum//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 9185-9193 [DOI: 10.1109/CVPR.2018.00957]
Dong Y P, Pang T Y, Su H and Zhu J. 2019. Evading defenses to transferable adversarial examples by translation-invariant attacks//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 4307-4316 [DOI: 10.1109/CVPR.2019.00444]
Duan R J, Chen Y F, Niu D T, Yang Y, Qin A K and He Y. 2021. AdvDrop: adversarial attack to DNNs by dropping information//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 7486-7495 [DOI: 10.1109/ICCV48922.2021.00741]
Goodfellow I J, Shlens J and Szegedy C. 2015. Explaining and harnessing adversarial examples [EB/OL]. [2023-12-20]. https://arxiv.org/pdf/1412.6572.pdf
Guo C, Frank J S and Weinberger K Q. 2019. Low frequency adversarial perturbation [EB/OL]. [2023-12-20]. https://arxiv.org/pdf/1809.08758.pdf
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Huang G, Liu Z, Van Der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2261-2269 [DOI: 10.1109/CVPR.2017.243]
Jia S, Ma C, Yao T P, Yin B J, Ding S H and Yang X K. 2022. Exploring frequency adversarial attacks for face forgery detection//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 4093-4102 [DOI: 10.1109/CVPR52688.2022.00407]
Kariyappa S and Qureshi M K. 2019. Improving adversarial robustness of ensembles with diversity training [EB/OL]. [2023-12-20]. https://arxiv.org/pdf/1901.09981v1.pdf
Kurakin A, Goodfellow I and Bengio S. 2017. Adversarial examples in the physical world [EB/OL]. [2023-12-20]. https://arxiv.org/pdf/1607.02533.pdf
Li Y, Lao L J, Cui Z, Shan S G and Yang J. 2022. Graph jigsaw learning for cartoon face recognition. IEEE Transactions on Image Processing, 31: 3961-3972 [DOI: 10.1109/TIP.2022.3177952]
Lin J D, Song C B, He K, Wang L W and Hopcroft J E. 2020. Nesterov accelerated gradient and scale invariance for adversarial attacks [EB/OL]. [2023-12-20]. https://arxiv.org/pdf/1908.06281.pdf
Long Y Y, Zhang Q L, Zeng B H, Gao L L, Liu X L, Zhang J and Song J K. 2022. Frequency domain model augmentation for adversarial attack//Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer: 549-566 [DOI: 10.1007/978-3-031-19772-7_32]
Moosavi-Dezfooli S M, Fawzi A and Frossard P. 2016. DeepFool: a simple and accurate method to fool deep neural networks//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 2574-2582 [DOI: 10.1109/CVPR.2016.282]
Pang T Y, Xu K, Du C, Chen N and Zhu J. 2019. Improving adversarial robustness via promoting ensemble diversity [EB/OL]. [2023-12-20]. https://arxiv.org/pdf/1901.08846.pdf
Qian Y G, He S K, Zhao C Y, Sha J Q, Wang W and Wang B. 2023. LEA2: a lightweight ensemble adversarial attack via non-overlapping vulnerable frequency regions//Proceedings of 2023 IEEE/CVF International Conference on Computer Vision. Paris, France: IEEE: 4487-4498 [DOI: 10.1109/ICCV51070.2023.00416]
Sanchez V, Garcia P, Peinado A M, Segura J C and Rubio A J. 1995. Diagonalizing properties of the discrete cosine transforms. IEEE Transactions on Signal Processing, 43(11): 2631-2641 [DOI: 10.1109/78.482113]
Sharma Y, Ding G W and Brubaker M. 2019. On the effectiveness of low frequency perturbations [EB/OL]. [2023-12-20]. https://arxiv.org/pdf/1903.00073.pdf
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition [EB/OL]. [2023-12-20]. https://arxiv.org/pdf/1409.1556.pdf
Szegedy C, Ioffe S, Vanhoucke V and Alemi A. 2017. Inception-v4, inception-resnet and the impact of residual connections on learning//Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, USA: AAAI: 4278-4284 [DOI: 10.1609/aaai.v31i1.11231]
Tan M X and Le Q. 2019. EfficientNet: rethinking model scaling for convolutional neural networks//Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: ICML: 6105-6114
Wang X S and He K. 2021. Enhancing the transferability of adversarial attacks through variance tuning//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 1924-1933 [DOI: 10.1109/CVPR46437.2021.00196]
Wang X S, He X R, Wang J D and He K. 2021. Admix: enhancing the transferability of adversarial attacks//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: 16138-16147 [DOI: 10.1109/ICCV48922.2021.01585]
Xie C H, Zhang Z S, Zhou Y Y, Bai S, Wang J Y, Ren Z and Yuille A L. 2019. Improving transferability of adversarial examples with input diversity//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 2725-2734 [DOI: 10.1109/CVPR.2019.00284]
Yang H R, Zhang J Y, Dong H L, Inkawhich N, Gardner A, Touchet A, Wilkes W, Berry H and Li H. 2020. DVERGE: diversifying vulnerabilities for enhanced robust generation of ensembles//Proceedings of the 34th International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: 5505-5515
Zhang Y P, Zheng W Z, Zhu Z, Huang G, Lu J W and Zhou J. 2023. A simple baseline for multi-camera 3D object detection//Proceedings of the 37th AAAI Conference on Artificial Intelligence. Washington, USA: AAAI: 3507-3515 [DOI: 10.1609/aaai.v37i3.25460]
Zhao J J, Wang J W and Wu J F. 2023. Adversarial attack method identification model based on multi-factor compression error. Journal of Image and Graphics, 28(3): 850-863 [DOI: 10.11834/jig.220516]
Zou J H, Pan Z S, Qiu J Y, Liu X, Rui T and Li W. 2020. Improving the transferability of adversarial examples with resized-diverse-inputs, diversity-ensemble and region fitting//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 563-579 [DOI: 10.1007/978-3-030-58542-6_34]