Multimodal multiscale industrial anomaly detection via flows
2025, Vol. 30, No. 2, pp. 451-466
Print publication date: 2025-02-16
DOI: 10.11834/jig.240183
Qu Haicheng (曲海成), Lin Junjie (林俊杰). 2025. Multimodal multiscale industrial anomaly detection via flows. Journal of Image and Graphics (中国图象图形学报), 30(02):0451-0466
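Objective
Industrial defect detection is a crucial part of modern industrial quality control. Multimodal industrial defect detection poses two challenges: capturing defects that vary in shape and size and are barely perceptible in RGB images, and reducing the interference that noise in the single-modal raw feature spaces causes in multimodal information interaction. To address these challenges, a multimodal multiscale defect detection method based on normalizing flows is proposed.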
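Method
First, a Vision Transformer and a Point Transformer extract features from blocks 1, 3, and 11 for the two modalities, RGB images and 3D point clouds, to build a feature pyramid; this preserves the spatial information of low-level features, which helps defect localization, and improves the model's robustness to defects of different shapes and sizes. Second, to simplify multimodal interaction, a point feature alignment algorithm aligns the 3D point cloud features to the plane of the RGB image, and unsupervised multimodal feature fusion is achieved by constructing a contrastive learning matrix, which promotes the exchange of information between modalities. In addition, a proxy task is designed to extend the information bottleneck mechanism to the unsupervised setting, reducing noise interference while preserving as much of the original information as possible to obtain a fuller and stronger multimodal representation. Finally, a multiscale normalizing flow structure captures feature information at different scales and enables interaction between features of different scales.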
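The contrastive learning matrix used for unsupervised fusion can be illustrated with a small sketch. The abstract does not specify the loss, so the code below assumes an InfoNCE-style objective in which the RGB and point cloud features of the same patch form the positive pair (the diagonal of the matrix) and all other pairs are negatives. The function names, the temperature value of 0.1, and the toy feature vectors are hypothetical and are not taken from the paper.

```python
import math

def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def contrastive_matrix(rgb_feats, pc_feats, temperature=0.1):
    """Cosine similarity between every RGB patch and every point cloud patch,
    divided by an assumed temperature. Entry [i][j] compares RGB patch i with
    point cloud patch j; both inputs must hold the same number of patches."""
    rgb = [l2_normalize(f) for f in rgb_feats]
    pc = [l2_normalize(f) for f in pc_feats]
    return [[sum(a * b for a, b in zip(r, p)) / temperature for p in pc] for r in rgb]

def diagonal_cross_entropy(logits):
    """Mean cross-entropy that treats the diagonal entry of each row as the positive."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(rgb_feats, pc_feats, temperature=0.1):
    """Average the RGB-to-point-cloud and point-cloud-to-RGB losses (an assumption)."""
    logits = contrastive_matrix(rgb_feats, pc_feats, temperature)
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))

# Toy example: two patches with 2-D features per modality.
rgb_patches = [[1.0, 0.0], [0.0, 1.0]]
pc_aligned = [[0.9, 0.1], [0.1, 0.9]]   # patch i matches patch i
pc_swapped = [[0.1, 0.9], [0.9, 0.1]]   # patches deliberately mismatched

loss_aligned = symmetric_contrastive_loss(rgb_patches, pc_aligned)
loss_swapped = symmetric_contrastive_loss(rgb_patches, pc_swapped)
print(loss_aligned < loss_swapped)
```

In this sketch the loss is small when corresponding patches of the two modalities agree and large when they do not, which is the signal a contrastive objective would use to pull the aligned features of the two modalities together.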
Result
The method is evaluated on the MVTec 3D-AD dataset. Experiments show a Detection AUCROC (area under the curve of the receiver operating characteristic) of 93.3%, a Segmentation AUPRO (area under the per-region overlap curve) of 96.1%, and a Segmentation AUCROC of 98.8%, better than most existing multimodal defect detection methods.
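Conclusion
The method detects defects of different shapes and sizes, including those that are barely perceptible in RGB images, with good accuracy. It reduces the influence of noise in the raw feature spaces on the multimodal representation and generalizes to some extent across defects of different shapes and sizes, so it meets the requirements of modern industry for defect detection well.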
Objective
Defect detection is a cornerstone of modern industrial quality control. As industries have advanced, defect types have become increasingly diverse. Some defects are scarcely perceptible in RGB images alone, so detecting them requires additional information from complementary modalities. Consequently, conventional deep learning methods that rely solely on single-modal data for defect identification cannot meet the demands of contemporary industrial environments. This study addresses the challenges of multimodal defect detection in industrial scenarios, where defects vary considerably in shape and size and often exhibit low perceptibility within individual modalities. It proposes a multimodal multiscale defect detection framework based on normalizing flows that reduces the interference of noise within single-modal feature spaces and exploits the complementary information across modalities.
Method
The proposed method has four main components: feature extraction, unsupervised feature fusion, an information bottleneck mechanism, and a multiscale normalizing flow. First, in the feature extraction stage, features at different levels are assumed to contain different amounts of spatial and semantic information: low-level features contain more spatial information, whereas high-level features convey richer semantic information. Because pixel-level defect localization depends on spatial detail, a Vision Transformer and a Point Transformer are used to extract features from RGB images and 3D point clouds at blocks 1, 3, and 11, yielding multimodal representations at different levels. These representations are then fused and organized into a feature pyramid. This approach not only preserves spatial information from low-level features to aid defect localization but also enhances the model’s robustness to defects of varying shapes and sizes. Second, in the unsupervised feature fusion stage, multimodal interaction is simplified by using a point feature alignment technique to align 3D point cloud features with the RGB image plane. Unsupervised multimodal feature fusion is achieved by constructing a contrastive learning matrix, which facilitates interaction between the modalities. Third, in the information bottleneck stage, a proxy task is designed to extend the information bottleneck mechanism to the unsupervised setting. The aim is to obtain a more comprehensive and robust multimodal representation by minimizing noise interference within the single-modal raw feature spaces while preserving as much of the original information as possible. Finally, in the multiscale normalizing flow stage, parallel flows capture feature information at different scales, and fusing these flows realizes interactions between features at different scales. In addition, the anomaly score is computed as the average of the Top-K values in the anomaly score map, replacing the conventional mean or maximum of the map. This score yields the final defect detection result.
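The Top-K scoring rule described above can be illustrated with a short sketch. This is a minimal pure-Python example rather than the authors' implementation; the function name `topk_mean_score`, the toy score map, and the choice `k=3` are assumptions made for illustration, since the abstract does not state the value of K.

```python
import heapq

def topk_mean_score(score_map, k):
    """Image-level anomaly score: mean of the k largest pixel-level scores.

    score_map is a 2-D list of per-pixel anomaly scores; k is clamped to the
    number of pixels so that small maps do not raise an error.
    """
    flat = [score for row in score_map for score in row]
    if not flat:
        raise ValueError("score_map is empty")
    k = max(1, min(k, len(flat)))
    return sum(heapq.nlargest(k, flat)) / k

# Toy 3x3 anomaly score map with a small high-scoring region.
score_map = [
    [0.1, 0.2, 0.1],
    [0.1, 0.9, 0.8],
    [0.1, 0.7, 0.1],
]

topk_score = topk_mean_score(score_map, k=3)          # mean of 0.9, 0.8, 0.7
max_score = max(max(row) for row in score_map)        # driven by a single pixel
mean_score = sum(map(sum, score_map)) / 9             # diluted by normal pixels
print(topk_score, max_score, mean_score)
```

Compared with the maximum, the Top-K mean is less sensitive to a single spurious pixel; compared with the mean over the whole map, it is not diluted by the large normal area that surrounds a small defect.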
Result
The proposed method is evaluated on the MVTec 3D-AD dataset, which covers 10 categories of industrial products with 2 656 training samples and 1 137 testing samples. Each category is divided into subclasses according to the type of defect. The method achieves a detection AUCROC of 93.3%, a segmentation AUPRO of 96.1%, and a segmentation AUCROC of 98.8%, outperforming most existing multimodal defect detection methods. Moreover, selected samples were visualized to compare detection using only RGB images with detection using RGB images together with 3D point clouds. The combination reveals defects that remain elusive when relying solely on RGB imagery, which supports the premise of this study that integrating both modalities is advantageous. Ablation studies provide additional insight into the contribution of each component. Introducing the information bottleneck improved all three metrics: by 1.4% in detection AUCROC, 2.1% in segmentation AUPRO, and 3.5% in segmentation AUCROC. Integrating the multiscale normalizing flow further improved performance, with gains of 2.5%, 3.6%, and 1.6% on the respective metrics. These findings indicate that both the information bottleneck and the multiscale normalizing flow contribute substantially to the overall performance of the framework.
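For readers unfamiliar with the detection metric, the following sketch shows one standard way to compute the area under the ROC curve from image-level anomaly scores. It uses the pairwise-ranking formulation: the fraction of (defective, normal) pairs in which the defective sample receives the higher score, with ties counted as half. The function name `auroc` and the toy scores and labels are illustrative assumptions; they are not data from the paper, and the paper's own evaluation code is not shown in this abstract.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise ranking.

    scores: anomaly scores, higher means more anomalous.
    labels: 1 for defective samples, 0 for normal samples.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # defective sample ranked above normal sample
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))

# Toy example: four test images, two normal and two defective.
image_scores = [0.1, 0.4, 0.35, 0.8]
image_labels = [0, 0, 1, 1]
detection_auroc = auroc(image_scores, image_labels)
print(detection_auroc)
```

The quadratic loop is adequate for a test set of about a thousand images; a sort-based implementation would be preferable for pixel-level segmentation AUCROC, where the number of samples is far larger.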
Conclusion
The main contributions of this study are as follows. First, unsupervised feature fusion is employed to encourage information exchange between modalities. Second, an information bottleneck is introduced into the feature fusion module to mitigate the impact of noise in the original single-modal feature spaces on multimodal interaction. Third, multimodal representations at different levels are used to construct feature pyramids, addressing the poor performance of previous flow-based methods on defects of varying scales. The proposed method shows promising detection performance on defects of diverse shapes and sizes, including those with low perceptibility in RGB images. By mitigating the impact of noise within the original feature spaces on the multimodal representation, the approach improves robustness and generalizes better to defects of varying characteristics. It therefore meets the demands of modern industry for accurate and reliable defect detection.