Recent Advances on Data Generation and Its Applications on Computer Vision

Ma Yuzhuo; Zhang Yongfei; Jia Wei; Liu Jiaying; Gan Tian; Yang Wenhan; Zhuo Junbao; Liu Wu; Ma Huimin

doi:10.11834/jig.250085

2025, pages 1-81
Received: 2025-02-28
Revised: 2025-03-12
Accepted: 2025-03-24
Published online: 2025-03-26

Ma Yuzhuo, Zhang Yongfei, Jia Wei, Liu Jiaying, Gan Tian, Yang Wenhan, Zhuo Junbao, Liu Wu, Ma Huimin. Recent Advances on Data Generation and Its Applications on Computer Vision[J/OL]. Journal of Image and Graphics, 2025, 1-81. DOI: 10.11834/jig.250085.

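Summary

Large-scale image and video datasets are a core driver of progress in computer vision algorithms. Building such datasets for computer vision tasks is important but complex. Data generation methods based on generative adversarial networks (GANs), diffusion models, and related techniques can controllably generate large-scale, diverse image and video data that effectively replaces or supplements real image and video datasets, giving new momentum to the field. After briefly introducing the background of image and video data generation and its applications in computer vision, this survey first systematically reviews representative image and video data generation techniques and models in three categories: traditional data augmentation and generation, represented by geometric transformations; data generation based on 3D rendering, represented by virtual engines and neural radiance fields; and generation based on deep generative models, represented by GANs and diffusion models. It then summarizes how these techniques and models are applied to typical computer vision tasks, including image enhancement; individual analysis such as object detection and tracking and pose and action recognition; image- and video-based biometric recognition; group behavior analysis such as people counting and crowd behavior analysis; autonomous driving; video generation; and embodied intelligence. Finally, it analyzes open problems in data generation and its applications for computer vision and outlines future trends, with the aim of promoting the development of image and video data generation and of computer vision technology.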

Abstract

Large-scale image and video datasets are indispensable for the development of computer vision algorithms, providing the resources needed to train and evaluate models. Constructing such datasets for different computer vision tasks is crucial but complex, as it involves substantial challenges in data collection, annotation, and ensuring data diversity. Traditionally, acquiring large, high-quality image and video datasets has been resource-intensive, requiring manual labeling, data collection in real-world settings, and specialized capture hardware. As deep learning methods increasingly rely on large-scale labeled data, the need for innovative data generation techniques has become more prominent. In recent years, generative models such as generative adversarial networks (GANs) and diffusion models have emerged as powerful tools for generating synthetic datasets. These models can create diverse, controllable, and highly realistic image and video data, offering an effective alternative or supplement to traditional data collection. With these techniques, large volumes of data can be generated to represent the varied scenarios and conditions needed to train robust computer vision models. Unlike traditional data collection, which is often constrained by geographic, financial, and logistical limitations, generative models provide a flexible way to produce data without real-world acquisition. This review begins by introducing the significance and background of image and video data generation in computer vision. Image and video data play a critical role in developing and training computer vision algorithms, since large-scale, diverse datasets are essential for building robust models.
The review then groups the key data generation techniques into three broad approaches: traditional data augmentation methods, 3D rendering-based generation methods, and deep generative models. First, traditional data augmentation techniques, including geometric transformations, color adjustments, and cropping, are commonly used to expand existing datasets and improve model generalization. These methods are relatively simple and computationally inexpensive, but their ability to generate diverse and realistic datasets is limited. In contrast, 3D rendering technologies, such as virtual engines and neural radiance fields (NeRF), enable the creation of highly realistic synthetic data by simulating real-world environments. They can produce diverse datasets by varying environmental factors such as lighting, camera angles, and object interactions. Furthermore, deep generative models, such as GANs and diffusion models, have shown remarkable effectiveness in generating high-quality synthetic data. GANs work by training two neural networks in competition: a generator creates synthetic data, while a discriminator evaluates its realism. Over the course of training, the generator improves its output, creating increasingly realistic data. Diffusion models, on the other hand, iteratively refine noisy data into clear and realistic images or videos, enabling the generation of diverse, high-quality datasets. Next, the review discusses the applications of these generative models across a wide range of computer vision tasks. These include image enhancement, object detection, tracking, pose or action recognition, biometric identification, crowd behavior analysis, and, more recently, emerging fields such as autonomous driving and embodied artificial intelligence. In particular, synthetic data has been instrumental in training models for tasks that are challenging to address with real-world data alone.
For example, in biometric identification, synthetic data can provide a wide variety of samples for fingerprints, faces, irises, and palmprints, offering more diverse training examples and reducing reliance on real biometric data, which is often difficult to acquire. Similarly, in autonomous driving, synthetic data can cover varied driving scenarios, including different road conditions, weather patterns, and traffic behaviors, helping to train autonomous vehicle models in a safe and controlled environment. Synthetic data has also proven valuable in pose and action recognition, where diverse datasets are essential for accurately detecting human actions across different settings and contexts. Despite the considerable progress made in image and video data generation, several challenges remain. One of the primary issues is ensuring the realism and diversity of generated data, which is crucial for training models that generalize well to real-world scenarios. Furthermore, there is still a lack of research on how to effectively evaluate the quality of synthetic data and use feedback mechanisms to guide the generation process. In addition, ethical considerations surrounding the use of synthetic data, especially in sensitive applications such as biometric recognition, must be carefully addressed. The use of synthetic data raises concerns regarding privacy, consent, and potential misuse, which must be handled responsibly. Looking ahead, as generative models continue to evolve, they are expected to produce even more realistic and diverse datasets, offering new possibilities for training computer vision models. Advances in generative technologies are poised to drive further innovation in computer vision, artificial intelligence, and many other fields.

references

Abdal R ， Qin Y and Wonka P . 2019 . Image2StyleGAN： How to embed images into the StyleGAN latent space？ // Proceedings of IEEE International Conference on Computer Vision . Seoul， Korea （South）： IEEE： 4431 – 4440 ［ DOI： 10.1109/ICCV.2019.00453 http://dx.doi.org/10.1109/ICCV.2019.00453 ］

Abdal ， Rameen and Zhu ， Peihao and Mitra ， Niloy J . and Wonka， Peter . 2021 . StyleFlow： Attribute-conditioned Exploration of StyleGAN-Generated Images using Conditional Continuous Normalizing Flows. ACM Transactions on Graphics ， 40 （ 3 ）： 1 – 21 ［ DOI： 10.1145/3447648 http://dx.doi.org/10.1145/3447648 ］

Ainam J P ， Qin K ， Liu G S and Luo G C . 2019 . Sparse label smoothing regularization for person re-identification . IEEE Access 7 ： 27899 – 27910 ［ DOI： 10.1109/ACCESS.2019.2901599 http://dx.doi.org/10.1109/ACCESS.2019.2901599 ］

Akada H ， Wang J ， Shimada S ， Takahashi M ， Theobalt C and Golyanik V . 2022 . UnrealEgo： a new dataset for robust egocentric 3D human motion capture // Proceedings of the 17th European Conference on Computer Vision . Tel Aviv， Israel ： Springer-Verlag： 1 – 17 ［ DOI： 10.1007/978-3-031-20068-7_1 http://dx.doi.org/10.1007/978-3-031-20068-7_1 ］

Alinezhad Noghre G ， Danesh Pazho A ， Sanchez J ， Hewitt N ， Neff C and Tabkhi H . 2022 . ADG-Pose： automated dataset generation for&nbsp；real-world human pose estimation // Proceedings of the Pattern Recognition and Artificial Intelligence： Third International Conference . Paris， France ： Springer-Verlag： 258 – 270 ［ DOI： 10.1007/978-3-031-09282-4_22 http://dx.doi.org/10.1007/978-3-031-09282-4_22 ］

Alomar K ， Aysel H I and Cai X . 2023 . Data Augmentation in Classification and Segmentation： A Survey and New Strategies . Journal of Imaging ， 9 （ 2 ）： 46 ［ DOI： 10.3390/jimaging9020046 http://dx.doi.org/10.3390/jimaging9020046 ］

An J ， Zhang S Y ， Yang H ， Gupta S ， Huang J B ， Luo J B and Yin X . 2023a . Latent-shift： latent diffusion with temporal shift for efficient text-to-video generation ［EB/OL］. ［ 2023-04-18 ］. https://arxiv.org/abs/2304.08477.pdf https://arxiv.org/abs/2304.08477.pdf

An S Z ， Xu H Y ， Shi Y C ， Song G X ， Ogras U Y and Luo L J . 2023b . PanoHead： geometry-aware 3D full-head synthesis in 360°// Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE ： 20950 – 20959 ［ DOI： 10.1109/CVPR52729.2023.02007 http://dx.doi.org/10.1109/CVPR52729.2023.02007 ］

Andrychowicz M ， Baker B ， Chociej M ， Józefowicz R ， McGrew B ， Pachocki J ， Petron A ， Plappert M ， Powell G ， Ray A ， Schneider J ， Sidor S ， Tobin J ， Welinder P ， Weng L and Zaremba W . 2020 . Learning dexterous in-hand manipulation . International Journal of Robotics Research ， 39 （ 1 ）： 3 – 20 ［ DOI： 10.1177/0278364919887447 http://dx.doi.org/10.1177/0278364919887447 ］

Aranjuelo N ， García S ， Loyo E ， Unzueta L and Otaegui O . 2021 . Key strategies for synthetic data generation for training intelligent systems based on people detection from omnidirectional cameras . Computers & Electrical Engineering ， 92 ： 107105 ［ DOI： 10.1016/j.compeleceng.2021.107105 http://dx.doi.org/10.1016/j.compeleceng.2021.107105 ］

Arjovsky M ， Chintala S and Bottou L . 2017 . Wasserstein GAN ［EB/OL］. ［ 2017-12-06 ］. https://arxiv.org/abs/1701.07875.pdf https://arxiv.org/abs/1701.07875.pdf

Attia M ， Attia M H ， Iskander J ， Saleh K ， Nahavandi D ， Abobakr A ， Hossny M and Nahavandi S . 2019 . Fingerprint synthesis via latent space representation // Proceedings of the 2019 IEEE International Conference on Systems， Man and Cybernetics . Bari， Italy ： IEEE： 1855 – 1861 ［ DOI： 10.1109/SMC.2019.8914499 http://dx.doi.org/10.1109/SMC.2019.8914499 ］

Azizi S ， Kornblith S ， Saharia C ， Norouzi M and Fleet DJ . 2023 . Synthetic data from diffusion models improves imagenet classification ［EB/OL］. ［ 2023-4-17 ］. https://arxiv.org/abs/2304.08466.pdf https://arxiv.org/abs/2304.08466.pdf

Badler N I ， Phillips C B and Webber B L . 1993 . Simulating Humans： Computer Graphics， Animation， and Control. New York， USA： Oxford University Press .

Bahmani K ， Plesh R ， Johnson P ， Schuckers S and Swyka T . 2021 . High fidelity fingerprint generation ： quality， uniqueness， and privacy// Proceedings of the 2021 IEEE International Conference on Image Processing . Anchorage， AK， USA ： IEEE： 3018 – 3022 ［ DOI： 10.1109/ICIP42928.2021.9506386 http://dx.doi.org/10.1109/ICIP42928.2021.9506386 ］

Bai Q Y ， Xia W H ， Yin F and Yang Y J . 2022 . Identity-guided face generation with multi-modal contour conditions // Proceedings of the 2022 IEEE International Conference on Image Processing . Bordeaux， France ： IEEE： 1881 – 1885 ［ DOI： 10.1109/ICIP46576.2022.9897459 http://dx.doi.org/10.1109/ICIP46576.2022.9897459 ］

Bąk S ， Carr P and Lalonde J F . 2018 . Domain adaptation through synthesis for unsupervised person re-identification // Proceedings of the 15th European Conference Computer Vision . Munich， Germany ： Springer International Publishing： 193 – 209 ［ DOI： 10.1007/978-3-030-01261-8_12 http://dx.doi.org/10.1007/978-3-030-01261-8_12 ］

Bao F ， Xiang C D ， Yue G ， He G D ， Zhu H Z ， Zheng K W ， Zhao M ， Liu S L ， Wang Y L and Zhu J . 2024 . Vidu： a highly consistent， dynamic and skilled text-to-video generator with diffusion models ［EB/OL］. ［ 2024-05-07 ］. https://arxiv.org/abs/2405.04233.pdf https://arxiv.org/abs/2405.04233.pdf

Bao J ， Chen D ， Wen F ， Li H and Hua G . 2018 . Towards open-set identity preserving face synthesis // Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City， UT， USA ： IEEE： 6713 – 6722 ［ DOI： 10.1109/CVPR.2018.00702 http://dx.doi.org/10.1109/CVPR.2018.00702 ］

Bao J ， Chen D ， Wen F ， Li H Q and Hua G . 2017 . CVAE-GAN： fine-grained image generation through asymmetric training // Proceedings of the 2017 IEEE International Conference on Computer Vision . Venice， Italy ： IEEE： 2764 – 2773 ［ DOI： 10.1109/ICCV.2017.299 http://dx.doi.org/10.1109/ICCV.2017.299 ］

Barbosa I B ， Cristani M ， Caputo B ， Rognhaugen A and Theoharis T . 2018 . Looking beyond appearances： synthetic training data for deep CNNs in re-identification . Computer Vision and Image Understanding 167 ： 50 – 62 ［ DOI： 10.1016/j.cviu.2017.12.002 http://dx.doi.org/10.1016/j.cviu.2017.12.002 ］

Barron J T ， Mildenhall B ， Verbin D ， Srinivasan P P and Hedman P . 2022 . Mip-NeRF 360 ： nnbounded anti-aliased neural radiance fields // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans， LA， USA ： IEEE： 5460 – 5469 ［ DOI： 10.1109/CVPR52688.2022.00539 http://dx.doi.org/10.1109/CVPR52688.2022.00539 ］

Bar-Tal O ， Chefer H ， Tov O ， Herrmann C ， Paiss R ， Zada S ， Ephrat A ， Hur J ， Liu G Hui ， Raj A ， Li Y Z ， Rubinstein M ， Michaeli T ， Wang O ， Sun D ， Dekel T and Mosseri I . 2024 . Lumiere： a space-time diffusion model for video generation // Proceedings of the SIGGRAPH Asia 2024 Conference Papers . New York， NY， USA ： Association for Computing Machinery： 1 – 11 ［ DOI： 10.1145/3680528.3687614 http://dx.doi.org/10.1145/3680528.3687614 ］

Bazavan E G ， Zanfir A ， Zanfir M ， Freeman W T ， Sukthankar R and Sminchisescu C . 2022 . HSPACE： Synthetic parametric humans animated in complex environments ［EB/OL］. ［ 2022-01-06 ］. https://arxiv.org/abs/2112.12867.pdf https://arxiv.org/abs/2112.12867.pdf

Beattie C ， Leibo J Z ， Teplyashin D ， Ward T ， Wainwright M ， Küttler H ， Lefrancq A ， Green S ， Valdés V ， Sadik A ， Schrittwieser J ， Anderson K ， York S ， Cant M ， Cain A ， Bolton A ， Gaffney S ， King H ， Hassabis D ， Legg S and Petersen S . 2016 . DeepMind lab ［EB/OL］. ［ 2016-12-12 ］. https://arxiv.org/abs/1612.03801.pdf https://arxiv.org/abs/1612.03801.pdf

Bell-Kligler S ， Shocher A and Irani M . 2019 . Blind super-resolution kernel estimation using an internal-GAN // Proceedings of the 33rd International Conference on Neural Information Processing Systems . Vancouver， Canada ： Curran Associates， Inc： 284 – 293 ［ DOI： 10.5555/3454287.3454313 http://dx.doi.org/10.5555/3454287.3454313 ］

Bergman A ， Kellnhofer P and Wang Y . 2022 . Generative neural articulated radiance fields // Proceedings of the Advances in Neural Information Processing Systems . New Orleans， Louisiana， United States ： Curran Associates， Inc： 19900 – 19916 ［ DOI： 10.48550/arXiv.2206.14314 http://dx.doi.org/10.48550/arXiv.2206.14314 ］

Bézenac E de ， Rangapuram SS ， Benidis K ， Bohlke-Schneider M ， and others . 2020 . Normalizing Kalman Filters for Multivariate Time Series Analysis // Proceedings of the Advances in Neural Information Processing Systems . Virtually ： Curran Associates， Inc： 2995 – 3007 ［ DOI： 10.5555/3495724.3495976 http://dx.doi.org/10.5555/3495724.3495976 ］

Black M J ， Patle P ， Tesch J and Yang J L . 2023 . Bedlam： a synthetic dataset of bodies exhibiting detailed lifelike animated motion // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 8726 – 8737 ［ DOI： 10.1109/CVPR52729.2023.00843 http://dx.doi.org/10.1109/CVPR52729.2023.00843 ］

Blattmann A ， Dockhorn T ， Kulal S ， Mendelevitch D ， Kilian M ， Lorenz D ， Levi Y ， English Z ， Voleti V ， Letts A ， Jampani V and Rombach R . 2023a . Stable video diffusion： scaling latent video diffusion models to large datasets ［EB/OL］. ［ 2023-11-25 ］. https://arxiv.org/abs/2311.15127.pdf https://arxiv.org/abs/2311.15127.pdf

Blattmann A ， Rombach R ， Ling H ， Dockhorn T ， Kim S W ， Fidler S and Kreis K . 2023b . Align your latents： high-resolution video synthesis with latent diffusion models // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 22563 – 22575 ［ DOI： 10.1109/CVPR52729.2023.02161 http://dx.doi.org/10.1109/CVPR52729.2023.02161 ］

Bontrager P ， Roy A ， Togelius J ， Memon N and Ross A . 2018 . DeepMasterPrints： generating MasterPrints for dictionary attacks via latent variable evolution // Proceedings of the 2018 IEEE 9th International Conference on Biometrics Theory， Applications and Systems . Redondo Beach， CA， USA ： IEEE： 1 – 9 ［ DOI： 10.1109/BTAS.2018.8698539 http://dx.doi.org/10.1109/BTAS.2018.8698539 ］

Borgia A ， Hua Y ， Kodirov E and Robertson N . 2019 . GAN-based pose-aware regulation for video-based person re-identification // Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision （WACV） . Waikoloa Village， HI， USA ： IEEE： 1175 – 1184 ［ DOI： 10.1109/WACV.2019.00130 http://dx.doi.org/10.1109/WACV.2019.00130 ］

Brock A ， Donahue J and Simonyan K . 2019 . Large scale GAN training for high fidelity natural image synthesis // Proceedings of the International Conference on Learning Representations . New Orleans， Louisiana， USA . ［ DOI： 10.48550/arXiv.1809.11096 http://dx.doi.org/10.48550/arXiv.1809.11096 ］

Brock A . 2018 . Large scale GAN training for high fidelity natural image synthesis ［EB/OL］. ［ 2018-09-28 ］. https://arxiv.org/abs/1809.11096.pdf https://arxiv.org/abs/1809.11096.pdf

Brooks T an d Efros A A . 2022 . Hallucinating pose-compatible scenes // Proceedings of the European Conference on Computer Vision . Tel Aviv， Israel ： Springer Nature Switzerland： 510 – 528 ［ DOI： 10.1007/978-3-031-19787-1_29 http://dx.doi.org/10.1007/978-3-031-19787-1_29 ］

Bytedance . 2024 . Jimeng

Cabon Y ， Murray N and Humenberger M . 2020 . Virtual kitti 2 ［EB/OL］. ［ 2020-01-29 ］. https://arxiv.org/abs/2001.10773.pdf https://arxiv.org/abs/2001.10773.pdf

Caesar H ， Bankiti V ， Lang A H ， Vora S ， Liong V E ， Xu Q ， Krishnan A ， Pan Y ， Baldan G and Beijbom O . 2020 . nuScenes： a multimodal dataset for autonomous driving // Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 11618 – 11628 ［ DOI： 10.1109/CVPR42600.2020.01164 http://dx.doi.org/10.1109/CVPR42600.2020.01164 ］

Cai J R ， Gu S H and Zhang L . 2018 . Learning a deep single image contrast enhancer from multi-exposure images . IEEE Transactions on Image Processing ， 27 （ 4 ）： 2049 – 2062 ［ DOI： 10.1109/TIP.2018.2794218 http://dx.doi.org/10.1109/TIP.2018.2794218 ］

Cai Y H ， Bian H ， Lin J ， Wang H Q ， Timofte R and Zhang Y L . 2023 . Retinexformer： One-stage Retinex based transformer for low-light image enhancement // Proceedings of the IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 12470 – 12479 ［ DOI： 10.1109/ICCV51070.2023.01149 http://dx.doi.org/10.1109/ICCV51070.2023.01149 ］

Cai Z Y ， Zhang M Y ， Ren J ， Wei C J J ， Ren D L ， Lin Z H ， Zhao H ， Yang L ， Loy C C and Liu Z W . 2021 . Playing for 3D human recovery ［EB/OL］ . ［ 2021-10-14 ］. https://arxiv.org/pdf/2110.07588.pdf https://arxiv.org/pdf/2110.07588.pdf

Cao K and Jain A K . 2018 . Fingerprint synthesis： evaluating fingerprint search at scale // Proceedings of the 2018 International Conference on Biometrics . Gold Coast， QLD， Australia ： IEEE： 31 – 38 ［ DOI： 10.1109/ICB2018.2018.00016 http://dx.doi.org/10.1109/ICB2018.2018.00016 ］

Cao Y ， Cao Y P ， Han K ， Shan Y F and Wong K Y K . 2024 . DreamAvatar： Text-and-shape guided 3D human avatar generation via diffusion models // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Seattle， Washington， USA ： IEEE： 958 – 968 ［ DOI： 10.1109/CVPR52733.2024.00097 http://dx.doi.org/10.1109/CVPR52733.2024.00097 ］

Cappelli R ， Erol A ， Maio D and Maltoni D . 2000 . Synthetic fingerprint-image generation // Proceedings of the 15th International Conference on Pattern Recognition . Barcelona， Spain ： IEEE： 471 – 474 ［ DOI： 10.1109/ICPR.2000.903586 http://dx.doi.org/10.1109/ICPR.2000.903586 ］

Cappelli R ， Maio D and Maltoni D . 2001 . Modelling plastic distortion in fingerprint images // Proceedings of ICAPR 2001 . Rio De Janeiro ： Springer： 371 – 378 ［ DOI： 10.1007/3-540-44732-6_38 http://dx.doi.org/10.1007/3-540-44732-6_38 ］

Cappelli R ， Maio D and Maltoni D . 2002 . Synthetic fingerprint-database generation // Proceedings of the International Conference on Pattern Recognition . Quebec City， QC， Canada ： IEEE： 744 – 747 ［ DOI： 10.1109/ICPR.2002.1048096 http://dx.doi.org/10.1109/ICPR.2002.1048096 ］

Cappelli R ， Maio D and Maltoni D . 2004 . An improved noise model for the generation of synthetic fingerprints // Proceedings of the ICARCV 2004 8th Control， Automation， Robotics and Vision Conference . Kunming， China ： IEEE： 1250 – 1255 ［ DOI： 10.1109/ICARCV.2004.1469025 http://dx.doi.org/10.1109/ICARCV.2004.1469025 ］

Chan C ， Ginosar S ， Zhou T and Efros AA . 2019 . Everybody dance now // Proceedings of the IEEE International Conference on Computer Vision . Seoul， Korea （South）： IEEE： 5932 – 5941 ［ DOI： 10.1109/ICCV.2019.00603 http://dx.doi.org/10.1109/ICCV.2019.00603 ］

Chan E R ， Monteiro M ， Kellnhofer P ， Wu J J and Wetzstein G . 2021a . Pi-gan： periodic implicit generative adversarial networks for 3d-aware image synthesis // Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， TN， USA ： IEEE： 5799 – 5809 ［ DOI： 10.1109/CVPR46437.2021.00574 http://dx.doi.org/10.1109/CVPR46437.2021.00574 ］

Chan K C ， Wang X T ， Xu X Y ， Gu J W and Loy C C . 2021b . GLEAN： Generative latent bank for large-factor image super-resolution // Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， TN， USA ： IEEE： 14240 – 14249 ［ DOI： 10.1109/CVPR46437.2021.01402 http://dx.doi.org/10.1109/CVPR46437.2021.01402 ］

Chang A X ， Dai A ， Funkhouser T A ， Halber M ， Nießner M ， Savva M ， Song S ， Zeng A and Zhang Y . 2017 . Matterport3 D： learning from RGB-D data in indoor environments// Proceedings of the International Conference on 3D Vision . Qingdao， China ： IEEE： 667 – 676 ［ DOI： 10.1109/3DV.2017.00081 http://dx.doi.org/10.1109/3DV.2017.00081 ］

Chao W T ， Chang L ， Wang X G ， Cheng J ， Deng X M and Duan F Q . 2019 . High-fidelity face sketch-to-photo synthesis using generative adversarial network // Proceedings of the 2019 IEEE International Conference on Image Processing . Taipei， Taiwan ： IEEE： 4699 – 4703 ［ DOI： 10.1109/ICIP.2019.8803549 http://dx.doi.org/10.1109/ICIP.2019.8803549 ］

Charatan D ， Lester Li S ， Tagliasacchi A and Sitzmann V . 2024 . Pixelsplat： 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 19457 – 19467 ［ DOI： 10.1109/CVPR52733.2024.01840 http://dx.doi.org/10.1109/CVPR52733.2024.01840 ］

Chen G L ， Zhao H Y ， Pang C K ， Li T L and Pang C Y . 2019 . Image Scaling： How Hard Can it Be？ IEEE Access ， 7 ： 129452 – 129465 ［ DOI： 10.1109/ACCESS.2019.2940353 http://dx.doi.org/10.1109/ACCESS.2019.2940353 ］

Chen H X ， Xia M H ， He Y Q ， Zhang Y ， Cun X D ， Yang S S ， Xing J B ， Liu Y F ， Chen Q F ， Wang X T ， Weng C and Shan Y . 2023a . VideoCrafter1： open diffusion models for high-quality video generation ［EB/OL］. ［ 2023-10-30 ］. https://arxiv.org/abs/2310.19512.pdf https://arxiv.org/abs/2310.19512.pdf

Chen H Z ， Pengxin X and Longsheng Z . 2021 . A deep convolutional generative adversarial network-based fake fingerprint generation method // Proceedings of the 2021 IEEE International Conference on Computer Science， Electronic Information Engineering and Intelligent Control Technology . Fuzhou， China ： IEEE： 63 – 67 ［ DOI： 10.1109/CEI52496.2021.9574508 http://dx.doi.org/10.1109/CEI52496.2021.9574508 ］

Chen H ， Gu J T ， Chen A P ， Tian W ， Tu Z W ， Liu L J and Su H . 2023b . Single-stage diffusion nerf： a unified approach to 3d generation and reconstruction // Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 2416 – 2425 ［ DOI： 10.1109/ICCV51070.2023.00229 http://dx.doi.org/10.1109/ICCV51070.2023.00229 ］

Chen J ， Ye W C ， Wang Y F ， Chen D P ， Huang D ， Ouyang W L ， Zhang G F ， Qiao Y and He T . 2024a . GigaGS： scaling up planar-based 3D gaussians for large scene surface reconstruction ［EB/OL］. ［ 2024-09-10 ］. https://arxiv.org/abs/2409.06685.pdf https://arxiv.org/abs/2409.06685.pdf

Chen K ， Chen W H ， He T ， Du R ， Wang F ， Sun X Y ， Guo Y C and Ding G G . 2022 . TAGPerson： a target-aware generation pipeline for person re-identification // Proceedings of the 30th ACM International Conference on Multimedia . Lisboa Portugal ： ACM： 560 – 571 ［ DOI： 10.1145/3503161.3548013 http://dx.doi.org/10.1145/3503161.3548013 ］

Chen K ， Xie E ， Chen Z J ， Wang Y ， Hong L W ， Li Z M and Yeung D Y . 2024b . Geodiffusion： text-prompted geometric control for object detection data generation // Proceedings of the International Conference on Learning Representations . Vienna， Austria . ［ DOI： 10.48550/arXiv.2306.04607 http://dx.doi.org/10.48550/arXiv.2306.04607 ］

Chen R ， Chen Y W ， Jiao N X and Jia K . 2023c . Fantasia3d： disentangling geometry and appearance for high-quality text-to-3d content creation // Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 22246 – 22256 ［ DOI： 10.1109/ICCV51070.2023.02033 http://dx.doi.org/10.1109/ICCV51070.2023.02033 ］

Chen W and Hays J . 2018 . SketchyGAN： towards diverse and realistic sketch to image synthesis // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Salt Lake City， UT， USA ： IEEE： 9416 – 9425 ［ DOI： 10.1109/CVPR.2018.00981 http://dx.doi.org/10.1109/CVPR.2018.00981 ］

Chen X Y ， Wang Y H ， Zhang L J ， Zhuang S B ， Ma X ， Yu J ， Wang Y L ， Lin D H ， Qiao Y and Liu Z W . 2023d . Seine： short-to-long video diffusion model for generative transition and prediction // Proceedings of the The Twelfth International Conference on Learning Representations . Vienna， Austria . ［ DOI： 10.48550/arXiv.2310.20700 http://dx.doi.org/10.48550/arXiv.2310.20700 ］

Chen X ， Duan Y ， Houthooft R ， Schulman J ， Sutskever I and Abbeel P . 2016 . InfoGAN： interpretable representation learning by information maximizing generative adversarial nets // Proceedings of the 30th International Conference on Neural Information Processing Systems . Red Hook， NY， USA ： Curran Associates Inc： 2180 – 2188 ［ DOI： 10.5555/3157096.3157340 http://dx.doi.org/10.5555/3157096.3157340 ］

Chen Y and Jain A K . 2009 . Beyond minutiae： a fingerprint individuality model with pattern， ridge and pore features // Proceedings of the 2nd International Conference on Biometrics . Alghero， Italy ： Springer： 523 – 533 ［ DOI： 10.1007/978-3-642-01793-3_54 http://dx.doi.org/10.1007/978-3-642-01793-3_54 ］

Chen Y D ， Xu H F ， Zheng C X ， Zhuang B H ， Pollefeys M ， Geiger A ， Cham T J and Cai J F . 2024c . MVSplat： efficient 3D gaussian splatting from sparse multi-view images // Proceedings of the Computer Vision – ECCV 2024： 18th European Conference . Milan， Italy ： Springer-Verlag： 370 – 386 ［ DOI： 10.1007/978-3-031-72664-4_21 http://dx.doi.org/10.1007/978-3-031-72664-4_21 ］

Chen Y T ， Mihajlovic M ， Chen X Y ， Wang Y M ， Prokudin S and Tang S Y . 2024d . SplatFormer： point transformer for robust 3D gaussian splatting ［EB/OL］. ［ 2024-11-10 ］. https://arxiv.org/abs/2411.06390.pdf https://arxiv.org/abs/2411.06390.pdf

Cheng J ， Liang X ， Shi X F ， He T ， Xiao T and Li M . 2023 . LayoutDiffuse： Adapting foundational diffusion models for layout-to-image generation . ［2023-2-16］ . https：//arxiv.org/abs/2302.08908.pdf https://arxiv.org/abs/2302.08908.pdf

Cheung E ， Wong A ， Bera A ， Wang X and Manocha D . 2019 . LCrowdV： generating labeled videos for pedestrian detectors training and crowd behavior learning . Neurocomputing ， 337 ： 1 – 14 ［ DOI： 10.1016/j.neucom.2019.01.078 http://dx.doi.org/10.1016/j.neucom.2019.01.078 ］

Choi Y ， Choi M ， Kim M ， Ha J W ， Kim S and Choo J . 2018 . StarGAN： unified generative adversarial networks for multi-domain image-to-image translation // Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City， UT， USA ： IEEE： 8789 – 8797 ［ DOI： 10.1109/CVPR.2018.00916 http://dx.doi.org/10.1109/CVPR.2018.00916 ］

Chung H ， Kim J ， McCann M T ， Klasky M L and Ye J C . 2023 . Diffusion Posterior Sampling for General Noisy Inverse Problems // Proceedings of the International Conference on Learning Representations . Réunion Island， France . ［ DOI： 10.48550/arXiv.2209.14687 http://dx.doi.org/10.48550/arXiv.2209.14687 ］

Ciampi L ， Messina N ， Falchi F ， Gennaro C and Amato G . 2020 . Virtual to real adaptation of pedestrian detectors . Sensors ， 20 （ 18 ）： 5250 ［ DOI： 10.3390/s20185250 http://dx.doi.org/10.3390/s20185250 ］

Clark A ， Donahue J and Simonyan K . 2019 . Adversarial video generation on complex datasets . ［2019-7-15］ . https：//arxiv.org/pdf/1907.06571.pdf https://arxiv.org/pdf/1907.06571.pdf

Courty N ， Allain P ， Creusot C and Corpetti T . 2014 . Using the agoraset dataset： assessing for the quality of crowd video analysis methods . Pattern Recognition Letters ， 44 ： 161 – 170 ［ DOI： 10.1016/j.patrec.2013.12.005 http://dx.doi.org/10.1016/j.patrec.2013.12.005 ］

Crisan S ， Târnovan I G and Crisan T E . 2008 . A hand vein structure simulation platform for algorithm testing and biometric identification // Proceedings of the 16th IMEKO TC4 Symposium . Florence， Italy ：：［DOI：］

Cui J ， Wang Y ， Huang J ， Tan T and Sun Z . 2004 . An iris image synthesis method based on PCA and super-resolution // Proceedings of the 17th International Conference on Pattern Recognition ， 2004. ICPR 2004. Cambridge， UK ： IEEE： 471 – 474 ［ DOI： 10.1109/ICPR.2004.1333804 http://dx.doi.org/10.1109/ICPR.2004.1333804 ］

Curtò J de ， Zarza I C ， Torre F D L ， King I and Lyu M R . 2017 . High-resolution deep convolutional generative adversarial networks ［EB/OL］. ［ 2017-11-17 ］. http://arxiv.org/abs/1711.06491.pdf http://arxiv.org/abs/1711.06491.pdf

Dai P X ， Xu J M ， Xie W X ， Liu X G ， Wang H M and Xu W W . 2024 . High-quality surface reconstruction using gaussian surfels // Proceedings of the ACM SIGGRAPH 2024 Conference Papers . New York， NY， USA ： Association for Computing Machinery： 1 – 11 ［ DOI： 10.1145/3641519.3657441 http://dx.doi.org/10.1145/3641519.3657441 ］

Dai P Y ， Ji R R ， Wang H B ， Wu Q and Huang Y Y . 2018 . Cross-modality person re-identification with generative adversarial training // Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence . Stockholm， Sweden ： International Joint Conferences on Artificial Intelligence Organization： 677 – 683 ［ DOI： 10.24963/ijcai.2018/94 http://dx.doi.org/10.24963/ijcai.2018/94 ］

De Souza C R ， Gaidon A ， Cabon Y and López Peña A M . 2017 . Procedural generation of videos to train deep action recognition networks // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Honolulu， HI， USA ： IEEE： 2594 – 2604 ［ DOI： 10.1109/CVPR.2017.278 http://dx.doi.org/10.1109/CVPR.2017.278 ］

Deitke M ， Schwenk D ， Salvador J ， Weihs L ， Michel O ， VanderBilt E ， Schmidt L ， Ehsani K ， Kembhavi A and Farhadi A . 2023 . Objaverse： a universe of annotated 3D objects // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 13142 – 13153 ［ DOI： 10.1109/CVPR52729.2023.01263 http://dx.doi.org/10.1109/CVPR52729.2023.01263 ］

Deng C Y ， Jiang C Y ， Qi C R ， Yan X C ， Zhou Y ， Guibas L and Anguelov D . 2023 . Nerdi： single-view nerf synthesis with language-guided diffusion as general image priors // Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 20637 – 20647 ［ DOI： 10.1109/CVPR52729.2023.01977 http://dx.doi.org/10.1109/CVPR52729.2023.01977 ］

Deng Y ， Yang J L ， Chen D ， Wen F and Tong X . 2020 . Disentangled and controllable face image generation via 3D imitative-contrastive learning // Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 5153 – 5162 ［ DOI： 10.1109/CVPR42600.2020.00520 http://dx.doi.org/10.1109/CVPR42600.2020.00520 ］

Deng Y ， Yang J L ， Xiang J F and Tong X . 2022 . Gram： generative radiance manifolds for 3d-aware image generation // Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans， LA， USA ： IEEE： 10673 – 10683 ［ DOI： 10.1109/CVPR52688.2022.01041 http://dx.doi.org/10.1109/CVPR52688.2022.01041 ］

Denton E ， Chintala S ， Szlam A and Fergus R . 2015 . Deep generative image models using a Laplacian pyramid of adversarial networks // Proceedings of the 29th Annual Conference on Neural Information Processing Systems . Montreal， Canada ： Curran Associates， Inc： 1486 – 1494 ［ DOI： 10.5555/2969239.2969405 http://dx.doi.org/10.5555/2969239.2969405 ］

DeVries T ， Bautista M A ， Srivastava N ， Taylor G W and Susskind J M . 2021 . Unconstrained scene generation with locally conditioned radiance fields // Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision . Montreal， QC， Canada ： IEEE： 14304 – 14313 ［ DOI： 10.1109/ICCV48922.2021.01404 http://dx.doi.org/10.1109/ICCV48922.2021.01404 ］

Dhariwal P and Nichol A . 2021 . Diffusion Models Beat GANs on Image Synthesis // Proceedings of the Advances in Neural Information Processing Systems . Virtual ： Curran Associates， Inc： 8780 – 8794 ［ DOI： 10.5555/3540261.3540933 http://dx.doi.org/10.5555/3540261.3540933 ］

Di Benedetto M ， Carrara F ， Meloni E ， Amato G ， Falchi F and Gennaro C . 2021 . Learning accurate personal protective equipment detection from virtual worlds . Multimedia Tools and Applications ， 80 （ 15 ）： 23241 – 23253 ［ DOI： 10.1007/s11042-021-10698-9 http://dx.doi.org/10.1007/s11042-021-10698-9 ］

Diefenderfer G T . 2006 . Fingerprint recognition . Monterey， California ： Naval Postgraduate School .

Dinh L ， Krueger D and Bengio Y . 2015 . NICE： Non-linear Independent Components Estimation // Proceedings of the International Conference on Learning Representations . San Diego， California， USA . ［ DOI： 10.48550/arXiv.1410.8516 http://dx.doi.org/10.48550/arXiv.1410.8516 ］

Dinh L ， Sohl-Dickstein J and Bengio S . 2017 . Density estimation using Real NVP // Proceedings of the International Conference on Learning Representations . Toulon， France . ［ DOI： 10.48550/arXiv.1605.08803 http://dx.doi.org/10.48550/arXiv.1605.08803 ］

Donahue C ， Balsubramani A ， McAuley J J and Lipton Z C . 2017 . Semantically decomposing the latent spaces of generative adversarial networks ［EB/OL］. ［ 2017-05-22 ］. http://arxiv.org/abs/1705.07904.pdf http://arxiv.org/abs/1705.07904.pdf

Dong C ， Loy C C ， He K and Tang X . 2014 . Learning a deep convolutional network for image super-resolution // Proceedings of the European Conference on Computer Vision . Zurich， Switzerland ： Springer： 184 – 199 ［ DOI： 10.1007/978-3-319-10593-2_13 http://dx.doi.org/10.1007/978-3-319-10593-2_13 ］

Dong Z ， Chen X ， Yang J ， Black M J ， Hilliges O and Geiger A . 2023 . AG3D： Learning to generate 3D avatars from 2D image collections // Proceedings of the IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 14916 – 14927 ［ DOI： 10.1109/ICCV51070.2023.01370 http://dx.doi.org/10.1109/ICCV51070.2023.01370 ］

Dosovitskiy A ， Beyer L ， Kolesnikov A ， Weissenborn D ， Zhai X H ， Unterthiner T ， Dehghani M ， Minderer M ， Heigold G ， Gelly S ， Uszkoreit J and Houlsby N . 2021 . An image is worth 16 x 16 words： transformers for image recognition at scale［EB/OL］. ［ 2021-07-03 ］. https://arxiv.org/abs/2010.11929.pdf https://arxiv.org/abs/2010.11929.pdf

Dosovitskiy A ， Ros G ， Codevilla F ， López A M and Koltun V . 2017 . CARLA： an open urban driving simulator ［EB/OL］. ［ 2017-11-10 ］. http://arxiv.org/abs/1711.03938.pdf http://arxiv.org/abs/1711.03938.pdf

Du X Z ， Zoph B ， Hung W C and Lin T Y . 2021 . Simple training strategies and model scaling for object detection ［EB/OL］. ［ 2021-07-30 ］. https://arxiv.org/pdf/2107.00057.pdf https://arxiv.org/pdf/2107.00057.pdf

Duan Y X ， Wei F Y ， Dai Q Y ， He Y H ， Chen W Z and Chen B Q . 2024 . 4D-rotor gaussian splatting： towards efficient novel view synthesis for dynamic scenes // Proceedings of the ACM SIGGRAPH 2024 Conference Papers . New York， NY， USA ： Association for Computing Machinery： 1 – 11 ［ DOI： 10.1145/3641519.3657463 http://dx.doi.org/10.1145/3641519.3657463 ］

Dudhane A ， Zamir S W ， Khan S ， Khan F S and Yang M H . 2023 . Burstormer： Burst image restoration and enhancement transformer // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 5703 – 5712 ［ DOI： 10.1109/CVPR52729.2023.00552 http://dx.doi.org/10.1109/CVPR52729.2023.00552 ］

Dunlap L ， Umino A ， Zhang H ， Yang J ， Gonzalez J E and Darrell T . 2024 . Diversify your vision datasets with automatic diffusion-based augmentation // Proceedings of the Advances in Neural Information Processing Systems . Vancouver， Canada ： Curran Associates， Inc： 79024 – 79034 ［ DOI： 10.48550/arXiv.2305.16289 http://dx.doi.org/10.48550/arXiv.2305.16289 ］

Durvasula S ， Zhao A ， Chen F ， Liang R F ， Sanjaya P K and Vijaykumar N . 2023 . DISTWAR： fast differentiable rendering on raster-based rendering pipelines ［EB/OL］. ［ 2023-12-01 ］. https://arxiv.org/abs/2401.05345.pdf https://arxiv.org/abs/2401.05345.pdf

Dvornik N ， Mairal J and Schmid C . 2018 . Modeling visual context is key to augmenting object detection datasets // Proceedings of the European Conference on Computer Vision . Munich， Germany ： Springer： 364 – 380 ［ DOI： 10.1007/978-3-030-01258-8_23 http://dx.doi.org/10.1007/978-3-030-01258-8_23 ］

Dwibedi D ， Misra I and Hebert M . 2017 . Cut， Paste and Learn： Surprisingly Easy Synthesis for Instance Detection // Proceedings of the 2017 IEEE International Conference on Computer Vision . Venice， Italy ： IEEE： 1310 - 1319 ［ DOI： 10.1109/ICCV.2017.146 http://dx.doi.org/10.1109/ICCV.2017.146 ］

Ekbatani H K ， Pujol O and Seguí S . 2017 . Synthetic data generation for deep learning in counting pedestrians // Proceedings of the 6th International Conference on Pattern Recognition Applications and Methods . Lisbon， Portugal ： ScitePress： 318 – 323 ［ DOI： 10.5220/0006119203180323 http://dx.doi.org/10.5220/0006119203180323 ］

Engelsma J J ， Cao K and Jain A K . 2019 . Learning a fixed length fingerprint representation . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 43 （ 6 ）： 1981 – 1997 ［ DOI： 10.1109/TPAMI.2019.2961349 http://dx.doi.org/10.1109/TPAMI.2019.2961349 ］

Engelsma J J ， Grosz S A and Jain A K . 2022 . PrintsGAN： synthetic fingerprint generator . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 45 （ 5 ）： 6111 – 6124 ［ DOI： 10.1109/TPAMI.2022.3204591 http://dx.doi.org/10.1109/TPAMI.2022.3204591 ］

Eom C and Ham B . 2019 . Learning disentangled representation for robust person re-identification // Proceedings of the 33rd International Conference on Neural Information Processing Systems . California， USA ： Curran Associates， Inc： 5297 – 5308 ［ DOI： 10.5555/3454287.3454763 http://dx.doi.org/10.5555/3454287.3454763 ］

Esser P ， Rombach R and Ommer B . 2021 . Taming Transformers for High-Resolution Image Synthesis // Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， TN， USA ： IEEE： 12868 – 12878 ［ DOI： 10.1109/cvpr46437.2021.01268 http://dx.doi.org/10.1109/cvpr46437.2021.01268 ］

Esser P ， Kulal S ， Blattmann A ， Entezari R ， Müller J ， Saini H ， Levi Y ， Lorenz D ， Sauer A ， Boesel F ， Podell D ， Dockhorn T ， English Z ， Lacey K ， Goodwin A ， Marek Y and Rombach R . 2024 . Scaling rectified flow transformers for high-resolution image synthesis ［EB/OL］. ［ 2024-05-05 ］. https://arxiv.org/abs/2403.03206 https://arxiv.org/abs/2403.03206

Fabbri M ， Brasó G ， Maugeri G ， Cetintas O ， Gasparini R ， Osep A ， Calderara S ， Leal-Taixé L and Cucchiara R . 2021 . MOTSynth： how can synthetic data help pedestrian detection and tracking // Proceedings of the IEEE/CVF International Conference on Computer Vision . Montreal， QC， Canada ： IEEE： 10829 – 10839 ［ DOI： 10.1109/ICCV48922.2021.01067 http://dx.doi.org/10.1109/ICCV48922.2021.01067 ］

Fahim M and Jung H Y . 2020 . A lightweight GAN network for large scale fingerprint generation . IEEE Access ， 8 ： 92918 – 92928 ［ DOI： 10.1109/ACCESS.2020.2994371 http://dx.doi.org/10.1109/ACCESS.2020.2994371 ］

Fan Z W ， Wang K ， Wen K R ， Zhu Z H ， Xu D J and Wang Z Y . 2023 . Lightgaussian： unbounded 3D gaussian compression with 15x reduction and 200+ FPS ［EB/OL］. ［ 2023-11-28 ］. https://arxiv.org/abs/2311.17245.pdf https://arxiv.org/abs/2311.17245.pdf

Fan L J ， Li T H ， Qin S Y ， Li Y Z ， Sun C ， Rubinstein M ， Sun D Q ， He K M and Tian Y L . 2024 . Fluid： scaling autoregressive text-to-image generative models with continuous tokens ［EB/OL］. ［ 2024-10-17 ］. https://arxiv.org/abs/2410.13863.pdf https://arxiv.org/abs/2410.13863.pdf

Fang H Q ， Han B ， Zhang S ， Zhou S ， Hu C and Ye W M . 2024 . Data augmentation for object detection via controllable diffusion models // Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision . Waikoloa， HI， USA ： IEEE： 1257 – 1266 ［ DOI： 10.1109/WACV57701.2024.00129 http://dx.doi.org/10.1109/WACV57701.2024.00129 ］

Feng C ， Zhong Y ， Jie Z H ， Xie W J and Ma L . 2024 . Instagen： enhancing object detection by training on synthetic dataset // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 14121 – 14130 ［ DOI： 10.1109/CVPR52733.2024.01339 http://dx.doi.org/10.1109/CVPR52733.2024.01339 ］

Feng J and Jain A K . 2011 . Fingerprint reconstruction： from minutiae to phase . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 33 （ 2 ）： 209 – 223 ［ DOI： 10.1109/TPAMI.2010.77 http://dx.doi.org/10.1109/TPAMI.2010.77 ］

Feng L ， Li Q Y ， Peng Z H ， Tan S H and Zhou B L . 2023 . TrafficGen： learning to generate diverse and realistic traffic scenarios // Proceedings of the 2023 IEEE International Conference on Robotics and Automation . London， United Kingdom ： IEEE： 3567 – 3575 ［ DOI： 10.1109/ICRA48891.2023.10160296 http://dx.doi.org/10.1109/ICRA48891.2023.10160296 ］

Fogel I S and Sagi D . 2004 . Gabor filters as texture discriminator . Biological Cybernetics ， 61 ： 103 – 113 ［ DOI： 10.1007/BF00204594 http://dx.doi.org/10.1007/BF00204594 ］

Fridovich-Keil S ， Yu A ， Tancik M ， Chen Q ， Recht B and Kanazawa A . 2022 . Plenoxels： Radiance fields without neural networks // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans， LA， USA ： IEEE： 5501 - 5510 ［ DOI： 10.1109/CVPR52688.2022.00542 http://dx.doi.org/10.1109/CVPR52688.2022.00542 ］

Gaidon A ， Wang Q ， Cabon Y and Vig E . 2016 . Virtual worlds as proxy for multi-object tracking analysis // Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas， NV， USA ： IEEE： 4340 – 4349 ［ DOI： 10.1109/CVPR.2016.470 http://dx.doi.org/10.1109/CVPR.2016.470 ］

Gan C ， Schwartz J ， Alter S ， Mrowca D ， Schrimpf M ， Traer J ， De Freitas J ， Kubilius J ， Bhandwaldar A ， Haber N ， Sano M ， Kim K ， Wang E ， Lingelbach M ， Curtis A ， Feigelis K T ， Bear D ， Gutfreund D D ， Cox C ， Torralba A ， DiCarlo J J ， Tenenbaum J B ， McDermott J H and Yamins D L . 2021 . ThreeDWorld： a platform for interactive multi-modal physical simulation // Proceedings of the NeurIPS Datasets and Benchmarks Track . Virtual ： Curran Associates， Inc：［ DOI： 10.48550/arXiv.2007.04954 http://dx.doi.org/10.48550/arXiv.2007.04954 ］

Gao J ， Shen T C ， Wang Z ， Chen W Z ， Yin K X ， Li D Q ， Litany O ， Gojcic Z and Fidler S . 2022 . GET3D： a generative model of high quality 3D textured shapes learned from images // Proceedings of the 36th International Conference on Neural Information Processing Systems . New Orleans， LA， USA ： NIPS： 31841 – 31854 ［ DOI： 10.48550/arXiv.2209.11163 http://dx.doi.org/10.48550/arXiv.2209.11163 ］

Gao S ， Liu X ， Zeng B ， Xu S ， Li Y ， Luo X ， Liu J ， Zhen X and Zhang B . 2023 . Implicit diffusion models for continuous super-resolution // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 10021 – 10030 ［ DOI： 10.1109/CVPR52729.2023.00966 http://dx.doi.org/10.1109/CVPR52729.2023.00966 ］

Gao X ， Gong R ， Shu T ， Xie X ， Wang S and Zhu S C . 2019 . VRKitchen： an interactive 3D virtual environment for task-oriented learning ［EB/OL］. ［ 2019-05-13 ］. https://arxiv.org/abs/1903.05757 https://arxiv.org/abs/1903.05757

Ge Y X ， Li Z W ， Zhao H Y ， Yin G J ， Yi S and Wang X G . 2018 . FD-GAN： pose-guided feature distilling GAN for robust person re-identification // Proceedings of the 32nd International Conference on Neural Information Processing Systems . California， USA ： Curran Associates， Inc： 1230 – 1241 ［ DOI： 10.5555/3326943.3327056 http://dx.doi.org/10.5555/3326943.3327056 ］

Ge Y ， Xu J ， Zhao B N ， Joshi N ， Itti L and Vineet V . 2022 . Dall-e for detection： Language-driven compositional image synthesis for object detection ［EB/OL］. ［ 2022-06-20 ］. https://arxiv.org/pdf/2206.09592.pdf https://arxiv.org/pdf/2206.09592.pdf

Gecer B ， Bhattarai B ， Kittler J and Kim T K . 2018 . Semi-supervised adversarial learning to generate photorealistic face images of new identities from 3D morphable model // Proceedings of the European Conference on Computer Vision . Munich， Germany ： Springer： 230 – 248 ［ DOI： 10.1007/978-3-030-01252-6_14 http://dx.doi.org/10.1007/978-3-030-01252-6_14 ］

Geleta A ， Bras R L and Choi Y . 2020 . GPT-VAE： Transformer-Based Generative Pretrained VAE // Proceedings of the International Conference on Neural Information Processing Systems . Virtual .

Geng H R ， Xu H L ， Zhao C Y ， Xu C ， Yi L ， Huang S Y and Wang H . 2023 . GAPartNet： cross-category domain-generalizable object perception and manipulation via generalizable and actionable parts // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 7081 – 7091 ［ DOI： 10.1109/CVPRW53098.2023.00101 http://dx.doi.org/10.1109/CVPRW53098.2023.00101 ］

Ghiasi G ， Cui Y ， Srinivas A ， Qian R ， Lin T Y ， Cubuk E D ， Le Q V and Zoph B . 2021 . Simple copy-paste is a strong data augmentation method for instance segmentation // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， TN， USA ： IEEE： 2917 – 2927 ［ DOI： 10.1109/CVPR46437.2021.00294 http://dx.doi.org/10.1109/CVPR46437.2021.00294 ］

Girdhar R ， Singh M ， Brown A ， Duval Q ， Azadi S ， Rambhatla S S ， Shah A ， Yin X ， Parikh D and Misra I . 2024 . Emu video： factorizing text-to-video generation by explicit image conditioning ［EB/OL］. ［ 2024-08-02 ］. https://arxiv.org/abs/2311.10709.pdf https://arxiv.org/abs/2311.10709.pdf

Girish S ， Gupta K and Shrivastava A . 2024 . EAGLES： efficient accelerated 3D gaussians with lightweight EncodingS ［EB/OL］. ［ 2024-09-26 ］. https://arxiv.org/abs/2312.04564 https://arxiv.org/abs/2312.04564

Gomez N ， Ren M ， Urtasun R and Grosse R B . 2017 . The Reversible Residual Network： Backpropagation Without Storing Activations // Proceedings of the Advances in Neural Information Processing Systems . Long Beach， California， USA ： Curran Associates， Inc： 2214 – 2224 ［ DOI： 10.48550/arXiv.1707.04585 http://dx.doi.org/10.48550/arXiv.1707.04585 ］

Gong Y P ， Zeng Z Y ， Chen L W ， Luo Y F ， Weng B and Ye F . 2021 . A person re-identification data augmentation method with adversarial defense effect ［EB/OL］. ［ 2021-07-04 ］. https://arxiv.org/abs/2101.08783.pdf https://arxiv.org/abs/2101.08783.pdf

Gong Z ， Danelljan M ， Sun H ， Mangas J D and Van Gool L . 2023 . Prompting diffusion representations for cross-domain semantic segmentation ［EB/OL］. ［ 2023-07-05 ］. https://arxiv.org/pdf/2307.02138.pdf https://arxiv.org/pdf/2307.02138.pdf

Gonzales R C and Wintz P . 1987 . Digital image processing. USA： Addison-Wesley Longman Publishing Co.， Inc .

Goodfellow I ， Pouget-Abadie J ， Mirza M ， Xu B ， Warde-Farley D ， Ozair S ， Courville A and Bengio Y . 2014a . Generative adversarial networks // Proceedings of the Advances in Neural Information Processing Systems . Montreal， Canada ： Curran Associates， Inc： 2672 – 2680 ［ DOI： 10.1145/3422622 http://dx.doi.org/10.1145/3422622 ］

Goodfellow I ， Pouget-Abadie J ， Mirza M ， Xu B ， Warde-Farley D ， Ozair S ， Courville A and Bengio Y . 2014b . Conditional generative adversarial nets ［EB/OL］. ［ 2014-11-06 ］. https://arxiv.org/abs/1411.1784.pdf https://arxiv.org/abs/1411.1784.pdf

Goodfellow I ， Pouget-Abadie J ， Mirza M ， Xu B ， Warde-Farley D ， Ozair S ， Courville A and Bengio Y . 2020 . Generative adversarial networks . Communications of the ACM ， 63 （ 11 ）： 139 – 144 ［ DOI： 10.1145/3422622 http://dx.doi.org/10.1145/3422622 ］

Gowda S N ， Rohrbach M and Keller F . 2022 . Learn2Augment： Learning to composite videos for data augmentation in action recognition // Proceedings of the European Conference on Computer Vision . Tel Aviv， Israel ： Springer Nature Switzerland： 242 – 259 ［ DOI： 10.1007/978-3-031-19821-2_14 http://dx.doi.org/10.1007/978-3-031-19821-2_14 ］

Grathwohl W ， Chen R T Q ， Bettencourt J ， Sutskever I and Duvenaud D . 2019 . FFJORD： Free-form Continuous Dynamics for Scalable Reversible Generative Models // Proceedings of the International Conference on Learning Representations . New Orleans， Louisiana， USA . ［ DOI： 10.48550/arXiv.1810.01367 http://dx.doi.org/10.48550/arXiv.1810.01367 ］

Grigorev A ， Iskakov K ， Ianina A ， Bashirov R ， Zakharkin I ， Vakhitov A and Lempitsky V . 2021 . StylePeople： A generative model of full-body human avatars // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Nashville， TN， USA ： IEEE： 5147 – 5156 ［ DOI： 10.1109/CVPR46437.2021.00511 http://dx.doi.org/10.1109/CVPR46437.2021.00511 ］

Gu J T ， Trevithick A ， Lin K E ， Susskind J M ， Theobalt C ， Liu L J and Ramamoorthi R . 2023 . Nerfdiff： single-image view synthesis with nerf-guided distillation from 3d-aware diffusion // Proceedings of the 40th International Conference on Machine Learning . Honolulu， Hawaii， USA ： ACM： 11808 – 11826 ［ DOI： 10.48550/arXiv.2302.10109 http://dx.doi.org/10.48550/arXiv.2302.10109 ］

Gulrajani I ， Ahmed F ， Arjovsky M ， Dumoulin V and Courville A C . 2017 . Improved training of Wasserstein GANs // Proceedings of the Annual Conference on Neural Information Processing Systems . Long Beach， California， USA ： Curran Associates， Inc： 5769 – 5779 ［ DOI： 10.5555/3295222.3295327 http://dx.doi.org/10.5555/3295222.3295327 ］

Guo C L ， Li C Y ， Guo J C ， Loy C C ， Hou J H ， Kwong S and Cong R M . 2020 . Zero-reference deep curve estimation for low-light image enhancement // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 1777 – 1786 ［ DOI： 10.1109/CVPR42600.2020.00185 http://dx.doi.org/10.1109/CVPR42600.2020.00185 ］

Guo Y ， Yang C Y ， Rao A ， Liang Z Y ， Wang Y H ， Qiao Y ， Agrawala M ， Lin D H and Dai B . 2024 . Animatediff： animate your personalized text-to-image diffusion models without specific tuning ［EB/OL］. ［ 2024-02-08 ］. https://arxiv.org/abs/2307.04725.pdf https://arxiv.org/abs/2307.04725.pdf

Gupta A ， Yu L J ， Sohn K ， Gu X Y ， Hahn M ， Li F F ， Essa I ， Jiang L and Lezama J . 2024 . Photorealistic video generation with diffusion models // Proceedings of the European Conference on Computer Vision . Milano， Italy ： Springer： 393 – 411 ［ DOI： 10.1007/978-3-031-72986-7_23 http://dx.doi.org/10.1007/978-3-031-72986-7_23 ］

Haarnoja T ， Hartikainen K ， Abbeel P and Levine S . 2018 . Latent Space Policies for Hierarchical Reinforcement Learning // Proceedings of the 35th International Conference on Machine Learning . Stockholm， Sweden ： PMLR： 1846 – 1855 ［ DOI： 10.48550/arXiv.1804.02808 http://dx.doi.org/10.48550/arXiv.1804.02808 ］

He K ， Chen X ， Xie S ， Li Y ， Dollár P and Girshick R . 2022a . Masked autoencoders are scalable vision learners // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans， LA， USA ： IEEE： 16000 – 16009 ［ DOI： 10.1109/cvpr52688.2022.01553 http://dx.doi.org/10.1109/cvpr52688.2022.01553 ］

He R C ， Sun S S ， Yu X K ， Xue C W ， Zhang W J ， Torr P ， Bai S and Qi X J . 2023a . Is synthetic data from generative models ready for image recognition？ // Proceedings of the Eleventh International Conference on Learning Representations . Kigali， Rwanda . ［ DOI： 10.48550/arXiv.2210.07574 http://dx.doi.org/10.48550/arXiv.2210.07574 ］

He Y Q ， Yang T Y ， Zhang Y ， Shan Y and Chen Q F . 2023b . Latent video diffusion models for high-fidelity long video generation ［EB/OL］. ［ 2023-03-20 ］. https://arxiv.org/abs/2211.13221.pdf https://arxiv.org/abs/2211.13221.pdf

He Z ， Lin M ， Xu Z H ， Yao Z Q ， Chen H ， Alhudhaif A and Alenezi F . 2022b . Deconv-transformer （DecT）： A histopathological image classification model for breast cancer based on color deconvolution and transformer architecture . Information Sciences ， 608 ： 1093 – 1112 ［ DOI： 10.1016/j.ins.2022.06.091 http://dx.doi.org/10.1016/j.ins.2022.06.091 ］

Helminger L ， Bernasconi M ， Djelouah A ， Gross M and Schroers C . 2021 . Generic image restoration with flow based priors // Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops . Nashville， TN， USA ： IEEE： 334 – 343 ［ DOI： 10.1109/CVPRW53098.2021.00043 http://dx.doi.org/10.1109/CVPRW53098.2021.00043 ］

Hewitt C ， Baltrušaitis T ， Wood E ， Petikam L ， Florentin L and Velasquez H C . 2023 . Procedural humans for computer vision ［EB/OL］. ［ 2023-01-03 ］. https://arxiv.org/abs/2301.01161 https://arxiv.org/abs/2301.01161

Hewitt C ， Saleh F ， Aliakbarian S ， Petikam L ， Rezaeifar S ， Florentin L ， Hosenie Z ， Cashman T J ， Valentin J ， Cosker D and Baltrusaitis T . 2024 . Look ma， no markers： holistic performance capture without the hassle . ACM Transactions on Graphics ， 43 （ 6 ）： 1 – 12 ［ DOI： 10.1145/3687772 http://dx.doi.org/10.1145/3687772 ］

Higgins I ， Pal A ， Ramesh A ， et al . 2017 . β-VAE： Learning basic visual concepts with a constrained variational framework // International Conference on Learning Representations . Toulon， France ： ICLR ［ DOI： 10.1109/ICLR.2017.7016273 http://dx.doi.org/10.1109/ICLR.2017.7016273 ］

Hillerström F ， Kumar A and Veldhuis R . 2014 . Generating and analyzing synthetic finger vein images // Proceedings of the 2014 International Conference of the Biometrics Special Interest Group . Darmstadt， Germany ： IEEE： 1 – 9

Ho J ， Chan W ， Saharia C ， Whang J ， Gao R ， Gritsenko A ， Kingma D P ， Poole B ， Norouzi M ， Fleet D J and Salimans T . 2022a . Imagen video： high definition video generation with diffusion models ［EB/OL］. ［ 2022-10-05 ］. https://arxiv.org/abs/2210.02303.pdf https://arxiv.org/abs/2210.02303.pdf

Ho J ， Chen X ， Srinivas A ， Duan Y and Abbeel P . 2019 . Flow++： Improving flow-based generative models with variational dequantization and architecture design // Proceedings of the 36th International Conference on Machine Learning . Long Beach， California， USA ： PMLR： 2722 – 2730 ［ DOI： 10.48550/arXiv.1902.00275 http://dx.doi.org/10.48550/arXiv.1902.00275 ］

Ho J ， Jain A and Abbeel P . 2020 . Denoising diffusion probabilistic models // Proceedings of the Advances in Neural Information Processing Systems . Virtual ： Curran Associates， Inc： 6840 – 6851 ［ DOI： 10.5555/3495724.3496298 http://dx.doi.org/10.5555/3495724.3496298 ］

Ho J ， Salimans T ， Gritsenko A ， Chan W ， Norouzi M and Fleet D J . 2022b . Video diffusion models // Proceedings of the Advances in Neural Information Processing Systems . Virtual ： Curran Associates Inc： 8633 – 8646 ［ DOI： 10.48550/arXiv.2204.03458 http://dx.doi.org/10.48550/arXiv.2204.03458 ］

Hoffman M ， Sountsov P ， Dillon J V ， Langmore I ， Tran D and Vasudevan S . 2018 . NeuTra-lizing Bad Geometry in Hamiltonian Monte Carlo Using Neural Transport // Proceedings of the Symposium on Advances in Approximate Bayesian Inference . California， USA . ［ DOI： 10.48550/arXiv.1903.03704 http://dx.doi.org/10.48550/arXiv.1903.03704 ］

Hong F ， Chen Z ， Lan Y X ， Pan L and Liu Z . 2023 . EVA3D： Compositional 3D human generation from 2D image collections // Proceedings of the International Conference on Learning Representations . Kigali， Rwanda ： Curran Associates， Inc：［ DOI： 10.48550/arXiv.2210.04888 http://dx.doi.org/10.48550/arXiv.2210.04888 ］

Hong F ， Zhang M Y ， Pan L ， Cai Z ， Yang L and Liu Z . 2022a . AvatarCLIP： Zero-shot text-driven generation and animation of 3D avatars . ACM Transactions on Graphics ， 41 （ 4 ）： 1 – 19 ［ DOI： 10.1145/3528223.3530094 http://dx.doi.org/10.1145/3528223.3530094 ］

Hong S ， Seo J ， Shin H ， Hong S and Kim S . 2024a . DirecT2V： large language models are frame-level directors for zero-shot text-to-video generation ［EB/OL］. ［ 2024-02-06 ］. https://arxiv.org/abs/2305.14330.pdf https://arxiv.org/abs/2305.14330.pdf

Hong W ， Ding M ， Zheng W ， Liu X and Tang J . 2022b . CogVideo： Large-scale pretraining for text-to-video generation via transformers ［EB/OL］. ［ 2022-05-29 ］. https://arxiv.org/pdf/2205.15868.pdf https://arxiv.org/pdf/2205.15868.pdf

Hou Y ， Li C Y ， Lu Y H ， Zhu L P ， Li Y ， Jia H Z and Xie X D . 2022 . Enhancing and dissecting crowd counting by synthetic data // Proceedings of the IEEE International Conference on Acoustics， Speech， and Signal Processing . Singapore ： IEEE： 2539 – 2543 ［ DOI： 10.1109/ICASSP43922.2022.9747070 http://dx.doi.org/10.1109/ICASSP43922.2022.9747070 ］

Hou Y ， Zhang S H ， Ma R ， Jia H Z and Xie X D . 2023 . Frame-recurrent video crowd counting . IEEE Transactions on Circuits and Systems for Video Technology ， 33 （ 9 ）： 5186 – 5199 ［ DOI： 10.1109/TCSVT.2023.3245678 http://dx.doi.org/10.1109/TCSVT.2023.3245678 ］

Hu M ， Zhao P ， Xu C ， Sun Q ， Lou J ， Lin Q ， Luo P ， Rajmohan S and Zhang D . 2024 . AgentGen： enhancing planning abilities for large language model-based agents via environment and task generation ［EB/OL］. ［ 2024-11-28 ］. https://arxiv.org/abs/2408.00764.pdf https://arxiv.org/abs/2408.00764.pdf

Hu Y T ， Chen H S ， Hui K ， Huang J B and Schwing A G . 2019 . SAIL-VOS： semantic amodal instance level video object segmentation – a synthetic dataset and baselines // Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Long Beach， CA， USA ： IEEE： 3100 – 3110 ［ DOI： 10.1109/CVPR.2019.00322 http://dx.doi.org/10.1109/CVPR.2019.00322 ］

Huang B ， Yu Z ， Chen A ， Geiger A and Gao S H . 2024a . 2D gaussian splatting for geometrically accurate radiance fields // Proceedings of the ACM SIGGRAPH 2024 Conference Papers . New York， NY， USA ： Association for Computing Machinery： 1 – 11 ［ DOI： 10.1145/3641519.3657428 http://dx.doi.org/10.1145/3641519.3657428 ］

Huang C W ， Krueger D ， Lacoste A and Courville A . 2018a . Neural autoregressive flows // Proceedings of the 35th International Conference on Machine Learning . Stockholm， Sweden ： PMLR： 2078 – 2087 ［ http://proceedings.mlr.press/v80/huang18d/huang18d.pdf ］

Huang H J ， Li D W ， Zhang Z ， Chen X T and Huang K Q . 2018b . Adversarially occluded samples for person re-identification // Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City， UT， USA ： IEEE： 5098 – 5107 ［ DOI： 10.1109/CVPR.2018.00535 http://dx.doi.org/10.1109/CVPR.2018.00535 ］

Huang J ， Ma L ， Tan T and Wang Y H . 2003 . Learning based resolution enhancement of iris images // Proceedings of British Machine Vision Conference . ［ DOI： 10.5244/C.17.16 http://dx.doi.org/10.5244/C.17.16 ］

Huang Y ， Wu Q ， Xu J S and Zhong Y . 2019 . SBSGAN： suppression of inter-domain background shift for person re-identification // Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision . Seoul， Korea （South）： IEEE： 9526 – 9535 ［ DOI： 10.1109/ICCV.2019.00962 http://dx.doi.org/10.1109/ICCV.2019.00962 ］

Huang Z Q ， Chan K C K ， Jiang Y M and Liu Z W . 2023 . Collaborative diffusion for multi-modal face generation and editing // Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 6080 – 6090 ［ DOI： 10.1109/CVPR52729.2023.00589 http://dx.doi.org/10.1109/CVPR52729.2023.00589 ］

Huang Z Q ， He Y N ， Yu J S ， Zhang F ， Si C Y ， Jiang Y M ， Zhang Y H ， Wu T X ， Jin Q Y ， Chanpaisit N ， Wang Y H ， Chen X Y ， Wang L M ， Lin D H ， Qiao Y and Liu Z W . 2024b . VBench： comprehensive benchmark suite for video generative models // Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 21807 – 21818 ［ DOI： 10.1109/CVPR52733.2024.02060 http://dx.doi.org/10.1109/CVPR52733.2024.02060 ］

Huang Z ， Chen Q ， Sun L ， Yang Y ， Wang N ， Wu Q and Tan M . 2024c . G-NeRF： Geometry-enhanced novel view synthesis from single-view images // Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 10117 - 10126 ［ DOI： 10.1109/CVPR52733.2024.00964 http://dx.doi.org/10.1109/CVPR52733.2024.00964 ］

Ionescu C ， Papava D ， Olaru V and Sminchisescu C . 2014 . Human3.6M： Large scale datasets and predictive methods for 3D human sensing in natural environments . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 36 （ 7 ）： 1325 – 1339 ［ DOI： 10.1109/TPAMI.2013.212 http://dx.doi.org/10.1109/TPAMI.2013.212 ］

Isola P ， Zhu J Y ， Zhou T H and Efros A A . 2017 . Image-to-image translation with conditional adversarial networks // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Honolulu， HI， USA ： IEEE： 1125 – 1134 ［ DOI： 10.1109/CVPR.2017.124 http://dx.doi.org/10.1109/CVPR.2017.124 ］

Jahn M ， Rombach R and Ommer B . 2021 . High-resolution complex scene synthesis with transformers ［EB/OL］. ［ 2021-05-13 ］. https://arxiv.org/abs/2105.06458.pdf https://arxiv.org/abs/2105.06458.pdf

Jain A ， Mildenhall B ， Barron J T ， Abbeel P and Poole B . 2022 . Zero-shot text-guided object generation with dream fields // Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans， LA， USA ： IEEE： 867 – 876 ［ DOI： 10.1109/CVPR52688.2022.00094 http://dx.doi.org/10.1109/CVPR52688.2022.00094 ］

Jiang B ， Chen X ， Liu W ， Yu J Y ， Yu G and Chen T . 2023a . MotionGPT： Human Motion as a Foreign Language // Proceedings of the Advances in Neural Information Processing Systems . New Orleans， Louisiana， USA ： Curran Associates， Inc： 20067 – 20079 ［ DOI： 10.48550/arXiv.2306.14795 http://dx.doi.org/10.48550/arXiv.2306.14795 ］

Jiang Y F ， Gong X Y ， Liu D ， Cheng Y ， Fang C ， Shen X H ， Yang J C ， Zhou P and Wang Z Y . 2021 . EnlightenGAN： Deep light enhancement without paired supervision . IEEE Transactions on Image Processing ， 30 ： 2340 – 2349 ［ DOI： 10.1109/TIP.2021.3051462 http://dx.doi.org/10.1109/TIP.2021.3051462 ］

Jiang Y F ， Wang C ， Zhang R H ， Wu J J and Li F F . 2024 . TRANSIC： sim-to-real policy transfer by learning from online correction ［EB/OL］. ［ 2024-10-14 ］. https://arxiv.org/abs/2405.10315.pdf https://arxiv.org/abs/2405.10315.pdf

Jiang Y W Q ， Tu J D ， Liu Y ， Gao X F ， Long X X ， Wang W P and Ma Y X . 2023b . GaussianShader： 3D gaussian splatting with shading functions for reflective surfaces ［EB/OL］. ［ 2023-11-29 ］. https://arxiv.org/abs/2311.17977.pdf https://arxiv.org/abs/2311.17977.pdf

Jiawei X ， Zexin F ， Jian Y and Jin X . 2024 . Grid4D： 4D decomposed hash encoding for high-fidelity dynamic scene rendering // Proceedings of the Thirty-Eighth Annual Conference on Neural Information Processing Systems . Vancouver， Canada ： Curran Associates， Inc.：［ DOI： 10.48550/arXiv.2410.20815 http://dx.doi.org/10.48550/arXiv.2410.20815 ］

Jin J ， Shen L ， Zhang R ， Zhao C ， Jin G ， Zhang J ， Ding S ， Zhao Y and Jia W . 2024a . PCE-palm： palm crease energy based two-stage realistic pseudo-palmprint generation // Proceedings of the AAAI Conference on Artificial Intelligence . Vancouver， British Columbia， Canada ： AAAI： 2616 – 2624 ［ DOI： 10.1609/aaai.v38i3.28039 http://dx.doi.org/10.1609/aaai.v38i3.28039 ］

Jin J ， Zhao C L ， Zhang R X and Jia W . 2025 . Diff-Palm： Realistic Palmprint Generation with Polynomial Creases and Intra-Class Variation Controllable Diffusion Models // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR 2025）.（Accepted）

Jin X ， Chen Z ， Lin J ， Chen Z and Zhou W . 2019 . Unsupervised single image deraining with self-supervised constraints // Proceedings of the 2019 IEEE International Conference on Image Processing . Taipei， Taiwan ： IEEE： 2761 – 2765 ［ DOI： 10.1109/ICIP.2019.8803238 http://dx.doi.org/10.1109/ICIP.2019.8803238 ］

Jin Y ， Sun Z C ， Li N Y ， Xu K ， Jiang H ， Zhuang N ， Huang Q Z ， Song Y ， Mu Y D and Lin Z C . 2024b . Pyramidal flow matching for efficient video generative modeling ［EB/OL］. ［ 2024-10-08 ］. https://arxiv.org/abs/2410.05954.pdf https://arxiv.org/abs/2410.05954.pdf

Jingtian Z ， Shum H ， Han J and Shao L . 2018 . Action recognition from arbitrary views using transferable dictionary learning . IEEE Transactions on Image Processing ， 27 （ 10 ）： 4709 – 4723 ［ DOI： 10.1109/TIP.2018.2836323 http://dx.doi.org/10.1109/TIP.2018.2836323 ］

Johnson J ， Alahi A and Li F F . 2016 . Perceptual losses for real-time style transfer and super-resolution // Proceedings of the European Conference on Computer Vision . Amsterdam， Netherlands ： Springer： 694 – 711 ［ DOI： 10.1007/978-3-319-46475-6_43 http://dx.doi.org/10.1007/978-3-319-46475-6_43 ］

Johnson P ， Hua F and Schuckers S . 2013 . Texture modeling for synthetic fingerprint generation // Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops . Portland， OR， USA ： IEEE： 154 – 159 ［ DOI： 10.1109/CVPRW.2013.30 http://dx.doi.org/10.1109/CVPRW.2013.30 ］

Ju X Z ， Zeng A ， Zhao C ， Wang J ， Zhang L and Xu Q . 2023 . HumanSD： A native skeleton-guided diffusion model for human image generation // Proceedings of the IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 15942 – 15952 ［ DOI： 10.1109/ICCV51070.2023.01465 http://dx.doi.org/10.1109/ICCV51070.2023.01465 ］

Kang L W ， Lin C W and Fu Y H . 2011 . Automatic single-image-based rain streaks removal via image decomposition . IEEE Transactions on Image Processing ， 21 （ 4 ）： 1742 – 1755 ［ DOI： 10.1109/TIP.2011.2179057 http://dx.doi.org/10.1109/TIP.2011.2179057 ］

Kang M ， Zhu J Y ， Zhang R ， Park J ， Shechtman E ， Paris S and Park T . 2023 . Scaling up GANs for Text-to-Image Synthesis // Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， Canada ： IEEE： 10124 - 10134 ［ DOI： 10.1109/cvpr52729.2023.00976 http://dx.doi.org/10.1109/cvpr52729.2023.00976 ］

Karnewar A and Wang O . 2020 . MSG-GAN： multi-scale gradients for generative adversarial networks // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 7799 – 7808 ［ DOI： 10.1109/CVPR42600.2020.00782 http://dx.doi.org/10.1109/CVPR42600.2020.00782 ］

Karnewar A ， Mitra N J ， Vedaldi A and Novotny D . 2023a . Holo-fusion： towards photo-realistic 3d generative modeling // Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 22976 – 22985 ［ DOI： 10.1109/ICCV51070.2023.02100 http://dx.doi.org/10.1109/ICCV51070.2023.02100 ］

Karnewar A ， Vedaldi A ， Novotny D and Mitra N J . 2023b . Holo-diffusion： training a 3d diffusion model using 2d images // Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 18423 – 18433 ［ DOI： 10.1109/CVPR52729.2023.01767 http://dx.doi.org/10.1109/CVPR52729.2023.01767 ］

Karras T ， Aila T ， Laine S and Lehtinen J . 2017 . Progressive growing of GANs for improved quality， stability， and variation ［EB/OL］. ［ 2017-10-27 ］. http://arxiv.org/abs/1710.10196.pdf http://arxiv.org/abs/1710.10196.pdf

Karras T ， Aittala M ， Laine S ， Härkönen E ， Hellsten J ， Lehtinen J and Aila T . 2021a . Alias-Free generative adversarial networks // Proceedings of the 35th Annual Conference on Neural Information Processing Systems . Virtual ： Curran Associates， Inc： 852 – 863 ［ DOI： 10.5555/3540261.3540327 http://dx.doi.org/10.5555/3540261.3540327 ］

Karras T ， Laine S and Aila T . 2021b . A style-based generator architecture for generative adversarial networks . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 43 （ 12 ）： 4217 – 4228 ［ DOI： 10.1109/TPAMI.2020.2970919 http://dx.doi.org/10.1109/TPAMI.2020.2970919 ］

Karras T ， Laine S ， Aittala M ， Hellsten J ， Lehtinen J and Aila T . 2020 . Analyzing and improving the image quality of StyleGAN // Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 8107 – 8116 ［ DOI： 10.1109/CVPR42600.2020.00813 http://dx.doi.org/10.1109/CVPR42600.2020.00813 ］

Kaspar M ， Muñoz Osorio J D and Bock J . 2020 . Sim2Real transfer for reinforcement learning without dynamics randomization // Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems . Las Vegas， NV， USA ： IEEE： 4383 – 4388 ［ DOI： 10.1109/IROS45743.2020.9341201 http://dx.doi.org/10.1109/IROS45743.2020.9341201 ］

Katara P ， Xian Z and Fragkiadaki K . 2024 . Gen2Sim： scaling up robot learning in simulation with generative models // Proceedings of the IEEE International Conference on Robotics and Automation . Yokohama， Japan ： IEEE： 6672 – 6679 ［ DOI： 10.1109/ICRA57147.2024.10610566 http://dx.doi.org/10.1109/ICRA57147.2024.10610566 ］

Kerbl B ， Kopanas G ， Leimkühler T and Drettakis G . 2023 . 3D gaussian splatting for real-time radiance field rendering . ACM Transactions on Graphics ， 42 （ 4 ）： 1 – 12 ［ DOI： 10.48550/arXiv.2308.04079 http://dx.doi.org/10.48550/arXiv.2308.04079 ］

Kerbl B ， Meuleman A ， Kopanas G ， Wimmer M ， Lanvin A and Drettakis G . 2024a . A hierarchical 3D gaussian representation for real-time rendering of very large datasets . ACM Transactions on Graphics ， 43 （ 4 ）： 1 – 13 ［ DOI： 10.1145/3658160 http://dx.doi.org/10.1145/3658160 ］

Kerbl B ， Vicente Carrasco F ， Steinberger M and De La Torre F . 2024b . Taming 3DGS： high-quality radiance fields with limited resources // Proceedings of the SIGGRAPH Asia 2024 Conference Papers . New York， NY， USA ： Association for Computing Machinery：［ DOI： 10.1145/3680528.3687694 http://dx.doi.org/10.1145/3680528.3687694 ］

Kerim A ， Aslan C ， Celikcan U ， Erdem E and Erdem A . 2021 . NOVA： rendering virtual worlds with humans for computer vision tasks . Computer Graphics Forum ， 40 （ 6 ）： 258 – 272 ［ DOI： 10.1111/cgf.14271 http://dx.doi.org/10.1111/cgf.14271 ］

Khachatryan L ， Movsisyan A ， Tadevosyan V ， Henschel R ， Wang Z ， Navasardyan S and Shi H . 2023 . Text2video-zero： text-to-image diffusion models are zero-shot video generators // Proceedings of the IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 15908 – 15918 ［ DOI： 10.1109/ICCV51070.2023.01462 http://dx.doi.org/10.1109/ICCV51070.2023.01462 ］

Kim H ， Cui X ， Kim M G and Nguyen T H B . 2019 . Fingerprint generation and presentation attack detection using deep neural networks // Proceedings of the 2019 IEEE Conference on Multimedia Information Processing and Retrieval . San Jose， CA， USA ： IEEE： 375 – 378 ［ DOI： 10.1109/MIPR.2019.00074 http://dx.doi.org/10.1109/MIPR.2019.00074 ］

Kim I H ， Lee J ， Jin W ， Son S ， Cho K ， Seo J ， Kwak M S ， Cho S ， Baek J ， Lee B and Kim S . 2024 . Pose-dIVE： pose-diversified augmentation with diffusion model for person re-identification ［EB/OL］. ［ 2024-10-15 ］. http://arxiv.org/abs/2406.16042.pdf http://arxiv.org/abs/2406.16042.pdf

Kim J ， Lee J K and Lee K M . 2016 . Accurate image super-resolution using very deep convolutional networks // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas， NV， USA ： IEEE： 1646 – 1654 ［ DOI： 10.1109/CVPR.2016.182 http://dx.doi.org/10.1109/CVPR.2016.182 ］

Kim M ， Liu F ， Jain A and Liu X M . 2023 . DCFace： synthetic face generation with dual condition diffusion model // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 12715 – 12725 ［ DOI： 10.1109/CVPR52729.2023.01223 http://dx.doi.org/10.1109/CVPR52729.2023.01223 ］

Kim Y J ， Kim J Y and Oh T H . 2022 . CLIPActor： Text-driven recommendation and stylization for animating human meshes // Proceedings of the European Conference on Computer Vision . Tel Aviv， Israel ： Springer Nature Switzerland： 173 – 191 ［ DOI： 10.1007/978-3-031-20062-5_11 http://dx.doi.org/10.1007/978-3-031-20062-5_11 ］

Kingma D P and Dhariwal P . 2018 . Glow： Generative Flow with Invertible 1x1 Convolutions // Proceedings of the Advances in Neural Information Processing Systems . Montreal， Canada ： Curran Associates， Inc：［ DOI： 10.48550/arXiv.1807.03039 http://dx.doi.org/10.48550/arXiv.1807.03039 ］

Kingma D P and Welling M . 2013 . Auto-encoding variational bayes ［EB/OL］. ［ 2013-12-10 ］. https://arxiv.org/abs/1312.6114.pdf https://arxiv.org/abs/1312.6114.pdf

Kingma D P and Welling M . 2019 . An Introduction to Variational Autoencoders . Foundations and Trends® in Machine Learning ， 12 （ 4 ）： 307 – 392 ［ DOI： 10.1561/2200000056 http://dx.doi.org/10.1561/2200000056 ］

Klein L and Noé F . 2024 . Transferable Boltzmann Generators // Proceedings of the Advances in Neural Information Processing Systems . Vancouver， Canada ： Curran Associates， Inc：［ DOI： 10.48550/arXiv.2406.14426 http://dx.doi.org/10.48550/arXiv.2406.14426 ］

Kohli N ， Yadav D ， Vatsa M ， Singh R and Noore A . 2017 . Synthetic iris presentation attack using iDCGAN // Proceedings of the 2017 IEEE International Joint Conference on Biometrics . Denver， CO， USA ： IEEE： 674 – 680 ［ DOI： 10.1109/BTAS.2017.8272756 http://dx.doi.org/10.1109/BTAS.2017.8272756 ］

Kolotouros N ， Alldieck T ， Zanfir A ， Bazavan E G ， Fieraru M and Sminchisescu C . 2023 . DreamHuman： Animatable 3D avatars from text // Proceedings of the Advances in Neural Information Processing Systems . New Orleans， Louisiana， USA ： Curran Associates， Inc： 10516 – 10529 ［ DOI： 10.48550/arXiv.2306.09329 http://dx.doi.org/10.48550/arXiv.2306.09329 ］

Kolve E ， Mottaghi R ， Han W ， VanderBilt E ， Weihs L ， Herrasti A ， Deitke M ， Ehsani K ， Gordon D ， Zhu Y ， Kembhavi A ， Gupta A and Farhadi A . 2022 . AI2-THOR： an interactive 3D environment for visual AI ［EB/OL］. ［ 2022-08-26 ］. https://arxiv.org/abs/1712.05474.pdf https://arxiv.org/abs/1712.05474.pdf

Kondapaneni N ， Marks M ， Knott M ， Guimaraes R and Perona P . 2024 . Text-image alignment for diffusion based perception // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 13883 – 13893 ［ DOI： 10.1109/CVPR52733.2024.01317 http://dx.doi.org/10.1109/CVPR52733.2024.01317 ］

Kong W ， Tian Q ， Zhang Z ， Min R ， Dai Z ， Zhou J ， Xiong J ， Li X ， Wu B ， Zhang J ， Wu K ， Lin Q ， Yuan J ， Long Y ， Wang A ， Wang A ， Li C ， Huang D ， Yang F ， Tan H ， Wang H ， Song J ， Bai J ， Wu J ， Xue J ， Wang J ， Wang K ， Liu M ， Li P ， Li S ， Wang W ， Yu W ， Deng X ， Li Y ， Chen Y ， Cui Y ， Peng Y ， Yu Z ， He Z ， Xu Z ， Zhou Z ， Xu Z ， Tao Y ， Lu Q ， Liu S ， Zhou D ， Wang H ， Yang Y ， Wang D ， Liu Y ， Jiang J and Zhong C . 2024 . HunyuanVideo： a systematic framework for large video generative models ［EB/OL］. ［ 2024-12-06 ］. https://arxiv.org/abs/2412.03603.pdf https://arxiv.org/abs/2412.03603.pdf

Kosiorek A R ， Strathmann H ， Zoran D ， Moreno P ， Schneider R ， Mokrá S and Rezende D J . 2021 . Nerf-vae： a geometry aware 3d scene generative model // Proceedings of the 38th International Conference on Machine Learning . Virtually ： ACM： 5742 – 5752 ［ DOI： 10.48550/arXiv.2104.00587 http://dx.doi.org/10.48550/arXiv.2104.00587 ］

Kuaishou . 2024 . Kling AI

Kulal S ， Brooks T ， Aiken A ， Wu J J ， Yang J M ， Lu J W ， Efros A A and Singh K K . 2023 . Putting people in their place： affordance-aware human insertion into scenes // Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 17089 – 17099 ［ DOI： 10.1109/CVPR52729.2023.01639 http://dx.doi.org/10.1109/CVPR52729.2023.01639 ］

Kulikov V ， Yadin S ， Kleiner M and Michaeli T . 2023 . SinDDM： A single image denoising diffusion model // Proceedings of the International Conference on Machine Learning . London， England ： PMLR： 17920 – 17930 ［ DOI： 10.5555/3618408.3619146 http://dx.doi.org/10.5555/3618408.3619146 ］

Labs P . 2024 . Pika 1.5

Lazaridis L ， Dimou A and Daras P . 2018 . Abnormal behavior detection in crowded scenes using density heatmaps and optical flow // Proceedings of the 26th European Signal Processing Conference . Rome， Italy ： IEEE： 2060 – 2064 ［ DOI： 10.23919/EUSIPCO.2018.8553620 http://dx.doi.org/10.23919/EUSIPCO.2018.8553620 ］

Ledig C ， Theis L ， Huszár F ， Caballero J ， Cunningham A ， Acosta A ， Aitken A ， Tejani A ， Totz J and Wang Z . 2017 . Photo-realistic single image super-resolution using a generative adversarial network // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Honolulu， HI， USA ： IEEE： 105 – 114 ［ DOI： 10.1109/CVPR.2017.19 http://dx.doi.org/10.1109/CVPR.2017.19 ］

Lee D ， Kim C ， Kim S ， Cho M and Han W S . 2022 . Autoregressive image generation using residual quantization // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans， LA， USA ： IEEE： 11513 – 11522 ［ DOI： 10.1109/CVPR52688.2022.01123 http://dx.doi.org/10.1109/CVPR52688.2022.01123 ］

Li B ， Zhou H ， He J ， Wang M ， Yang Y and Li L . 2020 . On the Sentence Embeddings from Pre-trained Language Models // Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing . Online ： Association for Computational Linguistics： 9119 – 9130 ［ DOI： 10.18653/v1/2020.emnlp-main.733 http://dx.doi.org/10.18653/v1/2020.emnlp-main.733 ］

Li H and Wu X J . 2018 . DenseFuse： A fusion approach to infrared and visible images . IEEE Transactions on Image Processing ， 28 （ 5 ）： 2614 – 2623 ［ DOI： 10.1109/TIP.2018.2887342 http://dx.doi.org/10.1109/TIP.2018.2887342 ］

Li H ， Yang Y ， Chang M ， Chen S ， Feng H ， Xu Z ， Li Q and Chen Y . 2022a . SRDiff： Single image super-resolution with diffusion probabilistic models . Neurocomputing ， 479 ： 47 – 59 ［ DOI： 10.1016/j.neucom.2022.01.029 http://dx.doi.org/10.1016/j.neucom.2022.01.029 ］

Li H ， Ye M and Du B . 2021a . WePerson： Learning a Generalized Re-identification Model from All-weather Virtual Data // Proceedings of the 29th ACM International Conference on Multimedia . New York， NY， USA ： Association for Computing Machinery： 3115 – 3123 ［ DOI： 10.1145/3474085.3475455 http://dx.doi.org/10.1145/3474085.3475455 ］

Li J H ， Tan H ， Zhang K ， Xu Z X ， Luan F J ， Xu Y H ， Hong Y C ， Sunkavalli K ， Shakhnarovich G and Bi S . 2024a . Instant3d： fast text-to-3d with sparse-view generation and large reconstruction model // Proceedings of the Twelfth International Conference on Learning Representations . Vienna， Austria ： OpenReview：［ DOI： 10.48550/arXiv.2311.06214 http://dx.doi.org/10.48550/arXiv.2311.06214 ］

Li J H ， Zhang J W ， Bai X ， Zheng J ， Ning X ， Zhou J and Gu L . 2024b . DNGaussian： optimizing sparse-view 3D gaussian radiance fields with global-local depth normalization // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 20775 – 20785 ［ DOI： 10.1109/CVPR52733.2024.01963 http://dx.doi.org/10.1109/CVPR52733.2024.01963 ］

Li K ， Wang J ， Yang L ， Lu C and Dai B . 2024c . SemGrasp： semantic grasp generation via language aligned discretization ［EB/OL］. ［ 2024-04-04 ］. https://arxiv.org/abs/2404.03590.pdf https://arxiv.org/abs/2404.03590.pdf

Li L ， Tang J and Shao Z . 2022b . Sketch-to-photo face generation based on semantic consistency preserving and similar connected component refinement . The Visual Computer ， 38 （ 11 ）： 3577 – 3594 ［ DOI： 10.1007/s00371-021-02188-1 http://dx.doi.org/10.1007/s00371-021-02188-1 ］

Li P ， Liu Z and Chen K . 2023a . TrackDiffusion： Multi-object tracking data generation via diffusion models ［EB/OL］. ［ 2023-12-01 ］. https://arxiv.org/pdf/2312.00651.pdf https://arxiv.org/pdf/2312.00651.pdf

Li R ， Cheong L F and Tan R T . 2019a . Heavy rain image restoration： Integrating physics model and conditional adversarial learning // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Long Beach， CA， USA ： IEEE： 1633 – 1642 ［ DOI： 10.1109/CVPR.2019.00173 http://dx.doi.org/10.1109/CVPR.2019.00173 ］

Li T H ， Tian Y L ， Li H ， Deng M Y and He K M . 2024d . Autoregressive image generation without vector quantization ［EB/OL］. ［ 2024-11-01 ］. https://arxiv.org/abs/2406.11838 https://arxiv.org/abs/2406.11838

Li X ， Chu W Q ， Wu Y ， Yuan W H ， Liu F L ， Zhang Q ， Li F ， Feng H C ， Ding E and Wang J D . 2023b . Videogen： a reference-guided latent diffusion approach for high definition text-to-video generation ［EB/OL］. ［ 2023-09-07 ］. https://arxiv.org/abs/2309.00398.pdf https://arxiv.org/abs/2309.00398.pdf

Li Y H ， Chen X J ， Wu F and Zha Z J . 2019b . LinesToFacePhoto： face photo generation from lines with conditional self-attention generative adversarial networks // Proceedings of the 27th ACM International Conference on Multimedia . New York， NY， USA ： Association for Computing Machinery： 2323 – 2331 ［ DOI： 10.1145/3343031.3350854 http://dx.doi.org/10.1145/3343031.3350854 ］

Li Y H ， Liu H ， Wu Q ， Mu F ， Yang J ， Gao J S ， Li C S and Lee Y J . 2023c . GLIGEN： Open-set grounded text-to-image generation // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 22511 – 22521 ［ DOI： 10.1109/CVPR52598.2023.022511 http://dx.doi.org/10.1109/CVPR52598.2023.022511 ］

Li Y X ， Jiang L H ， Xu L N ， Xiangli Y B ， Wang Z Z ， Lin D H and Dai B . 2023d . MatrixCity： a large-scale city dataset for city-scale neural rendering and beyond // Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 3182 – 3192 ［ DOI： 10.1109/ICCV51070.2023.00297 http://dx.doi.org/10.1109/ICCV51070.2023.00297 ］

Li Y ， Tan R T ， Guo X ， Lu J and Brown M S . 2016 . Rain streak removal using layer priors // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas， NV， USA ： IEEE： 2736 – 2744 ［ DOI： 10.1109/CVPR.2016.299 http://dx.doi.org/10.1109/CVPR.2016.299 ］

Li Z Q ， Yu T W ， Sang S ， Wang S ， Song M ， Liu Y H ， Yeh Y Y ， Zhu R ， Gundavarapu N ， Shi J ， Bi S ， Xu Z X ， Yu H X ， Sunkavalli K ， Hašan M ， Ramamoorthi R and Chandraker M . 2021b . OpenRooms： an end-to-end open framework for photorealistic indoor scene datasets ［EB/OL］. ［ 2021-09-27 ］. https://arxiv.org/abs/2007.12868.pdf https://arxiv.org/abs/2007.12868.pdf

Li Z ， Li Y ， Zhao P ， Song R ， Li X Y and Yang J . 2023e . Is synthetic data from diffusion models ready for knowledge distillation？ ［EB/OL］. ［ 2023-5-22 ］. https://arxiv.org/abs/2305.12954.pdf https://arxiv.org/abs/2305.12954.pdf

Li Z ， Zhou Q ， Zhang X ， Zhang Y ， Wang Y and Xie W J . 2023f . Open-vocabulary object segmentation with diffusion models // Proceedings of the IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 7633 – 7642 ［ DOI： 10.1109/ICCV51070.2023.00705 http://dx.doi.org/10.1109/ICCV51070.2023.00705 ］

Lian L S ， Shi B L ， Yala A ， Darrell T and Li B . 2023 . LLM-grounded video diffusion models ［EB/OL］. ［ 2023-9-29 ］. https://arxiv.org/abs/2309.17444.pdf https://arxiv.org/abs/2309.17444.pdf

Liang J Y ， Zhang K ， Gu S H ， Gool L V and Timofte R . 2021 . Flow-based kernel prior with application to blind super-resolution // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， TN， USA ： IEEE： 10596 – 10605 ［ DOI： 10.1109/CVPR46437.2021.01046 http://dx.doi.org/10.1109/CVPR46437.2021.01046 ］

Liang P W ， Jiang J J ， Liu X M and Ma J Y . 2022 . Fusion from Decomposition： A Self-Supervised Decomposition Approach for Image Fusion // Proceedings of the European Conference on Computer Vision . Tel Aviv， Israel ： Springer Nature Switzerland： 719 – 735 ［ DOI： 10.1007/978-3-031-19797-0_41 http://dx.doi.org/10.1007/978-3-031-19797-0_41 ］

Liang W Q ， Wang G C ， Lai J H and Zhu J Y . 2018 . M2M-GAN： many-to-many generative adversarial transfer learning for person re-identification ［EB/OL］. ［ 2018-11-09 ］. http://arxiv.org/abs/1811.03768.pdf http://arxiv.org/abs/1811.03768.pdf

Liang Y X ， Yang X ， Lin J T ， Li H D ， Xu X G and Chen Y C . 2023 . LucidDreamer： towards high-fidelity text-to-3D generation via interval score matching // Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， Washington， USA ： IEEE： 6517 – 6526 ［ DOI： 10.1109/CVPR52733.2024.00623 http://dx.doi.org/10.1109/CVPR52733.2024.00623 ］

Liang Y X ， Yang X ， Lin J T ， Li H D ， Xu X G and Chen Y C . 2024 . Luciddreamer： towards high-fidelity text-to-3d generation via interval score matching // Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 6517 – 6526 ［ DOI： 10.1109/CVPR52733.2024.00623 http://dx.doi.org/10.1109/CVPR52733.2024.00623 ］

Liao T T ， Yi H W ， Xiu Y L ， Tang J X ， Huang Y Y ， Thies J and Black M J . 2024 . TADA！： Text to animatable digital avatars // Proceedings of the International Conference on 3D Vision . Davos， Switzerland ： IEEE ［ DOI： 10.1109/3DV62453.2024.00150 http://dx.doi.org/10.1109/3DV62453.2024.00150 ］

Lin B ， Ge Y Y ， Cheng X H ， Li Z J ， Zhu B ， Wang S D ， He X Y ， Ye Y ， Yuan S H ， Chen L H ， Jia T H ， Zhang J W ， Tang Z Y ， Pang Y T ， She B ， Yan C ， Hu Z H ， Dong X Y ， Chen L ， Pan Z ， Zhou X ， Dong S L ， Tian Y H and Yuan L . 2024a . Open-sora plan： open-source large video generation model ［EB/OL］. ［ 2024-11-28 ］. https://arxiv.org/abs/2412.00131.pdf https://arxiv.org/abs/2412.00131.pdf

Lin C H ， Gao J ， Tang L M ， Takikawa T ， Zeng X H ， Huang X ， Kreis K ， Fidler S ， Liu M Y and Lin T Y . 2023a . Magic3d： high-resolution text-to-3d content creation // Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 300 – 309 ［ DOI： 10.1109/CVPR52729.2023.00037 http://dx.doi.org/10.1109/CVPR52729.2023.00037 ］

Lin H ， Zala A ， Cho J and Bansal M . 2023b . VideoDirectorGPT： Consistent multi-scene video generation via LLM-guided planning ［EB/OL］. ［ 2023-09-27 ］. https://arxiv.org/pdf/2309.15091.pdf https://arxiv.org/pdf/2309.15091.pdf

Lin K E ， Lin Y C ， Lai W S ， Lin T Y ， Shih Y C and Ramamoorthi R . 2023c . Vision transformer for NeRF-based view synthesis from a single input image // Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision . Waikoloa， USA ： IEEE： 806 - 815 ［ DOI： 10.1109/WACV56688.2023.00087 http://dx.doi.org/10.1109/WACV56688.2023.00087 ］

Lin W ， Gao J ， Wang Q and Li X . 2021a . Learning to detect anomaly events in crowd scenes from synthetic data . Neurocomputing ， 436 ： 248 – 259 ［ DOI： 10.1016/j.neucom.2020.12.091 http://dx.doi.org/10.1016/j.neucom.2020.12.091 ］

Lin X M ， Li Y K ， Hsiao J H ， Ho C and Kong Y . 2023d . Catch Missing Details： Image Reconstruction With Frequency Augmented Variational Autoencoder // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 1736 – 1745 ［ DOI： 10.1109/CVPR52729.2023.00173 http://dx.doi.org/10.1109/CVPR52729.2023.00173 ］

Lin X ， He J ， Chen Z ， Lyu Z ， Fei B ， Dai B ， Ouyang W ， Qiao Y and Dong C . 2024b . DiffBIR： Towards blind image restoration with generative diffusion prior ［EB/OL］. ［ 2024-4-12 ］. https://arxiv.org/abs/2308.15070.pdf https://arxiv.org/abs/2308.15070.pdf

Lin Z C ， Liu C Y ， Qi W B and Chan S C . 2021b . A color/illuminance aware data augmentation and style adaptation approach to person re-identification . IEEE Access ， 9 ： 115826 – 115838 ［ DOI： 10.1109/ACCESS.2021.3100571 http://dx.doi.org/10.1109/ACCESS.2021.3100571 ］

Liu J X ， Ni B B ， Yan Y C ， Zhou P ， Cheng S and Hu J G . 2018 . Pose transferrable person re-identification // Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City， UT， USA ： IEEE： 4099 – 4108 ［ DOI： 10.1109/CVPR.2018.00431 http://dx.doi.org/10.1109/CVPR.2018.00431 ］

Liu J ， Rahmani H ， Akhtar N and Mian A . 2019a . Learning human pose models from synthesized data for robust RGBD action recognition . International Journal of Computer Vision ， 127 （ 10 ）： 1545 – 1564 ［ DOI： 10.1007/s11263-019-01192-2 http://dx.doi.org/10.1007/s11263-019-01192-2 ］

Liu J ， Wang Q ， Fan H ， Wang Y ， Tang Y and Qu L . 2024a . Residual denoising diffusion models // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 2773 – 2783 ［ DOI： 10.1109/CVPR52733.2024.00268 http://dx.doi.org/10.1109/CVPR52733.2024.00268 ］

Liu R S ， Ma L ， Zhang J A ， Fan X and Luo Z X . 2021 . Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， TN， USA ： IEEE： 10556 – 10565 ［ DOI： 10.1109/CVPR46437.2021.01042 http://dx.doi.org/10.1109/CVPR46437.2021.01042 ］

Liu R S ， Wu R D ， Hoorick B V ， Tokmakov P ， Zakharov S and Vondrick C . 2023a . Zero-1-to-3： zero-shot one image to 3d object // Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 9298 – 9309 ［ DOI： 10.1109/ICCV51070.2023.00853 http://dx.doi.org/10.1109/ICCV51070.2023.00853 ］

Liu S ， Zeng Z Y ， Ren T ， Li F ， Ren T H and Li F . 2024b . Grounding DINO： Marrying DINO with grounded pre-training for open-set object detection // Proceedings of the European Conference on Computer Vision . Milano， Italy ： Springer Nature Switzerland： 38 - 55 ［ DOI： 10.1007/978-3-031-72970-6_3 http://dx.doi.org/10.1007/978-3-031-72970-6_3 ］

Liu T ， Wang G ， Hu S ， Shen L ， Ye X ， Zang Y ， Cao Z ， Li W and Liu Z . 2024c . MVSGaussian： fast generalizable gaussian splatting reconstruction from multi-view stereo // Proceedings of the European Conference on Computer Vision . Milano， Italy ： Springer： 37 – 53 ［ DOI： 10.1007/978-3-031-72649-1_3 http://dx.doi.org/10.1007/978-3-031-72649-1_3 ］

Liu W ， Piao Z A ， Min J Y ， Luo W H ， Ma L and Gao S H . 2019b . Liquid warping GAN： A unified framework for human motion imitation， appearance transfer and novel view synthesis // Proceedings of the IEEE/CVF International Conference on Computer Vision . Seoul， Korea （South）： IEEE： 5903 – 5912 ［ DOI： 10.1109/ICCV.2019.00600 http://dx.doi.org/10.1109/ICCV.2019.00600 ］

Liu X T ， Ren J J ， Siarohin A ， Skorokhodov I ， Lin D ， Liu X Y ， Liu Z G ， Tulyakov S and Li Y F . 2024d . HyperHuman： Hyper-realistic human generation with latent structural diffusion // Proceedings of the International Conference on Learning Representations . Vienna， Austria ： OpenReview：［ DOI： 10.48550/arXiv.2310.08579 http://dx.doi.org/10.48550/arXiv.2310.08579 ］

Liu Y ， Gao C ， Zhang Z ， Wu Y H ， Liang M X ， Tao L and Lu Y X . 2017 . A new multi-agent system to simulate the foraging behaviors of physarum . Natural Computing ， 16 ： 15 – 29 ［ DOI： 10.1007/s11047-015-9530-5 http://dx.doi.org/10.1007/s11047-015-9530-5 ］

Liu Y ， Ke Z ， Liu F ， Zhao N and Lau R W H . 2024e . Diff-Plugin： Revitalizing details for diffusion-based low-level tasks // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 4197 – 4208 ［ DOI： 10.1109/CVPR52733.2024.00402 http://dx.doi.org/10.1109/CVPR52733.2024.00402 ］

Liu Y ， Lin C ， Zeng Z J ， Long X X ， Liu L J ， Komura T and Wang W P . 2024f . Syncdreamer： generating multiview-consistent images from a single-view image // Proceedings of the Twelfth International Conference on Learning Representations . Vienna， Austria ：［ DOI： 10.48550/arXiv.2309.03453 http://dx.doi.org/10.48550/arXiv.2309.03453 ］

Liu X ， Gong C Y and Liu Q . 2023b . Flow straight and fast： Learning to Generate and Transfer Data with Rectified Flow // Proceedings of the International Conference on Learning Representations . Kigali， Rwanda ： Curran Associates， Inc：［ DOI： 10.48550/arXiv.2209.03003 http://dx.doi.org/10.48550/arXiv.2209.03003 ］

Long X X ， Guo Y C ， Lin C ， Liu Y ， Dou Z Y ， Liu L J ， Ma Y X ， Zhang S H ， Habermann M ， Theobalt C and Wang W P . 2024 . Wonder3D： single image to 3d using cross-domain diffusion // Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 9970 – 9980 ［ DOI： 10.1109/CVPR52733.2024.00951 http://dx.doi.org/10.1109/CVPR52733.2024.00951 ］

Loper M ， Mahmood N and Black M J . 2014 . Mosh： motion and shape capture from sparse markers . ACM Transactions on Graphics ， 33 （ 6 ）： 1 – 220 ［ DOI： 10.1145/2661229.2661273 http://dx.doi.org/10.1145/2661229.2661273 ］

Lore K G ， Akintayo A and Sarkar S . 2017 . LLNet： A deep autoencoder approach to natural low-light image enhancement . Pattern Recognition ， 61 ： 650 – 662 ［ DOI： 10.1016/j.patcog.2016.06.008 http://dx.doi.org/10.1016/j.patcog.2016.06.008 ］

Lu C and Song Y . 2024 . Simplifying， Stabilizing and Scaling Continuous-time Consistency Models ［EB/OL］. ［ 2024-10-14 ］. https://arxiv.org/abs/2410.11081.pdf https://arxiv.org/abs/2410.11081.pdf

Lu C ， Zhou Y H ， Bao F ， Chen J ， Li C X and Zhu J . 2022 . DPM-Solver： A Fast ODE Solver for Diffusion Probabilistic Model Sampling in around 10 Steps // Proceedings of the Advances in Neural Information Processing Systems . New Orleans， Louisiana， USA ： Curran Associates， Inc： 5775 – 5787 ［ DOI： 10.5555/3600270.3600688 http://dx.doi.org/10.5555/3600270.3600688 ］

Lu T ， Yu M L ， Xu L N ， Xiangli Y B ， Wang L M ， Lin D H and Dai B . 2024 . Scaffold-gs： structured 3D gaussians for view-adaptive rendering // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 20654 – 20664 ［ DOI： 10.1109/CVPR52733.2024.01952 http://dx.doi.org/10.1109/CVPR52733.2024.01952 ］

Lugmayr A ， Danelljan M ， Gool L V and Timofte R . 2020 . SRFlow： Learning the super-resolution space with normalizing flow // Proceedings of the European Conference on Computer Vision . Virtual ： Springer： 715 – 732 ［ DOI： 10.1007/978-3-030-58558-7_42 http://dx.doi.org/10.1007/978-3-030-58558-7_42 ］

LumaLabs . 2024 . Dream Machine

Luo J Z ， Chen D D ， Zhang Y X ， Huang Y ， Wang L ， Shen Y J ， Zhao D ， Zhou J and Tan T N . 2023 . VideoFusion： Decomposed diffusion models for high-quality video generation // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 10209 – 10218 ［ DOI： 10.1109/CVPR52729.2023.00984 http://dx.doi.org/10.1109/CVPR52729.2023.00984 ］

Luo Y ， Xu Y and Ji H . 2015 . Removing rain from a single image via discriminative sparse coding // Proceedings of the IEEE International Conference on Computer Vision . Santiago， Chile ： IEEE： 3397 – 3405 ［ DOI： 10.1109/ICCV.2015.388 http://dx.doi.org/10.1109/ICCV.2015.388 ］

Lv F and Nevatia R . 2007 . Single view human action recognition using key pose matching and Viterbi path searching // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Minneapolis， MN， USA ： IEEE： 1 – 8 ［ DOI： 10.1109/CVPR.2007.383131 http://dx.doi.org/10.1109/CVPR.2007.383131 ］

Ma F ， Jing X Y ， Zhu X ， Tang Z M and Peng Z P . 2020 . True-color and grayscale video person re-identification . IEEE Transactions on Information Forensics and Security ， 15 ： 115 – 129 ［ DOI： 10.1109/TIFS.2019.2917160 http://dx.doi.org/10.1109/TIFS.2019.2917160 ］

Ma X ， Wang Y ， Jia G ， Chen X ， Liu Z ， Li Y F ， Chen C and Qiao Y . 2024 . Latte： latent diffusion transformer for video generation ［EB/OL］. ［ 2024-01-05 ］. https://arxiv.org/abs/2401.03048.pdf https://arxiv.org/abs/2401.03048.pdf

Maeda S . 2020 . Unpaired image super-resolution using pseudo-supervision // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 288 – 297 ［ DOI： 10.1109/CVPR42600.2020.00037 http://dx.doi.org/10.1109/CVPR42600.2020.00037 ］

Mai A ， Hedman P ， Kopanas G ， Verbin D ， Futschik D ， Xu Q ， Kuester F ， Barron J T and Zhang Y . 2024 . EVER： exact volumetric ellipsoid rendering for real-time view synthesis ［EB/OL］. ［ 2024-10-29 ］. https://arxiv.org/abs/2410.01804.pdf https://arxiv.org/abs/2410.01804.pdf

Makthal S and Ross A . 2005 . Synthesis of iris images using Markov random fields // Proceedings of the 2005 13th European Signal Processing Conference . Antalya， Turkey ： IEEE： 1 – 4

Maltoni D ， Maio D ， Jain A K and Prabhakar S . 2009 . Synthetic Fingerprint Generation // Handbook of Fingerprint Recognition . London ： Springer London： 271 – 302 ［ DOI： 10.1007/978-1-84882-254-2_6 http://dx.doi.org/10.1007/978-1-84882-254-2_6 ］

Matas J ， James S and Davison A J . 2018 . Sim-to-real reinforcement learning for deformable object manipulation ［EB/OL］. ［ 2018-10-08 ］. https://arxiv.org/abs/1806.07851.pdf https://arxiv.org/abs/1806.07851.pdf

McLaughlin N ， Del Rincon J M and Miller P . 2015 . Data-augmentation for reducing dataset bias in person re-identification // Proceedings of the 2015 12th IEEE International Conference on Advanced Video and Signal Based Surveillance . Karlsruhe， Germany ： IEEE： 1 – 6 ［ DOI： 10.1109/AVSS.2015.7301739 http://dx.doi.org/10.1109/AVSS.2015.7301739 ］

Menapace W ， Siarohin A ， Skorokhodov I ， Deyneka E ， Chen T S ， Kag A ， Fang Y ， Stoliar A ， Ricci E ， Ren J and Tulyakov S . 2024 . Snap video： scaled spatiotemporal transformers for text-to-video synthesis // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 7038 – 7048 ［ DOI： 10.1109/CVPR52733.2024.00672 http://dx.doi.org/10.1109/CVPR52733.2024.00672 ］

Mildenhall B ， Srinivasan P P ， Tancik M ， Barron J T ， Ramamoorthi R and Ng R . 2021 . NeRF： representing scenes as neural radiance fields for view synthesis . Communications of the ACM ， 65 （ 1 ）： 99 – 106 ［ DOI： 10.1145/3503250 http://dx.doi.org/10.1145/3503250 ］

Minaee S and Abdolrashidi A . 2018a . Iris-GAN： Learning to generate realistic iris images using convolutional GAN ［EB/OL］. ［ 2018-12-25 ］. https://arxiv.org/abs/1812.04822.pdf https://arxiv.org/abs/1812.04822.pdf

Minaee S and Abdolrashidi A . 2018b . Finger-GAN： generating realistic fingerprint images using connectivity imposed GAN ［EB/OL］. ［ 2018-12-25 ］. https://arxiv.org/abs/1812.10482.pdf https://arxiv.org/abs/1812.10482.pdf

Minaee S ， Minaei M and Abdolrashidi A . 2020 . Palm-GAN： generating realistic palmprint images using total-variation regularized GAN ［EB/OL］. ［ 2020-03-21 ］. https://arxiv.org/abs/2003.10834.pdf https://arxiv.org/abs/2003.10834.pdf

MiniMax . 2024 . Hailuo ai

Mistry V ， Engelsma J J and Jain A K . 2020 . Fingerprint synthesis： search with 100 million prints // Proceedings of the 2020 IEEE International Joint Conference on Biometrics . Houston， TX， USA ： IEEE： 1 – 10 ［ DOI： 10.1109/IJCB48548.2020.9304885 http://dx.doi.org/10.1109/IJCB48548.2020.9304885 ］

Mittal G ， Marwah T and Balasubramanian V N . 2017 . Sync-draw： Automatic video generation using deep recurrent attentive architectures // Proceedings of the 25th ACM International Conference on Multimedia . California， USA ： SIGMM： 1096 – 1104 ［ DOI： 10.1145/3123266.3123309 http://dx.doi.org/10.1145/3123266.3123309 ］

Mo K ， Zhu S ， Chang A X ， Yi L ， Tripathi S ， Guibas L J and Su H . 2019 . PartNet： a large-scale benchmark for fine-grained and hierarchical part-level 3D object understanding // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Long Beach， CA， USA ： IEEE： 909 – 918 ［ DOI： 10.1109/CVPR.2019.00100 http://dx.doi.org/10.1109/CVPR.2019.00100 ］

Montulet R and Briassouli A . 2021 . Densely annotated photorealistic virtual dataset generation for abnormal event detection // Proceedings of the 25th International Conference on Pattern Recognition Workshops . Paris， France ： Springer： 5 – 19 ［ DOI： 10.1007/978-3-030-68799-1_1 http://dx.doi.org/10.1007/978-3-030-68799-1_1 ］

Mou C ， Wang X T ， Xie L B ， Wu Y C ， Zhang J ， Qi Z A and Shan Y . 2024 . T2I-Adapter： Learning adapters to dig out more controllable ability for text-to-image diffusion models // Proceedings of the AAAI Conference on Artificial Intelligence . Vancouver， Canada ： AAAI： 4296 – 4304 ［ DOI： 10.1609/aaai.v38i5.28226 http://dx.doi.org/10.1609/aaai.v38i5.28226 ］

Müller N ， Siddiqui Y ， Porzi L ， Bulo S R ， Kontschieder P and Nießner M . 2023 . Diffrf： rendering-guided 3d radiance field diffusion // Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 4328 – 4338 ［ DOI： 10.1109/CVPR52729.2023.00421 http://dx.doi.org/10.1109/CVPR52729.2023.00421 ］

Müller T ， Evans A ， Schied C and Keller A . 2022 . Instant neural graphics primitives with a multiresolution hash encoding . ACM Transactions on Graphics ， 41 （ 4 ）： 102 ： 1 - 102 ： 15 ［ DOI： 10.1145/3528223.3530127 http://dx.doi.org/10.1145/3528223.3530127 ］

Navaneet K L ， Pourahmadi Meibodi K ， Abbasi Koohpayegani S and Pirsiavash H . 2024 . CompGS： smaller and faster gaussian splatting with vector quantization // European Conference on Computer Vision . Milano， Italy ： Springer： 330 – 349 ［ DOI： 10.1007/978-3-031-73411-3_19 http://dx.doi.org/10.1007/978-3-031-73411-3_19 ］

Ng A Y . 2011 . Sparse autoencoder // CS294A Lecture Notes .

Nguyen T D ， Le T ， Vu H and Phung D . 2017 . Dual discriminator generative adversarial nets // Proceedings of the 31st International Conference on Neural Information Processing Systems . Red Hook， NY， USA ： Curran Associates Inc： 2667 – 2677 ［ DOI： 10.5555/3294996.3295027 http://dx.doi.org/10.5555/3294996.3295027 ］

Nicodemou V C ， Oikonomidis I and Argyros A . 2023 . RV-VAE： Integrating Random Variable Algebra into Variational Autoencoders // Proceedings of 2023 IEEE/CVF International Conference on Computer Vision Workshops . Paris， France ： IEEE： 196 - 205 ［ DOI： 10.1109/ICCVW60793.2023.00027 http://dx.doi.org/10.1109/ICCVW60793.2023.00027 ］

Niedermayr S ， Stumpfegger J and Westermann R . 2024 . Compressed 3D gaussian splatting for accelerated novel view synthesis // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 10349 – 10358 ［ DOI： 10.1109/CVPR52733.2024.00985 http://dx.doi.org/10.1109/CVPR52733.2024.00985 ］

Niemeyer M and Geiger A . 2021 . Giraffe： Representing scenes as compositional generative neural feature fields // Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， TN， USA ： IEEE： 11448 – 11459 ［ DOI： 10.1109/CVPR46437.2021.01129 http://dx.doi.org/10.1109/CVPR46437.2021.01129 ］

Niu K ， Yu H Y ， Qian X L ， Fu T ， Li B and Xue X Y . 2024 . Synthesizing efficient data with diffusion models for person re-identification pre-training ［EB/OL］. ［ 2024-06-10 ］. http://arxiv.org/abs/2406.06045.pdf http://arxiv.org/abs/2406.06045.pdf

Noguchi A ， Sun X ， Lin S and Harada T . 2022 . Unsupervised learning of efficient geometry-aware neural articulated representations // Proceedings of the European Conference on Computer Vision . Tel Aviv， Israel ： Springer ： 597 - 614 ［ DOI： 10.1007/978-3-031-19790-1_36 http://dx.doi.org/10.1007/978-3-031-19790-1_36 ］

Oord A V D ， Kalchbrenner N and Kavukcuoglu K . 2016b . Pixel recurrent neural networks // Proceedings of the International Conference on Machine Learning . New York， USA ： ACM： 1747 – 1756 ［ DOI： 10.5555/3045390.3045575 http://dx.doi.org/10.5555/3045390.3045575 ］

Oord A V D ， Kalchbrenner N ， Espeholt L ， Kavukcuoglu K ， Vinyals O and Graves A . 2016a . Conditional image generation with PixelCNN decoders // Proceedings of Advances in Neural Information Processing Systems . Monterey， California， USA ： Curran Associates， Inc： 4797 – 4805 ［ DOI： 10.5555/3157382.3157633 http://dx.doi.org/10.5555/3157382.3157633 ］

Oord A V D ， Vinyals O and Kavukcuoglu K . 2017 . Neural Discrete Representation Learning // Proceedings of the International Conference on Neural Information Processing Systems . Long Beach， California， USA ： Curran Associates， Inc： 6309 – 6318 ［ DOI： 10.5555/3295222.3295378 http://dx.doi.org/10.5555/3295222.3295378 ］

OpenAI . 2024 . Video generation models as world simulators

Ostrek M ， Sanyal S ， O’Sullivan C ， Black M J and Thies J . 2023 . Environment-specific people ［EB/OL］. ［ 2023-11-22 ］. https://arxiv.org/abs/2312.14579.pdf https://arxiv.org/abs/2312.14579.pdf

Ou W F ， Po L M ， Zhou C ， Xian P F and Xiong J J . 2022 . GAN-based inter-class sample generation for contrastive learning of vein image representations . IEEE Transactions on Biometrics， Behavior， and Identity Science ， 4 （ 2 ）： 249 – 262 ［ DOI： 10.1109/TBIOM.2022.3152345 http://dx.doi.org/10.1109/TBIOM.2022.3152345 ］

Özdenizci O and Legenstein R . 2023 . Restoring vision in adverse weather conditions with patch-based denoising diffusion models . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 45 （ 8 ）： 10346 – 10357 ［ DOI： 10.1109/TPAMI.2023.3238179 http://dx.doi.org/10.1109/TPAMI.2023.3238179 ］

Pan X G ， Tewari A ， Leimkühler T ， Liu L J ， Meka A and Theobalt C . 2023 . Drag Your GAN： Interactive Point-based Manipulation on the Generative Image Manifold // Proceedings of ACM SIGGRAPH Conference . Los Angeles， CA， USA ： ACM： 1 – 11 ［ DOI： 10.1145/3588432.3591500 http://dx.doi.org/10.1145/3588432.3591500 ］

Panev S ， Kim E ， Namburu S A S ， Nikolova D ， Melo C D ， Torre F D L and Hodgins J . 2024 . Exploring the impact of rendering method and motion quality on model performance when using multi-view synthetic data for action recognition // Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision . Waikoloa， HI， USA ： IEEE： 4592 – 4602 ［ DOI： 10.1109/WACV57701.2024.00453 http://dx.doi.org/10.1109/WACV57701.2024.00453 ］

Pang Z Q ， Guo J F ， Sun W B ， Xiao Y B and Yu M . 2022 . Cross-domain person re-identification by hybrid supervised and unsupervised learning . Applied Intelligence ， 52 ： 2987 – 3001 ［ DOI： 10.1007/s10489-021-02551-8 http://dx.doi.org/10.1007/s10489-021-02551-8 ］

Papamakarios G ， Pavlakou T and Murray I . 2017 . Masked Autoregressive Flow for Density Estimation // Proceedings of the Advances in Neural Information Processing Systems . Long Beach， California， USA ： Curran Associates， Inc： 2338 – 2347 ［ DOI： 10.48550/arXiv.1705.07057 http://dx.doi.org/10.48550/arXiv.1705.07057 ］

Patel P ， Huang C H P ， Tesch J ， Hoffmann D T ， Tripathi S and Black M J . 2021 . AGORA： avatars in geography optimized for regression analysis // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， TN， USA ： IEEE： 13463 – 13473 ［ DOI： 10.1109/CVPR46437.2021.01326 http://dx.doi.org/10.1109/CVPR46437.2021.01326 ］

Peng D ， Hu P ， Ke Q N and Liu J . 2023 . Diffusion-based image translation with label guidance for domain adaptive semantic segmentation // Proceedings of the IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 808 – 820 ［ DOI： 10.1109/ICCV51070.2023.00081 http://dx.doi.org/10.1109/ICCV51070.2023.00081 ］

Perarnau G ， Weijer J V D ， Raducanu B and Alvarez J M . 2016 . Invertible conditional GANs for image editing // Proceedings of the Annual Conference on Neural Information Processing Systems Workshop . San Juan， California， USA ： Curran Associates， Inc：［ DOI： 10.48550/arXiv.1611.06355 http://dx.doi.org/10.48550/arXiv.1611.06355 ］

Perlin K . 1985 . An image synthesizer . Computer Graphics ， 19 （ 3 ）： 287 – 296 ［ DOI： 10.1145/325165.325247 http://dx.doi.org/10.1145/325165.325247 ］

Petrovich M ， Black M J and Varol G . 2022 . TEMOS： Generating diverse human motions from textual descriptions // Proceedings of the European Conference on Computer Vision . Tel Aviv， Israel ： Springer： 480 - 497 ［ DOI： 10.1007/978-3-031-20047-2_28 http://dx.doi.org/10.1007/978-3-031-20047-2_28 ］

PixVerse . 2024 . Pixverse

Podell D . 2023 . SDXL： Improving Latent Diffusion Models for High-resolution Image Synthesis ［EB/OL］. ［ 2023-07-04 ］. https://arxiv.org/abs/2307.01952.pdf https://arxiv.org/abs/2307.01952.pdf

Polyak A ， Zohar A ， Brown A ， Tjandra A ， Sinha A ， Lee A ， Vyas A ， Shi B ， Ma C-Y ， Chuang C-Y ， Yan D ， Choudhary D ， Wang D ， Sethi G ， Pang G ， Ma H ， Misra I ， Hou J ， Wang J ， Jagadeesh K ， Li K ， Zhang L ， Singh M ， Williamson M ， Le M ， Yu M ， Singh M K ， Zhang P ， Vajda P ， Duval Q ， Girdhar R ， Sumbaly R ， Rambhatla S S ， Tsai S ， Azadi S ， Datta S ， Chen S ， Bell S ， Ramaswamy S ， Sheynin S ， Bhattacharya S ， Motwani S ， Xu T ， Li T ， Hou T ， Hsu W N ， Yin X ， Dai X ， Taigman Y ， Luo Y ， Liu Y C ， Wu Y C ， Zhao Y ， Kirstain Y ， He Z ， He Z ， Pumarola A ， Thabet A ， Sanakoyeu A ， Mallya A ， Guo B ， Araya B ， Kerr B ， Wood C ， Liu C ， Peng C ， Vengertsev D ， Schonfeld E ， Blanchard E ， Juefei-Xu F ， Nord F ， Liang J ， Hoffman J ， Kohler J ， Fire K ， Sivakumar K ， Chen L ， Yu L ， Gao L ， Georgopoulos M ， Moritz R ， Sampson S K ， Li S ， Parmeggiani S ， Fine S ， Fowler T ， Petrovic V and Du Y . 2024 . Movie gen： a cast of media foundation models ［EB/OL］. ［ 2024-10-17 ］. https://arxiv.org/abs/2410.13720.pdf https://arxiv.org/abs/2410.13720.pdf

Poole B ， Jain A ， Barron J T and Mildenhall B . 2022 . DreamFusion： text-to-3D using 2D diffusion ［EB/OL］. ［ 2022-7-20 ］. https://arxiv.org/pdf/2209.14988.pdf https://arxiv.org/pdf/2209.14988.pdf

Poole B ， Jain A ， Barron J T and Mildenhall B . 2023 . Dreamfusion： Text-to-3d using 2d diffusion // Proceedings of the 2023 International Conference on Learning Representations . Kigali， Rwanda . ［ DOI： 10.48550/arXiv.2209.14988 http://dx.doi.org/10.48550/arXiv.2209.14988 ］

Postels J ， Danelljan M ， Van Gool L and Tombari F . 2022 . ManiFlow： Implicitly Representing Manifolds with Normalizing Flows ［EB/OL］. ［ 2022-08-18 ］. https://arxiv.org/abs/2208.08932.pdf https://arxiv.org/abs/2208.08932.pdf

Priesnitz J ， Rathgeb C ， Buchmann N and Busch C . 2022 . SynCoLFinGer： synthetic contactless fingerprint generator . Pattern Recognition Letters ， 157 ： 127 – 134 ［ DOI： 10.1016/j.patrec.2022.04.003 http://dx.doi.org/10.1016/j.patrec.2022.04.003 ］

Puig X ， Ra K ， Boben M ， Li J ， Wang T W ， Fidler S and Torralba A . 2018 . VirtualHome： Simulating household activities via programs // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Salt Lake City， UT， USA ： IEEE： 8494 – 8502 ［ DOI： 10.1109/CVPR.2018.00886 http://dx.doi.org/10.1109/CVPR.2018.00886 ］

Puig X ， Undersander E ， Szot A ， Cote M D ， Yang T Y ， Partsey R ， Desai R ， Clegg A W ， Hlavac M ， Min S Y ， Vondruš V ， Gervet T ， Berges V P ， Turner J M ， Maksymets O ， Kira Z ， Kalakrishnan M ， Malik J ， Chaplot D S ， Jain U ， Batra D ， Rai A and Mottaghi R . 2023 . Habitat 3.0： a co-habitat for humans， avatars and robots ［EB/OL］. ［ 2023-10-19 ］. https://arxiv.org/abs/2310.13724.pdf https://arxiv.org/abs/2310.13724.pdf

Qian G C ， Mai J J ， Hamdi A ， Ren J ， Siarohin A ， Li B ， Lee H Y ， Skorokhodov I ， Wonka P and Tulyakov S . 2024 . Magic123： One image to high-quality 3d object generation using both 2d and 3d diffusion priors // Proceedings of the 2024 International Conference on Learning Representations . Vienna， Austria . ［ DOI： 10.48550/arXiv.2306.17843 http://dx.doi.org/10.48550/arXiv.2306.17843 ］

Qian R ， Tan R T ， Yang W ， Su J and Liu J . 2018a . Attentive generative adversarial network for raindrop removal from a single image // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Salt Lake City， UT， USA ： IEEE： 2482 – 2491 ［ DOI： 10.1109/CVPR.2018.00263 http://dx.doi.org/10.1109/CVPR.2018.00263 ］

Qian X L ， Fu Y W ， Xiang T ， Wang W X ， Qiu J ， Wu Y ， Jiang Y G and Xue X Y . 2018b . Pose-normalized image generation for person re-identification // Proceedings of the 15th European Conference on Computer Vision . Munich， Germany ： Springer International Publishing： 661 – 678 ［ DOI： 10.1007/978-3-030-01240-3_40 http://dx.doi.org/10.1007/978-3-030-01240-3_40 ］

Qiu L T ， Chen G Y ， Gu X D ， Zuo Q ， Xu M T ， Wu Y S ， Yuan W H ， Dong Z L ， Bo L F and Han X G . 2024 . Richdreamer： a generalizable normal-depth diffusion model for detail richness in text-to-3d // Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 9914 – 9925 ［ DOI： 10.1109/CVPR52733.2024.00946 http://dx.doi.org/10.1109/CVPR52733.2024.00946 ］

Qiu W C ， Zhong F W ， Zhang Y ， Qiao S Y ， Xiao Z H ， Kim T S and Wang Y Z . 2017 . UnrealCV： virtual worlds for computer vision // Proceedings of the 25th ACM International Conference on Multimedia . New York， NY， USA ： Association for Computing Machinery： 1221 – 1224 ［ DOI： 10.1145/3123266.3129396 http://dx.doi.org/10.1145/3123266.3129396 ］

Qu L H ， Liu S L ， Wang M N and Song Z J . 2022 . Transmef： A transformer-based multi-exposure image fusion framework using self-supervised multi-task learning // Proceedings of the AAAI Conference on Artificial Intelligence . Vancouver， Canada ： AAAI： 2126 – 2134 ［ DOI： 10.1609/aaai.v36i2.20109 http://dx.doi.org/10.1609/aaai.v36i2.20109 ］

Radford A ， Kim J W ， Hallacy C ， Ramesh A ， Goh G ， Agarwal S ， Sastry G ， Askell A ， Mishkin P ， Clark J ， Krueger G and Sutskever I . 2021 . Learning transferable visual models from natural language supervision // Proceedings of the 38th International Conference on Machine Learning . Virtual ： PMLR： 8748 – 8763 ［ DOI： 10.48550/arXiv.2103.00020 http://dx.doi.org/10.48550/arXiv.2103.00020 ］

Radford A ， Metz L and Chintala S . 2015 . Unsupervised representation learning with deep convolutional generative adversarial networks // Proceedings of the International Conference on Learning Representations . San Diego， California， USA ： DBLP：［ DOI： 10.48550/arXiv.1511.06434 http://dx.doi.org/10.48550/arXiv.1511.06434 ］

Radl L ， Steiner M ， Parger M ， Weinrauch A ， Kerbl B and Steinberger M . 2024 . StopThePop： sorted gaussian splatting for view-consistent real-time rendering . ACM Transactions on Graphics ， 43 （ 4 ）： 1 – 17 ［ DOI： 10.1145/3658187 http://dx.doi.org/10.1145/3658187 ］

Rahmani H and Mian A . 2015 . Learning a non-linear knowledge transfer model for cross-view action recognition // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Boston， MA， USA ： IEEE： 2458 – 2466 ［ DOI： 10.1109/CVPR.2015.7298860 http://dx.doi.org/10.1109/CVPR.2015.7298860 ］

Rahmani H and Mian A . 2016 . 3D action recognition from novel viewpoints // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas， NV， USA ： IEEE： 1506 – 1515 ［ DOI： 10.1109/CVPR.2016.167 http://dx.doi.org/10.1109/CVPR.2016.167 ］

Rahmani H ， Mian A and Shah M . 2017 . Learning a deep model for human action recognition from novel viewpoints . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 40 （ 3 ）： 667 – 681 ［ DOI： 10.1109/TPAMI.2017.2691768 http://dx.doi.org/10.1109/TPAMI.2017.2691768 ］

Raistrick A ， Lipson L ， Ma Z Y ， Mei L J ， Wang M Z ， Zuo Y M ， Kayan K ， Wen H Y ， Han B N ， Wang Y H ， Newell A ， Law H ， Goyal A ， Yang K and Deng J . 2023 . Infinite photorealistic worlds using procedural generation // 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 12630 – 12641 ［ DOI： 10.1109/CVPR52729.2023.01215 http://dx.doi.org/10.1109/CVPR52729.2023.01215 ］

Raistrick A ， Mei L J ， Kayan K ， Yan D ， Zuo Y M ， Han B N ， Wen H Y ， Parakh M ， Alexandropoulos S ， Lipson L ， Ma Z Y and Deng J . 2024 . Infinigen indoors： photorealistic indoor scenes using procedural generation // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 21783 – 21794 ［ DOI： 10.1109/CVPR52733.2024.02058 http://dx.doi.org/10.1109/CVPR52733.2024.02058 ］

Ram Prabhakar K ， Sai Srikar V and Venkatesh Babu R . 2017 . Deepfuse： A deep unsupervised approach for exposure fusion with extreme exposure image pairs // Proceedings of the IEEE International Conference on Computer Vision . Venice， Italy ： IEEE： 4714 – 4722 ［ DOI： 10.1109/ICCV.2017.505 http://dx.doi.org/10.1109/ICCV.2017.505 ］

Ramesh A ， Dhariwal P ， Nichol A ， Chu C and Chen M . 2022 . Hierarchical Text-Conditional Image Generation with CLIP Latents ［EB/OL］. ［ 2022-04-13 ］. https://arxiv.org/abs/2204.06125.pdf https://arxiv.org/abs/2204.06125.pdf

Ramesh A ， Pavlov M ， Goh G ， Gray S ， Voss C ， Radford A ， Chen M and Sutskever I . 2021 . Zero-shot text-to-image generation // Proceedings of the International Conference on Machine Learning . Virtual ： PMLR： 8821 - 8831 ［ DOI： 10.48550/arXiv.2102.12092 http://dx.doi.org/10.48550/arXiv.2102.12092 ］

Reimers N and Gurevych I . 2019 . Sentence-BERT： sentence embeddings using siamese BERT-networks // Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing . Hong Kong， China ： Association for Computational Linguistics： 3982 – 3992 ［ DOI： 10.18653/v1/D19-1410 http://dx.doi.org/10.18653/v1/D19-1410 ］

Rempe D ， Philion J ， Guibas L J ， Fidler S and Litany O . 2022 . Generating useful accident-prone driving scenarios via a learned traffic prior // Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans， LA， USA ： IEEE： 17284 – 17294 ［ DOI： 10.1109/CVPR52688.2022.01679 http://dx.doi.org/10.1109/CVPR52688.2022.01679 ］

Rezende D J and Mohamed S . 2015 . Variational inference with normalizing flows // Proceedings of the International Conference on Machine Learning . Paris， France ： ACM： 1530 – 1538 ［ DOI： 10.5555/3045118.3045281 http://dx.doi.org/10.5555/3045118.3045281 ］

Riazi M S ， Chavoshian S M and Koushanfar F . 2020 . SynFi： automatic synthetic fingerprint generation ［EB/OL］. ［ 2020-02-16 ］. https://arxiv.org/abs/2002.08900.pdf https://arxiv.org/abs/2002.08900.pdf

Richardson E ， Metzer G ， Alaluf Y ， Giryes R and Cohen-Or D . 2023 . TEXTure： text-guided texturing of 3D shapes . ACM Transactions on Graphics ， 42 （ 4 ）： 1 - 11 ［ DOI： 10.1145/3592410 http://dx.doi.org/10.1145/3592410 ］

Richter S R ， Vineet V ， Roth S and Koltun V . 2016 . Playing for data： ground truth from computer games // Proceedings of the European Conference on Computer Vision . Amsterdam， Netherlands ： Springer International Publishing： 102 – 118 ［ DOI： 10.1007/978-3-319-46475-6_7 http://dx.doi.org/10.1007/978-3-319-46475-6_7 ］

Roberts M ， Ramapuram J ， Ranjan A ， Kumar A ， Bautista M A ， Paczan N ， Webb R and Susskind J M . 2021 . Hypersim： a photorealistic synthetic dataset for holistic indoor scene understanding // Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision . Montreal， QC， Canada ： IEEE： 10892 – 10902 ［ DOI： 10.1109/ICCV48922.2021.01073 http://dx.doi.org/10.1109/ICCV48922.2021.01073 ］

Rombach R ， Blattmann A ， Lorenz D ， Esser P and Ommer B . 2022 . High-resolution image synthesis with latent diffusion models // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . New Orleans， LA， USA ： IEEE： 10674 – 10685 ［ DOI： 10.1109/CVPR52688.2022.01042 http://dx.doi.org/10.1109/CVPR52688.2022.01042 ］

Ronneberger O ， Fischer P and Brox T . 2015 . U-net： convolutional networks for biomedical image segmentation // Proceedings of International Conference on Medical Image Computing and Computer-Assisted Intervention . Munich， Germany ： Springer： 234 – 241 ［ DOI： 10.1007/978-3-319-24574-4_28 http://dx.doi.org/10.1007/978-3-319-24574-4_28 ］

Ros G ， Sellart L ， Materzynska J ， Vazquez D and Lopez A M . 2016 . The SYNTHIA dataset： a large collection of synthetic images for semantic segmentation of urban scenes // 2016 IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas， NV， USA ： IEEE： 3234 – 3243 ［ DOI： 10.1109/CVPR.2016.352 http://dx.doi.org/10.1109/CVPR.2016.352 ］

Runions A ， Fuhrer M ， Lane B ， Federl P ， Rolland-Lagan A G and Prusinkiewicz P . 2005 . Modeling and visualization of leaf venation patterns . ACM Transactions on Graphics ， 24 （ 3 ）： 702 – 711 ［ DOI： 10.1145/1073204.1073251 http://dx.doi.org/10.1145/1073204.1073251 ］

Runway . 2024 . Gen-3

Saharia C ， Chan W ， Saxena S ， Li L L ， Whang J ， Denton E ， Ghasemipour S K S ， Ayan B K ， Mahdavi S S ， Lopes R G ， Salimans T ， Hoy J ， Fleet D J and Norouzi M . 2022a . Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding // Proceedings of Advances in Neural Information Processing Systems . New Orleans， Louisiana， USA ： Curran Associates， Inc： 36479 – 36494 ［ DOI： 10.5555/3600270.3602913 http://dx.doi.org/10.5555/3600270.3602913 ］

Saharia C ， Ho J ， Chan W ， Salimans T ， Fleet D J and Norouzi M . 2022b . Image Super-Resolution via Iterative Refinement . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 44 （ 10 ）： 4727 – 4740 ［ DOI： 10.1109/TPAMI.2022.3204461 http://dx.doi.org/10.1109/TPAMI.2022.3204461 ］

Salazar E ， Hernández-García R ， Barrientos R J ， Vilches K ， Mora M and Vásquez A . 2021a . Automatic generation of synthetic palm vein images： a nature-based approach // Proceedings of the 11th International Conference of Pattern Recognition Systems . Online Conference ： IEEE： 38 – 43 ［ DOI： 10.1049/icp.2021.1452 http://dx.doi.org/10.1049/icp.2021.1452 ］

Salazar E ， Hernández-García R ， Barrientos R J ， Vilches K ， Mora M and Vásquez A . 2021b . Generating style-based palm vein synthetic images for the creation of large-scale datasets // Proceedings of the 11th International Conference of Pattern Recognition Systems . Online Conference ： IEEE： 182 – 187 ［ DOI： 10.1049/icp.2021.1451 http://dx.doi.org/10.1049/icp.2021.1451 ］

Salazar-Jurado E H ， Hernández-García R ， Vilches-Ponce K ， Barrientos R J ， Mora M and Jaswal G . 2023 . Towards the generation of synthetic images of palm vein patterns： a review . Information Fusion ， 89 ： 66 – 90 ［ DOI： 10.1016/j.inffus.2022.08.008 http://dx.doi.org/10.1016/j.inffus.2022.08.008 ］

Salem Hussin S H and Yildirim R . 2021 . StyleGAN-LSRO method for person re-identification . IEEE Access ， 9 ： 13857 – 13869 ［ DOI： 10.1109/ACCESS.2021.3051723 http://dx.doi.org/10.1109/ACCESS.2021.3051723 ］

Sams A ， Shomee H H and Rahman S M M . 2022 . HQ-finGAN： high-quality synthetic fingerprint generation using GANs . Circuits， Systems， and Signal Processing ， 41 （ 11 ）： 6354 – 6369 ［ DOI： 10.1007/s00034-022-02089-1 http://dx.doi.org/10.1007/s00034-022-02089-1 ］

Sarıyıldız M B ， Alahari K ， Larlus D and Kalantidis Y . 2023 . Fake it till you make it： learning transferable representations from synthetic ImageNet clones // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 8011 – 8021 ［ DOI： 10.1109/CVPR52729.2023.00774 http://dx.doi.org/10.1109/CVPR52729.2023.00774 ］

Sauer A ， Karras T ， Laine S ， Geiger A and Aila T . 2023 . Stylegan-t： Unlocking the power of gans for fast large-scale text-to-image synthesis // Proceedings of the International Conference on Machine Learning . Honolulu， USA ： PMLR： 30105 – 30118 ［ DOI： 10.5555/3618408.3619658 http://dx.doi.org/10.5555/3618408.3619658 ］

Sauer A ， Schwarz K and Geiger A . 2022 . Stylegan-xl： Scaling stylegan to large diverse datasets // Proceedings of the ACM SIGGRAPH 2022 Conference . Vancouver， Canada ： ACM： 1 – 10 ［ DOI： 10.1145/3528233.3530738 http://dx.doi.org/10.1145/3528233.3530738 ］

Savva M ， Malik J ， Parikh D ， Batra D ， Kadian A ， Maksymets O ， Zhao Y ， Wijmans E ， Jain B ， Straub J ， Liu J and Koltun V . 2019 . Habitat： a platform for embodied AI research // Proceedings of the IEEE/CVF International Conference on Computer Vision . Seoul， Korea （South）： IEEE： 9338 – 9346 ［ DOI： 10.1109/ICCV.2019.00948 http://dx.doi.org/10.1109/ICCV.2019.00948 ］

Schreiner W and Buxbaum P F . 1993 . Computer-optimization of vascular trees . IEEE Transactions on Biomedical Engineering ， 40 （ 5 ）： 482 – 491 ［ DOI： 10.1109/10.243413 http://dx.doi.org/10.1109/10.243413 ］

Schröder G ， Senst T ， Bochinski E and Sikora T . 2018 . Optical flow dataset and benchmark for visual crowd analysis // Proceedings of the 15th IEEE International Conference on Advanced Video and Signal Based Surveillance . Auckland， New Zealand ： IEEE： 1 – 6 ［ DOI： 10.1109/AVSS.2018.8639140 http://dx.doi.org/10.1109/AVSS.2018.8639140 ］

Schuhmann C ， Beaumont R ， Vencu R ， Gordon C ， Wightman R ， Cherti M ， Coombes T ， Katta A ， Mullis C ， Wortsman M ， and others . 2022 . Laion-5b： an open large-scale dataset for training next generation image-text models // Proceedings of the 2022 Advances in Neural Information Processing Systems . New Orleans， LA， USA ： Curran Associates Inc： 25278 – 25294 ［ DOI： 10.48550/arXiv.2210.08402 http://dx.doi.org/10.48550/arXiv.2210.08402 ］

Schwarz K ， Liao Y Y ， Niemeyer M and Geiger A . 2020 . Graf： generative radiance fields for 3d-aware image synthesis // Proceedings of the 2020 Advances in Neural Information Processing Systems . Vancouver， BC， Canada ： Curran Associates Inc： 20154 – 20166 ［ DOI： 10.48550/arXiv.2007.02442 http://dx.doi.org/10.48550/arXiv.2007.02442 ］

Shah S ， Dey D ， Lovett C and Kapoor A . 2018 . Airsim： high-fidelity visual and physical simulation for autonomous vehicles // Proceedings of the Field and Service Robotics： Results of the 11th International Conference . Zurich， Switzerland ： Springer International Publishing： 621 – 635 ［ DOI： 10.1007/978-3-319-67361-5_40 http://dx.doi.org/10.1007/978-3-319-67361-5_40 ］

Shang S ， Zhao C L ， Zhang R X and Jia W . 2025 . PVTree： Realistic and Controllable Palm Vein Generation for Recognition Tasks // Proceedings of the AAAI Conference on Artificial Intelligence （accepted， 2025）

Shang Y ， Lin Y M ， Zheng Y ， Fan H Y ， Ding J T ， Feng J ， Chen J S ， Tian L and Li Y . 2024 . UrbanWorld： an urban world model for 3D city generation ［EB/OL］. ［ 2024-10-22 ］. https://arxiv.org/abs/2407.11965.pdf https://arxiv.org/abs/2407.11965.pdf

Shen L ， Jin J L ， Zhang R X ， Li H E ， Zhao K ， Zhang Y ， Zhang J Y ， Ding S ， Zhao Y and Jia W . 2023 . RPG-palm： realistic pseudo-data generation for palmprint recognition // 2023 IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 19548 – 19559 ［ DOI： 10.1109/ICCV51070.2023.01796 http://dx.doi.org/10.1109/ICCV51070.2023.01796 ］

Shen Y J ， Luo P ， Yan J J ， Wang X G and Tang X O . 2018a . FaceID-GAN： learning a symmetry three-player GAN for identity-preserving face synthesis // Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City， UT， USA ： IEEE： 821 – 830 ［ DOI： 10.1109/CVPR.2018.00092 http://dx.doi.org/10.1109/CVPR.2018.00092 ］

Shen Y J ， Zhou B L ， Luo P and Tang X O . 2018b . FaceFeat-GAN： a two-stage approach for identity-preserving face synthesis ［EB/OL］. ［ 2018-12-04 ］. https://arxiv.org/abs/1812.01288.pdf https://arxiv.org/abs/1812.01288.pdf

Sherlock B G and Monro D M . 1993 . A model for interpreting fingerprint topology . Pattern Recognition ， 26 （ 7 ）： 1047 – 1055 ［ DOI： 10.1016/0031-3203(93)90006-I http://dx.doi.org/10.1016/0031-3203(93)90006-I ］

Shi Y C ， Wang P ， Ye J L ， Mai L ， Li K J and Yang X . 2024 . Mvdream： multi-view diffusion for 3d generation // Proceedings of the 2024 International Conference on Learning Representations . Vienna， Austria . ［ DOI： 10.48550/arXiv.2308.16512 http://dx.doi.org/10.48550/arXiv.2308.16512 ］

Shinn N ， Labash B and Gopinath A . 2023 . Reflexion： an autonomous agent with dynamic memory and self-reflection ［EB/OL］. ［ 2023-10-10 ］. https://arxiv.org/pdf/2303.11366.pdf https://arxiv.org/pdf/2303.11366.pdf

Shridhar M ， Manuelli L and Fox D . 2022 . CLIPort： what and where pathways for robotic manipulation // Proceedings of the 2022 Conference on Robot Learning . London， UK ： PMLR： 894 – 906 ［ DOI： 10.48550/arXiv.2109.12098 http://dx.doi.org/10.48550/arXiv.2109.12098 ］

Singer U ， Polyak A ， Hayes T ， Yin X ， An J ， Zhang S Y ， Hu Q Y ， Yang H ， Ashual O ， Gafni O ， Parikh D ， Gupta S and Taigman Y . 2022 . Make-a-video： text-to-video generation without text-video data ［EB/OL］. ［ 2022-09-29 ］. https://arxiv.org/abs/2209.14792.pdf https://arxiv.org/abs/2209.14792.pdf

Sitzmann V ， Martel J ， Bergman A ， Lindell D and Wetzstein G . 2020 . Implicit neural representations with periodic activation functions // Proceedings of the 2020 Advances in Neural Information Processing Systems . Vancouver， BC， Canada ： Curran Associates Inc： 7462 – 7473 ［ DOI： 10.48550/arXiv.2006.09661 http://dx.doi.org/10.48550/arXiv.2006.09661 ］

Sohn K ， Lee H and Yan X C . 2015 . Learning Structured Output Representation using Deep Conditional Generative Models // Proceedings of the International Conference on Neural Information Processing Systems . Montreal， Canada ： Curran Associates， Inc： 3483 – 3491 ［ DOI： 10.5555/2969442.2969628 http://dx.doi.org/10.5555/2969442.2969628 ］

Solera-Rico A ， Sanmiguel Vila C ， Gómez-López M ， Wang Y ， Almashjary A ， Dawson S T M and Vinuesa R . 2024 . β-Variational autoencoders and transformers for reduced-order modelling of fluid flows . Nature Communications ， 15 （ 1 ）： 1361 ［ DOI： 10.1038/s41467-024-45578-4 http://dx.doi.org/10.1038/s41467-024-45578-4 ］

Song J ， Meng C and Ermon S . 2021a . Denoising Diffusion Implicit Models // Proceedings of the International Conference on Learning Representations . Virtual ： ICLR Press：［ DOI： 10.48550/arXiv.2010.02502 http://dx.doi.org/10.48550/arXiv.2010.02502 ］

Song S R ， Yu F ， Zeng A ， Chang A X ， Savva M and Funkhouser T . 2017 . Semantic scene completion from a single depth image // Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition . Honolulu， HI， USA ： IEEE： 1746 – 1754 ［ DOI： 10.1109/CVPR.2017.27 http://dx.doi.org/10.1109/CVPR.2017.27 ］

Song X W ， Zheng J ， Yuan S R ， Gao H A ， Zhao J W ， He X ， Gu W H and Zhao H . 2024 . SA-GS： scale-adaptive gaussian splatting for training-free anti-aliasing ［EB/OL］. ［ 2024-03-28 ］. https://arxiv.org/abs/2403.19615.pdf https://arxiv.org/abs/2403.19615.pdf

Song Y and Ermon S . 2020 . Generative modeling by estimating gradients of the data distribution ［EB/OL］. ［ 2020-10-10 ］. https://arxiv.org/abs/1907.05600.pdf https://arxiv.org/abs/1907.05600.pdf

Song Y ， Dhariwal P ， Chen M and Sutskever I . 2023 . Consistency models // Proceedings of the 40th International Conference on Machine Learning . Honolulu， USA ： JMLR.org： 32211 - 32252 ［ DOI： 10.5555/3618408.3619743 http://dx.doi.org/10.5555/3618408.3619743 ］

Song Y ， Sohl-Dickstein J ， Kingma D P ， Kumar A ， Ermon S and Poole B . 2021b . Score-based generative modeling through stochastic differential equations ［EB/OL］. ［ 2021-02-10 ］. https://arxiv.org/abs/2011.13456.pdf https://arxiv.org/abs/2011.13456.pdf

Straub J ， Whelan T ， Ma L ， Chen Y F ， Wijmans E ， Green S ， Engel J J ， Mur-Artal R ， Ren C ， Verma S ， Clarkson A ， Yan M F ， Budge B ， Yan Y J ， Pan X Q ， Yon J ， Zou Y Y ， Leon K ， Carter N ， Briales J ， Gillingham T ， Mueggler E ， Pesqueira L ， Savva M ， Batra D ， Strasdat H M ， Nardi R D ， Goesele M ， Lovegrove S and Newcombe R . 2019 . The replica dataset： a digital replica of indoor spaces ［EB/OL］. ［ 2019-06-13 ］. https://arxiv.org/abs/1906.05797.pdf https://arxiv.org/abs/1906.05797.pdf

Striuk O and Kondratenko Y . 2021 . Adaptive deep convolutional GAN for fingerprint sample synthesis // Proceedings of the 2021 IEEE 4th International Conference on Advanced Information and Communication Technologies . Lviv， Ukraine ： IEEE： 193 – 196 ［ DOI： 10.1109/AICT52120.2021.9628978 http://dx.doi.org/10.1109/AICT52120.2021.9628978 ］

Sun C Y ， Han J L ， Deng W J ， Wang X L ， Qin Z S and Gould S . 2024a . 3D-GPT： procedural 3D modeling with large language models ［EB/OL］. ［ 2024-05-29 ］. https://arxiv.org/abs/2310.12945.pdf https://arxiv.org/abs/2310.12945.pdf

Sun P Z ， Jiang Y ， Chen S F ， Zhang S L ， Peng B Y ， Luo P and Yuan Z H . 2024b . Autoregressive model beats diffusion： llama for scalable image generation ［EB/OL］. ［ 2024-06-10 ］. https://arxiv.org/abs/2406.06525

Sun W Q ， Chen S ， Liu F F ， Chen Z L ， Duan Y Q ， Zhang J and Wang Y K . 2024c . DimensionX： create any 3D and 4D scenes from a single image with controllable video diffusion ［EB/OL］. ［ 2024-11-07 ］. https://arxiv.org/abs/2411.04928.pdf https://arxiv.org/abs/2411.04928.pdf

Sun X X and Zheng L . 2019 . Dissecting person re-identification from the viewpoint of viewpoint // Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Long Beach， CA， USA ： IEEE： 608 – 617 ［ DOI： 10.1109/CVPR.2019.00070 http://dx.doi.org/10.1109/CVPR.2019.00070 ］

Swerdlow A ， Xu R S and Zhou B L . 2024 . Street-view image generation from a bird’s-eye view layout . IEEE Robotics and Automation Letters ， 9 （ 4 ）： 3578 – 3585 ［ DOI： 10.1109/LRA.2024.3368234 http://dx.doi.org/10.1109/LRA.2024.3368234 ］

Szegedy C ， Vanhoucke V ， Ioffe S ， Shlens J and Wojna Z . 2016 . Rethinking the inception architecture for computer vision // Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas， NV， USA ： IEEE： 2818 – 2826 ［ DOI： 10.1109/CVPR.2016.308 http://dx.doi.org/10.1109/CVPR.2016.308 ］

Szot A ， Clegg A ， Undersander E ， Wijmans E ， Zhao Y L ， Turner J ， Maestre N ， Mukadam M ， Chaplot D S ， Maksymets O ， Gokaslan A ， Vondruš V ， Dharur S ， Meier F ， Galuba W ， Chang A ， Kira Z ， Koltun V ， Malik J ， Savva M and Batra D . 2021 . Habitat 2.0： training home assistants to rearrange their habitat . Advances in Neural Information Processing Systems ， 34 ： 251 – 266 ［ DOI： 10.5555/3540261.3540281 http://dx.doi.org/10.5555/3540261.3540281 ］

Tabak E G and Vanden-Eijnden E . 2010 . Density estimation by dual ascent of the log-likelihood . Communications in Mathematical Sciences ， 8 （ 1 ）： 217 – 233 ［ DOI： 10.4310/CMS.2010.v8.n1.a11 http://dx.doi.org/10.4310/CMS.2010.v8.n1.a11 ］

Takahashi R ， Matsubara T and Uehara K . 2020 . Data Augmentation using Random Image Cropping and Patching for Deep CNNs . IEEE Transactions on Circuits and Systems for Video Technology ， 30 （ 9 ）： 2917 – 2931 ［ DOI： 10.1109/TCSVT.2019.2935128 http://dx.doi.org/10.1109/TCSVT.2019.2935128 ］

Tang J S ， Wang T F ， Zhang B ， Zhang T ， Yi R ， Ma L and Chen D . 2023 . Make-it-3d： high-fidelity 3d creation from a single image with diffusion prior // Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 22762 – 22772 ［ DOI： 10.1109/ICCV51070.2023.02086 http://dx.doi.org/10.1109/ICCV51070.2023.02086 ］

Tang J X ， Ren J W ， Zhou H ， Liu Z W and Zeng G . 2024 . DreamGaussian： generative gaussian splatting for efficient 3D content creation ［EB/OL］. ［ 2024-03-29 ］. https://arxiv.org/abs/2309.16653.pdf https://arxiv.org/abs/2309.16653.pdf

Team G . 2024a . Mochi 1

Team V . 2024b . Vchitect-2.0

Tevet G ， Gordon B ， Hertz A ， Bermano A H and Cohen-Or D . 2022 . MotionClip： Exposing human motion generation to CLIP space // Proceedings of the European Conference on Computer Vision . Tel Aviv， Israel ： Springer： 358 – 374 ［ DOI： 10.1007/978-3-031-20047-2_21 http://dx.doi.org/10.1007/978-3-031-20047-2_21 ］

Tevet G ， Raab S ， Gordon B ， Shafir Y ， Cohen-Or D and Bermano A H . 2023 . Human motion diffusion model // Proceedings of the Eleventh International Conference on Learning Representations . Kigali， Rwanda ：［ DOI： 10.48550/arXiv.2209.14916 http://dx.doi.org/10.48550/arXiv.2209.14916 ］

Tobin J ， Fong R ， Ray A ， Schneider J ， Zaremba W and Abbeel P . 2017 . Domain randomization for transferring deep neural networks from simulation to the real world // Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems . Vancouver， BC， Canada ： IEEE： 23 – 30 ［ DOI： 10.1109/IROS.2017.8202133 http://dx.doi.org/10.1109/IROS.2017.8202133 ］

Tomczak J M and Welling M . 2016 . Improving variational auto-encoders using Householder Flow ［EB/OL］. ［ 2016-11-29 ］. https://arxiv.org/abs/1611.09630.pdf https://arxiv.org/abs/1611.09630.pdf

Tong T ， Li G ， Liu X and Gao Q . 2017 . Image super-resolution using dense skip connections // Proceedings of the IEEE International Conference on Computer Vision . Venice， Italy ： IEEE： 4809 – 4817 ［ DOI： 10.1109/ICCV.2017.514 http://dx.doi.org/10.1109/ICCV.2017.514 ］

Tongyi A . 2024 . Wanxiang video

Torne M ， Simeonov A ， Li Z ， Chan A ， Chen T ， Gupta A and Agrawal P . 2024 . Reconciling reality through simulation： a real-to-sim-to-real approach for robust manipulation ［EB/OL］. ［ 2024-11-24 ］. https://arxiv.org/abs/2403.03949.pdf https://arxiv.org/abs/2403.03949.pdf

Trabucco B ， Doherty K ， Gurinas M A and Salakhutdinov R . 2023 . Effective data augmentation with diffusion models ［EB/OL］. ［ 2023-02-07 ］. https://arxiv.org/pdf/2302.07944.pdf

Tran L ， Yin X and Liu X . 2017 . Disentangled representation learning GAN for pose-invariant face recognition // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Honolulu， HI， USA ： IEEE： 1415 – 1424 ［ DOI： 10.1109/CVPR.2017.141 http://dx.doi.org/10.1109/CVPR.2017.141 ］

Tsao L Y ， Lo Y C ， Chang C C ， Chen H W ， Tseng R ， Feng C and Lee C Y . 2024 . Boosting Flow-based Generative Super-Resolution Models via Learned Prior // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 26005 – 26015 ［ DOI： 10.1109/CVPR52733.2024.02457 http://dx.doi.org/10.1109/CVPR52733.2024.02457 ］

Tulyakov S ， Liu M Y ， Yang X and Kautz J . 2018 . MoCoGAN： Decomposing motion and content for video generation // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Salt Lake City， UT， USA ： IEEE： 1526 – 1535 ［ DOI： 10.1109/CVPR.2018.00165 http://dx.doi.org/10.1109/CVPR.2018.00165 ］

Turing A M . 1990 . The chemical basis of morphogenesis . Bulletin of Mathematical Biology ， 52 （ 1 ）： 153 – 197 ［ DOI： 10.1007/BF02459572 http://dx.doi.org/10.1007/BF02459572 ］

Vahdat A ， Kreis K and Kautz J . 2021 . Score-based Generative Modeling in Latent Space // Proceedings of the Advances in Neural Information Processing Systems . Virtual ： Curran Associates， Inc： 11287 – 11302 ［ DOI： 10.5555/3540261.3541124 http://dx.doi.org/10.5555/3540261.3541124 ］

Van Den Berg R ， Hasenclever L ， Tomczak J M and Welling M . 2018 . Sylvester normalizing flows for variational inference // Proceedings of the Conference on Uncertainty in Artificial Intelligence . Monterey， California， USA ： AUAI Press： 393 - 402 ［ http://auai.org/uai2018/proceedings/papers/156.pdf ］

Varol G ， Laptev I and Schmid C . 2021 . Synthetic humans for action recognition from unseen viewpoints . International Journal of Computer Vision ， 129 （ 8 ）： 2264 – 2287 ［ DOI： 10.1007/s11263-021-01467-7 http://dx.doi.org/10.1007/s11263-021-01467-7 ］

Varol G ， Romero J ， Martin X ， Mahmood N ， Black M J ， Laptev I and Schmid C . 2017 . Learning from synthetic humans // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Honolulu， HI， USA ： IEEE： 109 – 117 ［ DOI： 10.1109/CVPR.2017.492 http://dx.doi.org/10.1109/CVPR.2017.492 ］

Vaswani A ， Shazeer N M ， Parmar N ， Uszkoreit J ， Jones L ， Gomez A N ， Kaiser L and Polosukhin I . 2017 . Attention is all you need // Proceedings of the 2017 Neural Information Processing Systems . Long Beach， CA， USA ： Curran Associates， Inc： 5998 – 6008 ［ DOI： 10.48550/arXiv.1706.03762 http://dx.doi.org/10.48550/arXiv.1706.03762 ］

Vincent P ， Larochelle H ， Lajoie I ， et al . 2010 . Stacked denoising autoencoders： Learning useful representations in a deep network with a local denoising criterion . Journal of Machine Learning Research ， 11 ： 3371 - 3408 ［ DOI： 10.1162/jmlr.2010.11.1.3371 http://dx.doi.org/10.1162/jmlr.2010.11.1.3371 ］

Vizcaya P R and Gerhardt L A . 1996 . A nonlinear orientation model for global description of fingerprints . Pattern Recognition ， 29 （ 7 ）： 1221 – 1231 ［ DOI： 10.1016/0031-3203（95）00154-9 http://dx.doi.org/10.1016/0031-3203（95）00154-9 ］

Voleti V ， Yao C H ， Boss M ， Letts A ， Pankratz D ， Tochilkin D ， Laforte C ， Rombach R and Jampani V . 2024 . SV3D： novel multi-view synthesis and 3D generation from a single image using latent video diffusion // Proceedings of the 2024 European Conference on Computer Vision. Milan， Italy ： Springer： 439 – 457 ［ DOI： 10.1007/978-3-031-73232-4_25 http://dx.doi.org/10.1007/978-3-031-73232-4_25 ］

Von Marcard T ， Henschel R ， Black M J ， Rosenhahn B and Pons-Moll G . 2018 . Recovering accurate 3D human pose in the wild using IMUs and a moving camera // Proceedings of the European Conference on Computer Vision . Munich， Germany ： Springer Nature Switzerland： 614 - 631 ［ DOI： 10.1007/978-3-030-01249-6_37 http://dx.doi.org/10.1007/978-3-030-01249-6_37 ］

Wang C ， Chai M L ， He M M ， Chen D D and Liao J . 2022a . Clip-nerf： text-and-image driven manipulation of neural radiance fields // Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans， LA， USA ： IEEE： 3835 – 3844 ［ DOI： 10.1109/CVPR52688.2022.00381 http://dx.doi.org/10.1109/CVPR52688.2022.00381 ］

Wang C ， He Z F ， Wang C Y and Tian Q . 2022b . Generating intra- and inter-class iris images by identity contrast // Proceedings of the 2022 IEEE International Joint Conference on Biometrics . Abu Dhabi， United Arab Emirates ： IEEE： 1 – 7 ［ DOI： 10.1109/IJCB54206.2022.10007974 http://dx.doi.org/10.1109/IJCB54206.2022.10007974 ］

Wang H C ， Du X D ， Li J H ， Yeh R A and Shakhnarovich G . 2023a . Score jacobian chaining： lifting pretrained 2d diffusion models for 3d generation // Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 12619 – 12629 ［ DOI： 10.1109/CVPR52729.2023.01214 http://dx.doi.org/10.1109/CVPR52729.2023.01214 ］

Wang H Q ， Chen J H ， Huang W S ， Ben Q W ， Wang T ， Mi B Y ， Huang T ， Zhao S H ， Chen Y L ， Yang S Z ， Cao P Z ， Yu W Y ， Ye Z C ， Li J L ， Long J F ， Wang Z R ， Wang H L ， Zhao Y ， Tu Z Y ， Qiao Y ， Lin D H and Pang J M . 2024a . Grutopia： dream general robots in a city at scale ［EB/OL］. ［ 2024-07-15 ］. https://arxiv.org/abs/2407.10943.pdf https://arxiv.org/abs/2407.10943.pdf

Wang J N ， Yuan H J ， Chen D Y ， Zhang Y Y ， Wang X and Zhang S W . 2023b . Modelscope text-to-video technical report ［EB/OL］. ［ 2023-08-12 ］. https://arxiv.org/abs/2308.06571.pdf https://arxiv.org/abs/2308.06571.pdf

Wang J H ， Jin S ， Liu W T ， Liu W Z ， Qian C and Luo P . 2021a . When Human Pose Estimation Meets Robustness： Adversarial Algorithms and Benchmarks // Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， TN， USA ： IEEE： 11850 – 11859 ［ DOI： 10.1109/CVPR46437.2021.01168 http://dx.doi.org/10.1109/CVPR46437.2021.01168 ］

Wang J ， Yue Z ， Zhou S ， Chan K C K and Loy C C . 2024b . Exploiting diffusion prior for real-world image super-resolution . International Journal of Computer Vision ， 132 ： 5929 – 5949 ［ DOI： 10.1007/s11263-024-02168-7 http://dx.doi.org/10.1007/s11263-024-02168-7 ］

Wang L ， Sindagi V and Patel V . 2018a . High-quality facial photo-sketch synthesis using multi-adversarial networks // Proceedings of the 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition . Xi’an， China ： IEEE： 83 – 90 ［ DOI： 10.1109/FG.2018.00022 http://dx.doi.org/10.1109/FG.2018.00022 ］

Wang Q ， Gao J Y ， Lin W and Yuan Y . 2019 . Learning from synthetic data for crowd counting in the wild // Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Long Beach， CA， USA ： IEEE： 8198 – 8207 ［ DOI： 10.1109/CVPR.2019.00839 http://dx.doi.org/10.1109/CVPR.2019.00839 ］

Wang S Y ， Du Y Q ， Guo X J ， Pan B ， Qin Z H and Zhao L . 2024c . Controllable Data Generation by Deep Learning： A Review . ACM Computing Surveys ， 56 （ 9 ）： 1 – 38 ［ DOI： 10.1145/3648609 http://dx.doi.org/10.1145/3648609 ］

Wang T F ， Zhang B ， Zhang T ， Gu S Y ， Bao J M ， Baltrusaitis T ， Shen J J ， Chen D ， Wen F and Chen Q F . 2023c . Rodin： a generative model for sculpting 3d digital avatars using diffusion // Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 4563 – 4573 ［ DOI： 10.1109/CVPR52729.2023.00443 http://dx.doi.org/10.1109/CVPR52729.2023.00443 ］

Wang W J ， Ge Y T ， Mei H Y ， Cai Z A ， Sun Q P ， Wang Y J ， Shen C H ， Yang L and Komura T . 2023d . Zolly： zoom focal length correctly for perspective-distorted human mesh reconstruction // Proceedings of the IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 3925 – 3935 ［ DOI： 10.1109/ICCV51070.2023.00363 http://dx.doi.org/10.1109/ICCV51070.2023.00363 ］

Wang W J ， Yang H ， Fu J L and Liu J Y . 2024c . Zero-reference low-light enhancement via physical quadruple priors // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 26057 – 26066 ［ DOI： 10.1109/CVPR52733.2024.02462 http://dx.doi.org/10.1109/CVPR52733.2024.02462 ］

Wang W J ， Yang H ， Tuo Z X ， He H G ， Zhu J C ， Fu J L and Liu J Y . 2024d . Videofactory： swap attention in spatiotemporal diffusions for text-to-video generation ［EB/OL］. ［ 2024-04-24 ］. https://arxiv.org/abs/2305.10874.pdf https://arxiv.org/abs/2305.10874.pdf

Wang X T ， Xie L B ， Dong C and Shan Y . 2021b . Real-ESRGAN： Training real-world blind super-resolution with pure synthetic data // Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops . Montreal， QC， Canada ： IEEE： 1905 – 1914 ［ DOI： 10.1109/ICCVW54120.2021.00217 http://dx.doi.org/10.1109/ICCVW54120.2021.00217 ］

Wang X T ， Yu K ， Wu S X ， Gu J J ， Liu Y H ， Dong C ， Qiao Y and Loy C C . 2018b . ESRGAN： enhanced super-resolution generative adversarial networks // Proceedings of the European Conference on Computer Vision Workshops . Munich， Germany ： Springer： 63 – 79 ［ DOI： 10.1007/978-3-030-11021-5_5 http://dx.doi.org/10.1007/978-3-030-11021-5_5 ］

Wang X Y ， Darrell T ， Rambhatla S S ， Girdhar R and Misra I . 2024e . InstanceDiffusion： Instance-level control for image generation // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 6232 – 6242 ［ DOI： 10.1109/CVPR52733.2024.00596 http://dx.doi.org/10.1109/CVPR52733.2024.00596 ］

Wang X ， Zhu Z Y ， Huang G ， Chen X W and Lu J W . 2023e . DriveDreamer： Towards real-world-driven world models for autonomous driving ［EB/OL］. ［ 2023-09-18 ］. https://arxiv.org/pdf/2309.09777.pdf

Wang Y and Hu J . 2011 . Global ridge orientation modeling for partial fingerprint identification . IEEE Transactions on Pattern Analysis and Machine Intelligence ， 33 （ 1 ）： 72 – 87 ［ DOI： 10.1109/TPAMI.2010.73 http://dx.doi.org/10.1109/TPAMI.2010.73 ］

Wang Y F ， Wan R J ， Yang W H ， Li H L ， Chau L P and Kot A C . 2022c . Low-light image enhancement with normalizing flow // Proceedings of the AAAI Conference on Artificial Intelligence . Vancouver， Canada ： AAAI： 2604 – 2612 ［ DOI： 10.1609/aaai.v36i3.20162 http://dx.doi.org/10.1609/aaai.v36i3.20162 ］

Wang Y F ， Yu Y ， Yang W H ， Guo L Q ， Chau L P ， Kot A C and Wen B . 2023f . ExposureDiffusion： Learning to expose for low-light image enhancement // Proceedings of the IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 12404 – 12414 ［ DOI： 10.1109/ICCV51070.2023.01143 http://dx.doi.org/10.1109/ICCV51070.2023.01143 ］

Wang Y H ， Chen X Y ， Ma X ， Zhou S C ， Huang Z Q ， Wang Y ， Yang C Y ， He Y N ， Yu J S ， Yang P Q ， Guo Y W ， Wu T X ， Si C Y ， Jiang Y M ， Chen C J ， Loy C C ， Dai B ， Lin D H ， Qiao Y and Liu Z W . 2023g . LAVIE： high-quality video generation with cascaded latent diffusion models ［EB/OL］. ［ 2023-09-27 ］. https://arxiv.org/abs/2309.15103.pdf https://arxiv.org/abs/2309.15103.pdf

Wang Y N ， Liang X Z and Liao S C . 2022d . Cloning outfits from real-world images to 3d characters for generalizable person re-identification // Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition . New Orleans， LA， USA ： IEEE： 4890 – 4899 ［ DOI： 10.1109/CVPR52688.2022.00485 http://dx.doi.org/10.1109/CVPR52688.2022.00485 ］

Wang Y N ， Liao S C and Shao L . 2020 . Surpassing real-world source training data： random 3d characters for generalizable person re-identification // Proceedings of the 28th ACM International Conference on Multimedia . Seattle， WA， USA ： ACM： 3422 – 3430 ［ DOI： 10.1145/3394171.3413815 http://dx.doi.org/10.1145/3394171.3413815 ］

Wang Y Q ， He J W ， Fan L ， Li H X ， Chen Y T and Zhang Z X . 2024f . Driving into the future： multiview visual forecasting and planning with world model for autonomous driving // Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 14749 – 14759 ［ DOI： 10.1109/CVPR52733.2024.01397 http://dx.doi.org/10.1109/CVPR52733.2024.01397 ］

Wang Y ， Xian Z ， Chen F ， Wang T H ， Wang Y ， Fragkiadaki K ， Erickson Z ， Held D and Gan C . 2024g . RoboGen： towards unleashing infinite data for automated robot learning via generative simulation ［EB/OL］. ［ 2024-06-14 ］. https://arxiv.org/abs/2311.01455.pdf https://arxiv.org/abs/2311.01455.pdf

Wang Z Y ， Lu C ， Wang Y K ， Bao F ， Li C X ， Su H and Zhu J . 2024h . Prolificdreamer： high-fidelity and diverse text-to-3d generation with variational score distillation // Proceedings of the 2024 Advances in Neural Information Processing Systems . New Orleans， LA， USA ： Curran Associates Inc： 8406 – 8441 ［ DOI： 10.5555/3666122.3666490 http://dx.doi.org/10.5555/3666122.3666490 ］

Wecker L ， Samavati F and Gavrilova M . 2010 . A multiresolution approach to iris synthesis . Computers & Graphics ， 34 （ 4 ）： 468 – 478 ［ DOI： 10.1016/j.cag.2010.05.012 http://dx.doi.org/10.1016/j.cag.2010.05.012 ］

Wei L H ， Zhang S L ， Gao W and Tian Q . 2018 . Person transfer GAN to bridge domain gap for person re-identification // Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City， UT， USA ： IEEE： 79 – 88 ［ DOI： 10.1109/CVPR.2018.00016 http://dx.doi.org/10.1109/CVPR.2018.00016 ］

Wei Z S ， Han Y F ， Sun Z N and Tan T N . 2008a . Palmprint image synthesis： a preliminary study // 2008 15th IEEE International Conference on Image Processing . San Diego， CA， USA ： IEEE： 285 – 288 ［ DOI： 10.1109/ICIP.2008.4711747 http://dx.doi.org/10.1109/ICIP.2008.4711747 ］

Wei Z S ， Tan T N and Sun Z N . 2007 . Nonlinear Iris Deformation Correction Based on Gaussian Model // Proceedings of the Advances in Biometrics ， International Conference. Seoul， Korea ： Springer： 780 – 789 ［ DOI： 10.1007/978-3-540-74549-5_82 http://dx.doi.org/10.1007/978-3-540-74549-5_82 ］

Wei Z S ， Tan T N and Sun Z N . 2008b . Synthesis of large realistic iris databases using patch-based sampling // 2008 19th International Conference on Pattern Recognition . Tampa， FL， USA ： IEEE： 1 – 4 ［ DOI： 10.1109/ICPR.2008.4761674 http://dx.doi.org/10.1109/ICPR.2008.4761674 ］

Wu C ， Huang L ， Zhang Q X ， et al . 2021 . GODIVA： Generating open-domain videos from natural descriptions ［EB/OL］. https://arxiv.org/abs/2104.14806.pdf

Wu C ， Liang J and Ji L . 2022 . Nüwa： Visual synthesis pre-training for neural visual world creation // Proceedings of the European Conference on Computer Vision . Tel Aviv， Israel ： Springer Nature Switzerland： 720 – 736 ［ DOI： 10.1007/978-3-031-19787-1_41 http://dx.doi.org/10.1007/978-3-031-19787-1_41 ］

Wu G J ， Yi T R ， Fang J M ， Xie L X ， Zhang X P ， Wei W ， Liu W Y ， Tian Q and Wang X G . 2024a . 4D gaussian splatting for real-time dynamic scene rendering // Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 20310 – 20320 ［ DOI： 10.1109/CVPR52733.2024.01920 http://dx.doi.org/10.1109/CVPR52733.2024.01920 ］

Wu J Z ， Ge Y X ， Wang X T ， Lei S W ， Gu Y C ， Shi Y F ， Hsu W ， Shan Y ， Qie X H and Shou M Z . 2023a . Tune-a-video： one-shot tuning of image diffusion models for text-to-video generation // Proceedings of the IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 7623 – 7633 ［ DOI： 10.1109/ICCV51070.2023.00701 http://dx.doi.org/10.1109/ICCV51070.2023.00701 ］

Wu W J ， Zhao Y Z ， Chen H ， Gu Y C ， Zhao R ， He Y F ， Zhou H ， Shou M Z and Shen C H . 2023b . DatasetDM： Synthesizing Data with Perception Annotations Using Diffusion Models // Proceedings of the Advances in Neural Information Processing Systems . New Orleans， LA， USA ： Curran Associates， Inc： 54683 – 54695 ［ DOI： 10.48550/arXiv.2308.06160 http://dx.doi.org/10.48550/arXiv.2308.06160 ］

Wu W ， Zhao Y ， Shou M Z ， Zhou H and Shen C H . 2023c . Diffumask： Synthesizing images with pixel-level annotations for semantic segmentation using diffusion models // Proceedings of the IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 1206 – 1217 ［ DOI： 10.1109/ICCV51070.2023.00117 http://dx.doi.org/10.1109/ICCV51070.2023.00117 ］

Wu Y S ， Shi L Y ， Liu H L ， Liao H J ， Qiu L T ， Yuan W H ， Gu X D ， Dong Z L ， Cui S G and Han X G . 2024b . Mvimgnet2.0： a larger-scale dataset of multi-view images . ACM Transactions on Graphics ， 43 （ 6 ）： 173：1 – 173：16 ［ DOI： 10.1145/3687973 http://dx.doi.org/10.1145/3687973 ］

Wyzykowski A B V ， Segundo M P and de Paula Lemes R . 2021 . Level three synthetic fingerprint generation // Proceedings of the 2020 25th International Conference on Pattern Recognition . Milan， Italy ： IEEE： 9250 – 9257 ［ DOI： 10.1109/ICPR48806.2021.9412304 http://dx.doi.org/10.1109/ICPR48806.2021.9412304 ］

Xia B ， Zhang Y ， Wang S ， Wang Y ， Wu X ， Tian Y ， Yang W and Gool L V . 2023 . DiffIR： Efficient diffusion model for image restoration // Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 13049 – 13059 ［ DOI： 10.1109/ICCV51070.2023.01204 http://dx.doi.org/10.1109/ICCV51070.2023.01204 ］

Xia F ， Shen W B ， Li C ， Kasimbeg P ， Tchapmi M ， Toshev A ， Martín-Martín R and Savarese S . 2020 . Interactive gibson benchmark： a benchmark for interactive navigation in cluttered environments . IEEE Robotics and Automation Letters ， 5 （ 2 ）： 713 – 720 ［ DOI： 10.1109/LRA.2020.2965104 http://dx.doi.org/10.1109/LRA.2020.2965104 ］

Xia F ， Zamir A R ， He Z Y ， Sax A ， Malik J and Savarese S . 2018 . Gibson env： real-world perception for embodied agents // Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Salt Lake City， UT， USA ： IEEE： 9068 – 9079 ［ DOI： 10.1109/CVPR.2018.00945 http://dx.doi.org/10.1109/CVPR.2018.00945 ］

Xiang F ， Qin Y ， Mo K ， Xia Y ， Zhu H ， Liu F ， Liu M ， Jiang H ， Yuan Y ， Wang H ， Yi L ， Chang A X ， Guibas L J and Su H . 2020 . SAPIEN： a simulated part-based interactive environment // Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 11097 – 11107 ［ DOI： 10.1109/CVPR42600.2020.01111 http://dx.doi.org/10.1109/CVPR42600.2020.01111 ］

Xiang J F ， Yang J L ， Deng Y and Tong X . 2023 . Gram-hd： 3d-consistent image generation at high resolution with generative radiance manifolds // Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 2195 – 2205 ［ DOI： 10.1109/ICCV51070.2023.00209 http://dx.doi.org/10.1109/ICCV51070.2023.00209 ］

Xiao C W ， Li B ， Zhu J Y ， He W ， Liu M Y and Song D . 2018 . Generating adversarial examples with adversarial networks // Proceedings of the 27th International Joint Conference on Artificial Intelligence . Stockholm， Sweden ： AAAI Press： 3905 – 3911 ［ DOI： 10.24963/ijcai.2018/543 http://dx.doi.org/10.24963/ijcai.2018/543 ］

Xie T Y ， Zong Z S ， Qiu Y X ， Li X ， Feng Y T ， Yang Y and Jiang C F F . 2024 . PhysGaussian： physics-integrated 3D gaussians for generative dynamics // Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 4389 – 4398 ［ DOI： 10.1109/CVPR52733.2024.00420 http://dx.doi.org/10.1109/CVPR52733.2024.00420 ］

Xing J B ， Xia M H ， Zhang Y ， Chen H X ， Yu W B ， Liu H Y ， Liu G Y ， Wang X T ， Shan Y and Wong T T . 2024 . Dynamicrafter： animating open-domain images with video diffusion priors // Proceedings of the European Conference on Computer Vision . Milan， Italy ： Springer： 399 – 417 ［ DOI： 10.1007/978-3-031-72952-2_23 http://dx.doi.org/10.1007/978-3-031-72952-2_23 ］

Xu D ， Ouyang W L and Ricci E . 2017a . Learning cross-modal deep representations for robust pedestrian detection // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Honolulu， HI， USA ： IEEE： 4236 – 4244 ［ DOI： 10.1109/CVPR.2017.451 http://dx.doi.org/10.1109/CVPR.2017.451 ］

Xu H F ， Peng S Y ， Wang F J H ， Blum H ， Barath D ， Geiger A and Pollefeys M . 2024a . DepthSplat： connecting gaussian splatting and depth ［EB/OL］. ［ 2024-11-22 ］. https://arxiv.org/abs/2410.13862.pdf https://arxiv.org/abs/2410.13862.pdf

Xu J M ， Liu S T ， Vahdat A ， Byeon W ， Wang X L and De Mello S . 2023 . Open-vocabulary panoptic segmentation with text-to-image diffusion models // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 2955 – 2966 ［ DOI： 10.1109/CVPR52729.2023.00289 http://dx.doi.org/10.1109/CVPR52729.2023.00289 ］

Xu J Q ， Zou X Y ， Huang K Z ， Chen Y K ， Liu B ， Cheng M L ， Shi X and Huang J . 2024b . EasyAnimate： a high-performance long video generation method based on transformer architecture ［EB/OL］. ［ 2024-07-05 ］. https://arxiv.org/abs/2405.18991.pdf https://arxiv.org/abs/2405.18991.pdf

Xu W D ， Sun H Z ， Deng C and Tan Y . 2017 . Variational autoencoder for semi-supervised text classification // Proceedings of the AAAI Conference on Artificial Intelligence . San Francisco， California， USA ： AAAI： 3358 - 3364 ［ DOI： 10.1609/aaai.v31i1.10966 http://dx.doi.org/10.1609/aaai.v31i1.10966 ］

Xu Y ， Zhao Y ， Xiao Z and Hou T . 2024d . Ufogen： You forward once large scale text-to-image generation via diffusion gans // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 8196 - 8206 ［ DOI： 10.1109/CVPR52733.2024.00783 http://dx.doi.org/10.1109/CVPR52733.2024.00783 ］

Yadav S and Ross A . 2021 . CIT-GAN： cyclic image translation generative adversarial network with application in iris presentation attack detection // Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision . Waikoloa， HI， USA ： IEEE： 2411 – 2420 ［ DOI： 10.1109/WACV48630.2021.00246 http://dx.doi.org/10.1109/WACV48630.2021.00246 ］

Yadav S ， Chen C and Ross A . 2019 . Synthesizing iris images using RaSGAN with application in presentation attack detection // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops . Long Beach， CA， USA ： IEEE： 2422 – 2430 ［ DOI： 10.1109/CVPRW.2019.00297 http://dx.doi.org/10.1109/CVPRW.2019.00297 ］

Yan C ， Misra D K ， Bennett A ， Walsman A ， Bisk Y and Artzi Y . 2019 . CHALET： cornell house agent learning environment ［EB/OL］. ［ 2019-07-16 ］. https://arxiv.org/abs/1801.07357.pdf https://arxiv.org/abs/1801.07357.pdf

Yan H ， Liu Y L ， Jin L W and Bai X . 2023 . The development， application， and future of LLM similar to ChatGPT . Journal of Image and Graphics ， 28 （ 09 ）： 2749 - 2762

严昊，刘禹良，金连文，白翔 . 2023 . 类ChatGPT大模型发展、应用和前景 . 中国图象图形学报， 28 （ 09 ）： 2749 - 2762 ［ DOI： 10.11834/jig.230536 http://dx.doi.org/10.11834/jig.230536 ］

Yan Y Z ， Lin H T ， Zhou C X ， Wang W J ， Sun H Y ， Zhan K ， Lang X P ， Zhou X W and Peng S D . 2024a . Street gaussians for modeling dynamic urban scenes ［EB/OL］. ［ 2024-08-18 ］. https://arxiv.org/abs/2401.01339.pdf

Yan Z W ， Low W F ， Chen Y and Lee G H . 2024b . Multi-scale 3D gaussian splatting for anti-aliased rendering // Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 20923 – 20931 ［ DOI： 10.1109/CVPR52733.2024.01977 http://dx.doi.org/10.1109/CVPR52733.2024.01977 ］

Yang G K . 2023a . A method for generating fingerprint images based on diffusion models . Journal of Hebei Academy of Sciences ， 40 （ 01 ）： 13 - 18+66 ［ DOI： 10.16191/j.cnki.hbkx.2023.01.009 http://dx.doi.org/10.16191/j.cnki.hbkx.2023.01.009 ］

Yang H W ， Fang P and Hao Z A . 2021 . A GAN-based method for generating finger vein dataset // Proceedings of the 2020 3rd International Conference on Algorithms， Computing and Artificial Intelligence . New York， NY， USA ： Association for Computing Machinery ［ DOI： 10.1145/3446132.3446150 http://dx.doi.org/10.1145/3446132.3446150 ］

Yang L H ， Xu X G ， Kang B Y ， Shi Y H and Zhao H S . 2023b . FreeMask： Synthetic Images with Dense Annotations Make Stronger Segmentation Models // Proceedings of the Advances in Neural Information Processing Systems . Vancouver， Canada ： Curran Associates， Inc： 18659 – 18675 ［ DOI： 10.48550/arXiv.2310.15160 http://dx.doi.org/10.48550/arXiv.2310.15160 ］

Yang W ， Tan R T ， Feng J ， Liu J ， Guo Z and Yan S . 2017 . Deep joint rain detection and removal from a single image // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Honolulu， HI， USA ： IEEE： 1685 – 1694 ［ DOI： 10.1109/CVPR.2017.183 http://dx.doi.org/10.1109/CVPR.2017.183 ］

Yang W H ， Wang S Q ， Fang Y M ， Wang Y and Liu J Y . 2020 . From fidelity to perceptual quality： A semi-supervised approach for low-light image enhancement // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Honolulu， HI， USA ： IEEE： 1685 – 1694 ［ DOI： 10.1109/CVPR.2017.183 http://dx.doi.org/10.1109/CVPR.2017.183 ］

Yang Z T ， Cai Z A ， Mei H Y ， Liu S ， Chen Z X ， Xiao W Y ， Wei Y K ， Qing Z F ， Wei C ， Dai B ， Wu W ， Qian C ， Lin D H ， Liu Z W and Yang L . 2023c . Synbody： synthetic dataset with layered human models for 3d human perception and modeling // Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 20282 – 20292 ［ DOI： 10.1109/ICCV51070.2023.01855 http://dx.doi.org/10.1109/ICCV51070.2023.01855 ］

Yang Z X ， Wang J C ， Gan Z ， Li L J ， Lin K C ， Wu C J ， Duan N ， Liu Z ， Liu C G and Zeng M . 2023d . RECO： Region-controlled text-to-image generation // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 14246 – 14255 ［ DOI： 10.1109/CVPR52598.2023.014246 http://dx.doi.org/10.1109/CVPR52598.2023.014246 ］

Yang Z Y ， Gao X Y ， Zhou W ， Jiao S H ， Zhang Y Q and Jin X G . 2024a . Deformable 3D gaussians for high-fidelity monocular dynamic scene reconstruction // Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 20331 – 20341 ［ DOI： 10.1109/CVPR52733.2024.01922 http://dx.doi.org/10.1109/CVPR52733.2024.01922 ］

Yang Z Y ， Teng J Y ， Zheng W ， Ding M ， Huang S Y ， Xu J Z ， Yang Y M ， Hong W Y ， Zhang X H ， Feng G Y ， Yin D ， Gu X T ， Zhang Y X ， Wang W H ， Cheng Y ， Liu T ， Xu B ， Dong Y X and Tang J . 2024b . CogVideoX： text-to-video diffusion models with an expert transformer ［EB/OL］. ［ 2024-10-08 ］. https://arxiv.org/abs/2408.06072.pdf

Yang Z Y ， Yang H Y ， Pan Z J and Zhang L . 2024c . Real-time photorealistic dynamic scene representation and rendering with 4D gaussian splatting ［EB/OL］. ［ 2024-02-22 ］. https://arxiv.org/abs/2310.10642.pdf

Yang Z ， Li S J ， Wu W and Dai B . 2023d . 3DHumanGAN： 3D-aware human image generation with 3D pose mapping // Proceedings of the IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 22951 – 22962 ［ DOI： 10.1109/ICCV51070.2023.02103 http://dx.doi.org/10.1109/ICCV51070.2023.02103 ］

Yao K ， Gao P L ， Yang X ， Huang K Z ， Sun J and Zhang R . 2022 . Outpainting by queries ［EB/OL］. ［ 2022-07-12 ］. https://arxiv.org/abs/2207.05312.pdf

Ye K Y ， Hou Q M and Zhou K . 2024 . 3D gaussian splatting with deferred reflection // Proceedings of the ACM SIGGRAPH 2024 Conference Papers . New York， NY， USA ： Association for Computing Machinery： 1 – 10 ［ DOI： 10.1145/3641519.3657456 http://dx.doi.org/10.1145/3641519.3657456 ］

Ye Y ， Chang Y ， Zhou H and Yan L . 2021 . Closing the loop： Joint rain generation and removal via disentangled image translation // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Nashville， TN， USA ： IEEE： 2053 - 2062 ［ DOI： 10.1109/CVPR46437.2021.00209 http://dx.doi.org/10.1109/CVPR46437.2021.00209 ］

Yi T R ， Fang J M ， Wang J J ， Wu G J ， Xie L X ， Zhang X P ， Liu W Y ， Tian Q and Wang X G . 2024 . GaussianDreamer： fast generation from text to 3D gaussians by bridging 2D and 3D diffusion models // Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 6796 – 6807 ［ DOI： 10.1109/CVPR52733.2024.00649 http://dx.doi.org/10.1109/CVPR52733.2024.00649 ］

Yim J ， Lee H ， Eum S ， Shen Y T ， Zhang Y ， Kwon H and Bhattacharyya S S . 2024 . SynPlay： importing real-world diversity for a synthetic human dataset ［EB/OL］. ［ 2024-08-21 ］. http://arxiv.org/abs/2408.11814.pdf

Yin W Q ， Cai Z A ， Wang R S ， Wang F Z ， Wei C ， Mei H Y ， Xiao W Y ， Yang Z T ， Sun Q P ， Yamashita A ， Liu Z W and Yang L . 2024 . WHAC： world-grounded humans and cameras // Proceedings of the European Conference on Computer Vision . Milan， Italy ： Springer： 20 – 37 ［ DOI： 10.48550/arXiv.2403.12959 http://dx.doi.org/10.48550/arXiv.2403.12959 ］

Yin X ， Yu X ， Sohn K ， Liu X and Chandraker M . 2017 . Towards large-pose face frontalization in the wild // Proceedings of the 2017 IEEE International Conference on Computer Vision . Venice， Italy ： IEEE： 4010 – 4019 ［ DOI： 10.1109/ICCV.2017.430 http://dx.doi.org/10.1109/ICCV.2017.430 ］

Yu A ， Foote A ， Mooney R and Martín-Martín R . 2024a . Natural language can help bridge the Sim2Real gap ［EB/OL］. ［ 2024-07-02 ］. https://arxiv.org/abs/2405.10020.pdf

Yu A ， Ye V ， Tancik M and Kanazawa A . 2021 . PixelNeRF： neural radiance fields from one or few images // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， TN， USA ： IEEE： 4578 - 4587 ［ DOI： 10.1109/CVPR46437.2021.00455 http://dx.doi.org/10.1109/CVPR46437.2021.00455 ］

Yu S ， Nie W L ， Huang D A ， Li B ， Shin J and Anandkumar A . 2024b . Efficient video diffusion models via content-frame motion-latent decomposition ［EB/OL］. ［ 2024-03-21 ］. https://arxiv.org/abs/2403.14148.pdf

Yu W B ， Xing J B ， Yuan L ， Hu W B ， Li X Y ， Huang Z P ， Gao X J ， Wong T T ， Shan Y and Tian Y H . 2024c . ViewCrafter： taming video diffusion models for high-fidelity novel view synthesis ［EB/OL］. ［ 2024-09-03 ］. https://arxiv.org/abs/2409.02048.pdf

Yu W H ， Tan J ， Liu C K and Turk G . 2017 . Preparing for the unknown： learning a universal policy with online system identification // Proceedings of Robotics： Science and Systems . Cambridge， Massachusetts， USA ： DBLP：［ DOI： 10.15607/RSS.2017.XIII.048 http://dx.doi.org/10.15607/RSS.2017.XIII.048 ］

Yu X G ， Xu M T ， Zhang Y D ， Liu H L ， Ye C J ， Wu Y S ， Yan Z Z ， Zhu C M ， Xiong Z Y and Liang T Y . 2023 . Mvimgnet： a large-scale dataset of multi-view images // Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 9150 – 9161 ［ DOI： 10.1109/CVPR52729.2023.00883 http://dx.doi.org/10.1109/CVPR52729.2023.00883 ］

Yu X ， Guo Y C ， Li Y G ， Liang D ， Zhang S H and Qi X J . 2024d . Text-to-3d with classifier score distillation // Proceedings of the 2024 International Conference on Learning Representations . Vienna， Austria . ［ DOI： 10.48550/arXiv.2310.19415 http://dx.doi.org/10.48550/arXiv.2310.19415 ］

Yu Z ， Chen A ， Huang B ， Sattler T and Geiger A . 2024e . Mip-splatting： alias-free 3D gaussian splatting // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 19447 – 19456 ［ DOI： 10.1109/CVPR52733.2024.01839 http://dx.doi.org/10.1109/CVPR52733.2024.01839 ］

Yun S J ， Oh S J and Heo B . 2020 . VideoMix： rethinking data augmentation for video classification ［EB/OL］. ［ 2020-12-07 ］. https://arxiv.org/pdf/2012.03457.pdf

Yurtsever E ， Yang D ， Koc I M and Redmill K A . 2022 . Photorealism in driving simulations： blending generative adversarial image synthesis with rendering . IEEE Transactions on Intelligent Transportation Systems ， 23 ： 23114 – 23123 ［ DOI： 10.1109/TITS.2022.3193347 http://dx.doi.org/10.1109/TITS.2022.3193347 ］

Zamir S W ， Arora A ， Khan S ， Hayat M ， Khan F S and Yang M H . 2022 . Restormer： Efficient transformer for high-resolution image restoration // Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . New Orleans， LA， USA ： IEEE： 5718 – 5729 ［ DOI： 10.1109/CVPR52688.2022.00564 http://dx.doi.org/10.1109/CVPR52688.2022.00564 ］

Zeng A L ， Yang Y H ， Chen W D and Liu W . 2024a . The dawn of video generation： preliminary explorations with SORA-like models ［EB/OL］. ［ 2024-10-10 ］. https://arxiv.org/abs/2410.05227.pdf

Zeng Y ， Wei G ， Zheng J ， Zou J ， Wei Y ， Zhang Y and Li H . 2024b . Make pixels dance： high-dynamic video generation // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 8850 – 8860 ［ DOI： 10.1109/CVPR52733.2024.00845 http://dx.doi.org/10.1109/CVPR52733.2024.00845 ］

Zhang C ， Zhu L and Zhang S . 2019a . PAC-GAN： an effective pose augmentation scheme for unsupervised cross-view person re-identification ［EB/OL］. ［ 2019-06-05 ］. http://arxiv.org/abs/1906.01792.pdf

Zhang D J ， Wu J Z ， Liu J W ， Zhao R ， Ran L M ， Gu Y C ， Gao D F and Shou M Z . 2024a . Show-1： marrying pixel and latent diffusion models for text-to-video generation . International Journal of Computer Vision ［ DOI： 10.1007/s11263-024-02271-9 http://dx.doi.org/10.1007/s11263-024-02271-9 ］

Zhang H ， Goodfellow I ， Metaxas D and Odena A . 2019b . Self-attention generative adversarial networks // Proceedings of the 36th International Conference on Machine Learning . Long Beach， California， USA ： PMLR： 7354 – 7363 ［ DOI： 10.48550/arXiv.1805.08318 http://dx.doi.org/10.48550/arXiv.1805.08318 ］

Zhang H ， Sindagi V and Patel V M . 2020a . Image de-raining using a conditional generative adversarial network . IEEE Transactions on Circuits and Systems for Video Technology ， 30 （ 11 ）： 3943 – 3956 ［ DOI： 10.1109/TCSVT.2019.2920407 http://dx.doi.org/10.1109/TCSVT.2019.2920407 ］

Zhang H ， Xu H ， Xiao Y ， Guo X J and Ma J . 2020b . Rethinking the image fusion： a fast unified image fusion network based on proportional maintenance of gradient and intensity // Proceedings of the AAAI Conference on Artificial Intelligence . New York， USA ： AAAI： 12797 – 12804 ［ DOI： 10.1609/aaai.v34i07.6975 http://dx.doi.org/10.1609/aaai.v34i07.6975 ］

Zhang H ， Xu T ， Li H S ， Zhang S T ， Wang X G ， Huang X L and Metaxas D . 2017 . StackGAN： Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks // Proceedings of 16th IEEE International Conference on Computer Vision . Venice ： IEEE： 5908 - 5916 ［ DOI： 10.1109/iccv.2017.629 http://dx.doi.org/10.1109/iccv.2017.629 ］

Zhang J B ， Zhang Y F ， Cun X D ， Huang S Y ， Zhang Y P ， Zhao H ， Lu H Y and Shen X Y . 2023a . T2M-GPT： generating human motion from textual descriptions with discrete representations // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， BC， Canada ： IEEE： 14730 - 14740 ［ DOI： 10.48550/arXiv.2301.06052 http://dx.doi.org/10.48550/arXiv.2301.06052 ］

Zhang J ， Sun F W ， Song J ， Von Ancken A and Zhai R . 2018a . Fine-Grained Image Classification via Spatial Saliency Extraction // Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications . Orlando， FL， USA ： IEEE： 249 – 255 ［ DOI： 10.1109/ICMLA.2018.00044 http://dx.doi.org/10.1109/ICMLA.2018.00044 ］

Zhang L ， Rao A and Agrawala M . 2023b . Adding conditional control to text-to-image diffusion models // Proceedings of the IEEE/CVF International Conference on Computer Vision . Paris， France ： IEEE： 3836 – 3847 ［ DOI： 10.1109/ICCV52598.2023.003836 http://dx.doi.org/10.1109/ICCV52598.2023.003836 ］

Zhang M ， Wu J ， Ren Y ， Li M ， Qin J ， Xiao X ， Liu W ， Wang R ， Zheng M and Ma A J . 2023c . DiffusionEngine： diffusion model is scalable data engine for object detection ［EB/OL］. ［ 2023-09-07 ］. https://arxiv.org/pdf/2309.03893.pdf

Zhang Q ， Lin W and Chan A B . 2021a . Cross-view cross-scene multi-view crowd counting // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， TN， USA ： IEEE： 557 – 567 ［ DOI： 10.1109/CVPR46437.2021.00062 http://dx.doi.org/10.1109/CVPR46437.2021.00062 ］

Zhang S G ， Zhou M Q ， Wang Y X ， Luo C C ， Wang R Y ， Li Y W ， Yin X C ， Zhang Z X and Peng J R . 2024b . CityX： controllable procedural content generation for unbounded 3D cities ［EB/OL］. ［ 2024-08-06 ］. https://export.arxiv.org/pdf/2407.17572.pdf

Zhang S W ， Wang J Y ， Zhang Y Y ， Zhao K ， Yuan H J ， Qin Z W ， Wang X ， Zhao D L and Zhou J R . 2023d . I2VGen-XL： high-quality image-to-video synthesis via cascaded diffusion models ［EB/OL］. ［ 2023-11-07 ］. https://arxiv.org/abs/2311.04145.pdf

Zhang T Y ， Wang L ， Li H N ， Xiao Y S ， Liang S Y ， Liu A S ， Liu X L and Tao D C . 2024c . LanEvil： benchmarking the robustness of lane detection to environmental illusions // Proceedings of the 32nd ACM International Conference on Multimedia . Melbourne， VIC， Australia ： ACM： 5403 – 5412 ［ DOI： 10.1145/3664647.3680761 http://dx.doi.org/10.1145/3664647.3680761 ］

Zhang T Y ， Xie L X ， Wei L H ， Zhuang Z J ， Zhang Y F ， Li B and Tian Q . 2021b . UnrealPerson： an adaptive pipeline towards costless person re-identification // Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， TN， USA ： IEEE： 11501 – 11510 ［ DOI： 10.1109/CVPR46437.2021.01134 http://dx.doi.org/10.1109/CVPR46437.2021.01134 ］

Zhang Y F ， Jia G and Li C . 2020c . Self-paced video data augmentation by generative adversarial networks with insufficient samples // Proceedings of the 28th ACM International Conference on Multimedia . Seattle， WA， USA ： SIGMM： 1652 – 1660 ［ DOI： 10.1145/3394171.3414003 http://dx.doi.org/10.1145/3394171.3414003 ］

Zhang Y H ， Zhang J W and Guo X J . 2019c . Kindling the darkness： A practical low-light image enhancer // Proceedings of the 27th ACM International Conference on Multimedia . Nice， France ： ACM： 1632 – 1640 ［ DOI： 10.1145/3343031.3350926 http://dx.doi.org/10.1145/3343031.3350926 ］

Zhang Y W ， Ling J ， Gao K ， Yin J ， Lafleche J F ， Barriuso A ， Torralba A and Fidler S . 2021c . DatasetGAN： Efficient labeled data factory with minimal human effort // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Nashville， TN， USA ： IEEE： 10145 – 10155 ［ DOI： 10.1109/CVPR46437.2021.01001 http://dx.doi.org/10.1109/CVPR46437.2021.01001 ］

Zhang Z ， Hu W B ， Lao Y X ， He T and Zhao H S . 2024d . Pixel-GS： density control with pixel-aware gradient for 3D gaussian splatting // Proceedings of the European Conference on Computer Vision . Milan， Italy ： Springer： 145 – 163 ［ DOI： 10.1007/978-3-031-72655-2_19 http://dx.doi.org/10.1007/978-3-031-72655-2_19 ］

Zhao G S ， Wang X F ， Zhu Z ， Chen X Z ， Huang G ， Bao X Y and Wang X G . 2024 . DriveDreamer-2： LLM-enhanced world models for diverse driving video generation ［EB/OL］. ［ 2024-04-11 ］. https://arxiv.org/abs/2403.06845.pdf

Zhao H F ， Sheng D X ， Bao J M ， Chen D D ， Chen D ， Wen F ， Yuan L ， Liu C X ， Zhou W T ， Chu Q F ， Zhang W W and Yu N H . 2023 . X-Paste： revisiting scalable copy-paste for instance segmentation using CLIP and stable diffusion // Proceedings of the 40th International Conference on Machine Learning . Honolulu， Hawaii， USA ： PMLR： 42098 – 42109 ［ DOI： 10.48550/arXiv.2212.03863 http://dx.doi.org/10.48550/arXiv.2212.03863 ］

Zhao H S ， Jiang L ， Jia J Y ， Torr P and Koltun V . 2021 . Point transformer // Proceedings of the IEEE/CVF International Conference on Computer Vision . Montreal， QC， Canada ： IEEE： 16239 – 16248 ［ DOI： 10.1109/ICCV48922.2021.01595 http://dx.doi.org/10.1109/ICCV48922.2021.01595 ］

Zhao K ， Shen L ， Zhang Y Y ， Zhou C H ， Wang T ， Zhang R X ， Ding S H ， Jia W and Shen W . 2022 . BézierPalm： a free lunch for palmprint recognition // Proceedings of the 17th European Conference on Computer Vision . Tel Aviv， Israel ： Springer： 19 – 36 ［ DOI： 10.1007/978-3-031-19778-9_2 http://dx.doi.org/10.1007/978-3-031-19778-9_2 ］

Zhao Q J ， Jain A K ， Paulter N G and Taylor M . 2012 . Fingerprint image synthesis based on statistical feature models // Proceedings of the 2012 IEEE Fifth International Conference on Biometrics： Theory， Applications and Systems . Arlington， VA， USA ： IEEE： 23 – 30 ［ DOI： 10.1109/BTAS.2012.6374554 http://dx.doi.org/10.1109/BTAS.2012.6374554 ］

Zheng D H ， Zou Y H ， Zhang X W and Bao C L . 2024a . SeNM-VAE： semi-supervised noise modeling with hierarchical variational autoencoder // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 25889 – 25899 ［ DOI： 10.1109/CVPR52733.2024.02446 http://dx.doi.org/10.1109/CVPR52733.2024.02446 ］

Zheng G H ， Zhou X Y ， Li X Y ， Qi Z H ， Shan Y C and Li X C . 2023a . LayoutDiffusion： Controllable diffusion model for layout-to-image generation // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Vancouver， Canada ： IEEE： 22490 – 22499 ［ DOI： 10.1109/CVPR52598.2023.022490 http://dx.doi.org/10.1109/CVPR52598.2023.022490 ］

Zheng Z D ， Yang X D ， Yu Z D ， Zheng L ， Yang Y and Kautz J . 2019 . Joint discriminative and generative learning for person re-identification // Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition . Long Beach， CA， USA ： IEEE： 2133 – 2142 ［ DOI： 10.1109/CVPR.2019.00224 http://dx.doi.org/10.1109/CVPR.2019.00224 ］

Zheng Z D ， Zheng L and Yang Y . 2017 . Unlabeled samples generated by GAN improve the person re-identification baseline in vitro // Proceedings of the 2017 IEEE International Conference on Computer Vision . Venice ： IEEE： 3774 – 3782 ［ DOI： 10.1109/ICCV.2017.405 http://dx.doi.org/10.1109/ICCV.2017.405 ］

Zheng Z ， Peng X ， Yang T ， Shen C ， Li S ， Liu H ， Zhou Y ， Li T and You Y . 2024b . Open-sora： Democratizing efficient video production for all. 1 ， no. 3

Zhong Z ， Zheng L ， Kang G L ， Li S and Yang Y . 2020 . Random erasing data augmentation // Proceedings of the AAAI Conference on Artificial Intelligence . New York， New York， USA ： AAAI： 13001 – 13008 ［ DOI： 10.1609/aaai.v34i07.7000 http://dx.doi.org/10.1609/aaai.v34i07.7000 ］

Zhong Z ， Zheng L ， Li S and Yang Y . 2018 . Generalizing a person retrieval model hetero- and homogeneously // Proceedings of the 15th European Conference Computer Vision . Munich， Germany ： Springer International Publishing： 176 – 192 ［ DOI： 10.1007/978-3-030-01261-8_11 http://dx.doi.org/10.1007/978-3-030-01261-8_11 ］

Zhong Z ， Zheng L ， Zheng Z D ， Li S and Yang Y . 2019 . CamStyle： a novel data augmentation method for person re-identification . IEEE Transactions on Image Processing ， 28 ： 1176 – 1190 ［ DOI： 10.1109/TIP.2018.2874313 http://dx.doi.org/10.1109/TIP.2018.2874313 ］

Zhou D H ， Li Y F ， Ma F ， Zhang X and Yang Y . 2024a . MIGC： Multi-instance generation controller for text-to-image synthesis // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 6818 – 6828 ［ DOI： 10.1109/CVPR52733.2024.00651 http://dx.doi.org/10.1109/CVPR52733.2024.00651 ］

Zhou D Q ， Wang W M ， Yan H S ， Lv W W ， Zhu Y Z and Feng J S . 2023a . MagicVideo： efficient video generation with latent diffusion models ［EB/OL］. ［ 2023-05-11 ］. https://arxiv.org/abs/2211.11018.pdf

Zhou D W ， Yang Z X and Yang Y . 2023b . Pyramid diffusion models for low-light image enhancement // Proceedings of the International Joint Conferences on Artificial Intelligence . Macau， China ： ACM： 1795 – 1803 ［ DOI： 10.24963/ijcai.2023/199 http://dx.doi.org/10.24963/ijcai.2023/199 ］

Zhou M Q ， Wang Y X ， Hou J ， Luo C ， Zhang Z X and Peng J R . 2024b . SceneX： procedural controllable large-scale scene generation via large-language models ［EB/OL］. ［ 2024-07-30 ］. https://arxiv.org/pdf/2403.15698.pdf

Zhou S ， Zhang J Q ， Jiang H ， Lundh T and Ng A Y . 2021 . Data augmentation with Mobius transformations . Machine Learning： Science and Technology ， 2 （ 2 ）： 025016 ［ DOI： 10.1088/2632-2153/abd615 http://dx.doi.org/10.1088/2632-2153/abd615 ］

Zhou X Y ， Lin Z W ， Shan X J ， Wang Y T ， Sun D Q and Yang M H . 2024c . DrivingGaussian： composite gaussian splatting for surrounding dynamic autonomous driving scenes // Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition . Seattle， WA， USA ： IEEE： 21634 – 21643 ［ DOI： 10.1109/CVPR52733.2024.02044 http://dx.doi.org/10.1109/CVPR52733.2024.02044 ］

Zhou X Y ， Ran X J ， Xiong Y J ， He J L ， Lin Z W ， Wang Y T ， Sun D Q and Yang M H . 2024d . GALA3D： towards text-to-3D complex scene generation via layout-guided generative gaussian splatting // Proceedings of the Forty-first International Conference on Machine Learning . Vienna， Austria ： PMLR： 62108 – 62118 ［ DOI： 10.48550/arXiv.2402.07207 http://dx.doi.org/10.48550/arXiv.2402.07207 ］

Zhou Y ， Wang Q Y ， Cai Y X and Yang H . 2024e . Allegro： open the black box of commercial-level video generation model ［EB/OL］. ［ 2024-10-20 ］. https://arxiv.org/abs/2410.15458.pdf

Zhu J Y ， Krähenbühl P ， Shechtman E and Efros A A . 2016 . Generative visual manipulation on the natural image manifold // Proceedings of the European Conference on Computer Vision . Amsterdam， Netherlands ： Springer： 597 – 613 ［ DOI： 10.1007/978-3-319-46454-1_36 http://dx.doi.org/10.1007/978-3-319-46454-1_36 ］

Zhu J Y ， Park T ， Isola P and Efros A A . 2017 . Unpaired image-to-image translation using cycle-consistent adversarial networks // Proceedings of IEEE International Conference on Computer Vision . Venice， Italy ： IEEE： 2242 - 2251 ［ DOI： 10.1109/ICCV.2017.244 http://dx.doi.org/10.1109/ICCV.2017.244 ］

Zhu J Z ， Zhuang P Y and Koyejo S . 2024a . HiFA： high-fidelity text-to-3D generation with advanced diffusion guidance // Proceedings of the 2024 International Conference on Learning Representations . Vienna， Austria . ［ DOI： 10.48550/arXiv.2305.18766 http://dx.doi.org/10.48550/arXiv.2305.18766 ］

Zhu J ， Li S ， Liu Y ， Huang P ， Shan J ， Ma H M and Yuan J . 2024b . ODGen： domain-specific object detection data generation with diffusion models // Proceedings of the Advances in Neural Information Processing Systems . Vancouver， Canada . ［ DOI： 10.48550/arXiv.2405.15199 http://dx.doi.org/10.48550/arXiv.2405.15199 ］

Zhu Z H ， Fan Z W ， Jiang Y F and Wang Z Y . 2024c . FSGS： real-time few-shot view synthesis using gaussian splatting // Proceedings of the Computer Vision – ECCV 2024： 18th European Conference . Milan， Italy ： Springer-Verlag， Berlin， Heidelberg： 145 – 163 ［ DOI： 10.1007/978-3-031-72933-1_9 http://dx.doi.org/10.1007/978-3-031-72933-1_9 ］

Zou H ， Zhang H ， Li X G ， Liu J and He Z F . 2018 . Generation textured contact lenses iris images based on 4DCycle-GAN // Proceedings of the 2018 24th International Conference on Pattern Recognition . Beijing， China ： IEEE： 3561 – 3566 ［ DOI： 10.1109/ICPR.2018.8546154 http://dx.doi.org/10.1109/ICPR.2018.8546154 ］

Zou Y Z ， Choi J and Wang Q R . 2023 . Learning representational invariances for data-efficient action recognition . Computer Vision and Image Understanding ， 227 ： 103597 ［ DOI： 10.1016/j.cviu.2022.103597 http://dx.doi.org/10.1016/j.cviu.2022.103597 ］

Zuo J ， Schmid N A and Chen X H . 2007 . On generation and analysis of synthetic iris images . IEEE Transactions on Information Forensics and Security ， 2 （ 1 ）： 77 – 90 ［ DOI： 10.1109/TIFS.2006.890305 http://dx.doi.org/10.1109/TIFS.2006.890305 ］
