Dual-modality domain-agnostic prompts guided cross-domain image classification
2025, Vol. 30, No. 2, Pages: 503-517
Print publication date: 2025-02-16
DOI: 10.11834/jig.240119
Xu Yuanyuan, Kan Meina, Shan Shiguang, Chen Xilin. 2025. Dual-modality domain-agnostic prompts guided cross-domain image classification. Journal of Image and Graphics, 30(02):0503-0517
Objective
Domain adaptation aims to leverage labeled source-domain information to improve task performance on an unlabeled target domain. Recently, the contrastive language-image pre-training (CLIP) model has shown strong generalization ability, and several studies have introduced it into domain adaptation to improve generalization on the target domain. However, existing CLIP-based domain adaptation methods typically adjust only the features of the textual modality while keeping the visual features unchanged, which limits the performance gain on the target domain. To address this, we propose DDAPs (dual-modality domain-agnostic prompts), a domain adaptation method for image classification guided by dual-modality domain-agnostic prompts.
Method
DDAPs introduces dual-modality prompt learning, fine-tuning both textual and visual features through text and visual prompt learning so that the two modalities jointly handle domain discrepancy. On the one hand, DDAPs learns more discriminative text and image features, improving performance on the current downstream classification task; on the other hand, it eliminates the discrepancy between the source and target domains and learns domain-invariant text and image features, improving performance on the target domain. Both goals are achieved by adding a domain-agnostic textual prompt module and a domain-agnostic visual prompt module and fine-tuning CLIP with a classification loss and an alignment loss. For the classification loss, DDAPs classifies samples using source-domain labels and target-domain pseudo-labels; for the alignment loss, DDAPs uses the maximum mean discrepancy (MMD) loss to align the image feature distributions of the source and target domains, thereby removing domain discrepancy from the image features.
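As a point of reference for the alignment loss described above, the following is a minimal sketch of a multi-bandwidth RBF-kernel MMD loss in PyTorch; the function names, bandwidth choices, and the biased estimator are illustrative assumptions rather than the paper's exact implementation:

```python
import torch

def rbf_kernel(x, y, sigmas=(1.0, 2.0, 4.0)):
    """Sum of RBF kernels over several bandwidths (a common multi-kernel choice)."""
    # Pairwise squared Euclidean distances between the two feature batches.
    sq_dist = torch.cdist(x, y) ** 2
    return sum(torch.exp(-sq_dist / (2.0 * s ** 2)) for s in sigmas)

def mmd_loss(source_feat, target_feat):
    """Squared MMD between source and target feature batches (biased estimate)."""
    k_ss = rbf_kernel(source_feat, source_feat).mean()
    k_tt = rbf_kernel(target_feat, target_feat).mean()
    k_st = rbf_kernel(source_feat, target_feat).mean()
    return k_ss + k_tt - 2.0 * k_st
```

Given batches of source and target image features of shape (batch, dim), `mmd_loss(src, tgt)` shrinks toward zero as the two feature distributions align, which is the behavior the alignment loss exploits.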
Result
The proposed method applies to both single-source and multi-source domain adaptation. For single-source domain adaptation, experiments on the Office-Home, VisDA-2017, and Office-31 datasets yield average classification accuracies of 87.1%, 89.6%, and 91.6%, respectively, reaching state-of-the-art performance. For multi-source domain adaptation, the method achieves an average classification accuracy of 88.6% on Office-Home. Ablation studies on Office-Home further verify the effectiveness of the domain-agnostic textual prompt module and the domain-agnostic visual prompt module.
Conclusion
DDAPs fine-tunes the pre-trained CLIP model through domain-agnostic textual and visual prompt modules, enabling the model to learn features that are domain-invariant across the source and target domains while remaining discriminative, which effectively improves performance on the target domain.
Objective
Domain adaptation aims to utilize information from a labeled source domain to assist tasks in an unlabeled target domain. Recently, contrastive language-image pre-training (CLIP) has demonstrated impressive generalization capabilities in downstream classification tasks, and some methods have incorporated it into domain adaptation to enhance the model's generalization ability in the target domain. However, current CLIP-based domain adaptation methods typically adjust only the features of the textual modality, leaving the visual modality features unchanged. These methods overlook the importance of enhancing the discriminative capability of image features during classification and neglect the synergistic role of the visual modality in eliminating domain discrepancy. This issue is addressed by introducing a domain adaptation method for image classification guided by dual-modality domain-agnostic prompts (DDAPs).
Method
DDAPs introduces dual-modality prompt learning, which simultaneously fine-tunes textual and visual features and collaboratively addresses domain discrepancy. The key components of DDAPs are the domain-agnostic textual prompt module and the domain-agnostic visual prompt module. The former employs textual prompt learning to fine-tune the text encoder, fostering domain-agnostic and discriminative text features across domains; DDAPs adopts task-level textual prompt learning, sharing the textual prompt module across domains and categories. Similarly, the domain-agnostic visual prompt module uses visual prompt learning to enhance the image encoder, cultivating domain-agnostic and discriminative image features; task-level visual prompt learning ensures that the visual prompt module is shared across domains and samples. The added prompt modules are trained with a classification loss and an alignment loss that fine-tune the model. On the one hand, because the original CLIP pre-training task matches paired images and text, the model needs to learn text and image features that are more discriminative for the current downstream classification task. DDAPs therefore uses the classification loss to train the added dual-modality domain-agnostic prompt modules, enhancing the discriminative power of the features: for the source domain, the classification loss directly uses the existing labels, whereas for the target domain, it uses the collected pseudo-labels. On the other hand, given the considerable visual differences between the images of the two domains, the extracted image representations contain domain-specific components, and the target domain should fully exploit the beneficial information from the source domain. DDAPs therefore uses the maximum mean discrepancy (MMD) loss to align the image feature distributions of the source and target domains, learning domain-invariant image features. When aligning these distributions, DDAPs aligns the fusion of image features and classification probabilities, which enhances the discriminative capability of the aligned features and reduces incorrect category matching between the source and target domains.
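To make this structure concrete, here is a hedged PyTorch sketch of how task-level prompts shared across domains might be prepended to CLIP's token sequences and how the two losses could be combined. The helpers `encode_text_with_prompts` and `encode_image_with_prompts` are hypothetical stand-ins for running the frozen CLIP encoders with the learnable prompts inserted, `mmd_loss` refers to the sketch given in the Method section above, and the dimensions and loss weight are illustrative rather than the paper's settings:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainAgnosticPrompts(nn.Module):
    """Task-level prompts shared across all domains, classes, and samples (sketch)."""
    def __init__(self, n_text=16, d_text=512, n_vis=16, d_vis=768):
        super().__init__()
        self.text_prompts = nn.Parameter(0.02 * torch.randn(n_text, d_text))
        self.visual_prompts = nn.Parameter(0.02 * torch.randn(n_vis, d_vis))

    def prepend(self, prompts, embeds):
        # embeds: (batch, seq_len, dim); prepend the shared prompts to every sequence.
        p = prompts.unsqueeze(0).expand(embeds.size(0), -1, -1)
        return torch.cat([p, embeds], dim=1)

def training_step(prompts, src_imgs, src_labels, tgt_imgs, tgt_pseudo, lam=0.1):
    # Hypothetical helpers: run the frozen CLIP encoders with the learnable
    # prompts inserted; only the prompt parameters receive gradients.
    text_feat = encode_text_with_prompts(prompts)              # (n_classes, d)
    src_feat = encode_image_with_prompts(prompts, src_imgs)    # (batch, d)
    tgt_feat = encode_image_with_prompts(prompts, tgt_imgs)    # (batch, d)

    # CLIP-style logits: scaled cosine similarity to each class text feature.
    t = F.normalize(text_feat, dim=-1)
    src_logits = 100.0 * F.normalize(src_feat, dim=-1) @ t.t()
    tgt_logits = 100.0 * F.normalize(tgt_feat, dim=-1) @ t.t()

    # Classification loss: true labels on the source, pseudo-labels on the target.
    cls = F.cross_entropy(src_logits, src_labels) + F.cross_entropy(tgt_logits, tgt_pseudo)

    # Alignment loss on the fusion of image features and class probabilities,
    # reusing the mmd_loss sketched in the Method section above.
    src_fused = torch.cat([src_feat, src_logits.softmax(dim=-1)], dim=-1)
    tgt_fused = torch.cat([tgt_feat, tgt_logits.softmax(dim=-1)], dim=-1)
    return cls + lam * mmd_loss(src_fused, tgt_fused)
```

Fusing the softmax probabilities with the features before computing MMD ties the alignment to class structure, which is how the abstract describes reducing incorrect cross-domain category matching.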
Result
The experiments encompass three datasets: Office-Home, VisDA-2017, and Office-31. During training, all weights of the CLIP pre-trained model remain fixed, and only the weights of the newly added domain-agnostic textual and visual prompt modules are updated. Experiments on single-source domain adaptation were conducted across these three datasets to assess DDAPs against existing methods, yielding average classification accuracies of 87.1%, 89.6%, and 91.6%, respectively, which represents current state-of-the-art performance. DDAPs also extends to multi-source domain adaptation, where it achieves an average classification accuracy of 88.6% on the Office-Home dataset. Ablation studies on Office-Home further confirm the importance of the domain-agnostic textual prompt module and the domain-agnostic visual prompt module. Notably, the full version of DDAPs surpasses the variants with only a single-modality prompt module and improves over the CLIP pre-trained model by 5%, underscoring the effectiveness of employing dual-modality domain-agnostic prompts to jointly mitigate domain discrepancy. Moreover, experiments explore the sensitivity of the hyperparameters; in DDAPs, the primary hyperparameters are the weight of the alignment loss and the lengths of the prompt vectors. The findings reveal that target-domain performance remains stable when the alignment loss weight is near its optimal value, and variations in the prompt vector lengths do not significantly affect DDAPs' performance. For a more intuitive grasp of DDAPs, this study also employs t-distributed stochastic neighbor embedding (t-SNE) to visualize the image features of different models, and the visualization demonstrates the superiority of DDAPs in addressing domain adaptation problems.
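For readers who want to reproduce this kind of feature visualization, a minimal sketch with scikit-learn's t-SNE follows; the feature files are hypothetical placeholders for image features extracted from each domain, and the perplexity setting is an illustrative default rather than the paper's choice:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Hypothetical inputs: (n, d) feature matrices extracted from each domain.
src_feat = np.load("source_features.npy")   # placeholder path
tgt_feat = np.load("target_features.npy")   # placeholder path

feats = np.concatenate([src_feat, tgt_feat], axis=0)
emb = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(feats)

# Color points by domain to inspect how well the two distributions overlap.
n_src = len(src_feat)
plt.scatter(emb[:n_src, 0], emb[:n_src, 1], s=5, label="source")
plt.scatter(emb[n_src:, 0], emb[n_src:, 1], s=5, label="target")
plt.legend()
plt.savefig("tsne_features.png", dpi=200)
```

Well-aligned, discriminative features show domain-mixed but class-separated clusters, which is the qualitative evidence the visualization in this study provides.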
Conclusion
This study introduces a domain adaptation method called DDAPs for image classification tasks. DDAPs uses domain-agnostic textual and visual prompts to collaboratively eliminate domain discrepancies between the source and target domains and learns domain-invariant, discriminative image and text features to enhance model performance in the target domain. DDAPs applies to both single-source and multi-source domain adaptation. The proposed method has been experimentally validated across multiple datasets, achieving state-of-the-art results and demonstrating the significance of handling domain discrepancy collaboratively from a dual-modality perspective.