A survey of cross-view geo-localization methods based on deep learning
- Journal of Image and Graphics, Vol. 29, No. 12, 2024, pp. 3543-3563
Print publication date: 2024-12-16
DOI: 10.11834/jig.230858
Zhou Bowen, Li Yang, Ma Xinji, Miao Zhuang, Zhang Rui. 2024. A survey of cross-view geo-localization methods based on deep learning. Journal of Image and Graphics, 29(12):3543-3563
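Cross-view geo-localization is one of the important problems in computer vision. Because it enables real-time localization in environments where satellite positioning is unavailable, it has long attracted attention from fields such as image registration, navigation and positioning, and image retrieval. Traditional cross-view geo-localization methods rely on handcrafted features, which limits localization accuracy. With the development of deep learning, deep learning-based cross-view geo-localization methods have become the mainstream. However, because the task involves multiple steps and draws on knowledge transferred from many areas, the field still lacks a dedicated survey. This paper presents the first comprehensive review of deep learning-based cross-view geo-localization methods from the perspective of the task framework. After an overview of the problem, it summarizes the development of data preprocessing, deep learning networks, feature attention modules, and loss functions. By reviewing nearly one hundred high-impact papers, it distills the characteristics of the cross-view geo-localization task and directions for improvement, which may help researchers design new methods. In addition, 10 deep learning-based cross-view geo-localization methods are tested on each of two representative datasets, and the performance of existing methods is evaluated in terms of accuracy, number of model parameters, and inference speed. Finally, based on this analysis and with practical applications in mind, the paper points out open problems in the field and discusses future trends, in the hope of providing a reference for interested researchers.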
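Because the task is framed as image retrieval, inference reduces to ranking a geo-tagged gallery by feature similarity and reading off the location of the best match. The sketch below assumes the embeddings have already been produced by a trained network (not shown); the function names `rank_gallery` and `localize`, the toy embeddings, and the coordinates are hypothetical and only illustrate the ranking step.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def rank_gallery(query, gallery):
    """Return gallery indices ordered from most to least similar to the query.

    query:   embedding of one ground or drone image (list of floats)
    gallery: embeddings of geo-tagged satellite images (list of lists)
    """
    q = l2_normalize(query)
    scored = []
    for idx, ref in enumerate(gallery):
        r = l2_normalize(ref)
        scored.append((sum(a * b for a, b in zip(q, r)), idx))
    # Highest cosine similarity first; ties are broken by the lower index
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [idx for _, idx in scored]


def localize(query, gallery, gallery_coords):
    """Predict the query location as the coordinates of the top-ranked gallery image."""
    return gallery_coords[rank_gallery(query, gallery)[0]]


# Toy example: hypothetical 3-D embeddings and made-up coordinates
gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
coords = [(40.0, -74.0), (34.0, -118.0), (41.9, -87.6)]
print(rank_gallery([0.9, 0.1, 0.0], gallery))  # [0, 2, 1]
print(localize([0.9, 0.1, 0.0], gallery, coords))  # (40.0, -74.0)
```

In practice the gallery features are usually precomputed once, so each query costs one similarity pass over the gallery.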
Cross-view geo-localization aims to estimate the geographic location of a target by matching images taken from different viewpoints. It is usually treated as an image retrieval task, a formulation widely used in other artificial intelligence tasks such as person re-identification, vehicle re-identification, and image registration. The main challenge of the task lies in the drastic appearance changes between viewpoints, which degrade the retrieval performance of the model. Conventional cross-view geo-localization techniques rely on handcrafted features, which limits localization accuracy. With the development of deep learning, deep learning-based cross-view geo-localization methods have become the mainstream. However, because the task involves multiple steps and draws extensively on knowledge transferred from other areas, few surveys of this field exist. In this paper, we present the first review of deep learning-based cross-view geo-localization methods. We analyze developments in data preprocessing, deep learning networks, feature attention modules, and loss functions within the context of cross-view geo-localization. To address the challenges of the field, the data preprocessing phase involves feature alignment, sampling strategies, and data augmentation. Feature alignment serves as prior knowledge for cross-view geo-localization and helps improve localization accuracy, and the use of generative adversarial networks (GANs) has emerged as a prominent trend for feature alignment. Additionally, the imbalance in sample counts among satellite, ground, and drone images calls for effective sampling strategies and data augmentation to keep training balanced. Deep learning networks play a critical role in extracting image features, and their performance directly affects the accuracy of cross-view geo-localization.
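One simple way to counter the imbalance between views (for example, University-1652 provides many drone images but only one satellite image per location) is to draw a fixed number of cross-view pairs per location in every epoch and reuse the scarce view with augmentation. The sketch below is hypothetical: the `locations` dictionary layout, the `pairs_per_location` parameter, and random 90-degree rotation as the satellite-image augmentation are assumptions chosen for illustration, and images are represented as nested lists so the example has no dependencies.

```python
import random


def rotate90(grid, k):
    """Rotate a 2-D image (list of rows) clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        grid = [list(row) for row in zip(*grid[::-1])]
    return grid


def balanced_pairs(locations, pairs_per_location=1, seed=None):
    """Build one epoch of (location_id, satellite_img, drone_img) training pairs.

    locations: dict mapping location_id -> {"satellite": [...], "drone": [...]}
    Each location contributes up to `pairs_per_location` pairs regardless of
    how many drone images it has, so locations with many drone images do not
    dominate. The scarce satellite image is reused for every pair, each time
    with a random 90-degree rotation as augmentation (an assumed choice).
    """
    rng = random.Random(seed)
    epoch = []
    for loc_id, views in locations.items():
        k = min(pairs_per_location, len(views["drone"]))
        for drone_img in rng.sample(views["drone"], k=k):
            sat_img = rng.choice(views["satellite"])
            epoch.append((loc_id, rotate90(sat_img, rng.randrange(4)), drone_img))
    rng.shuffle(epoch)  # mix locations within the epoch
    return epoch
```

A real pipeline would apply the same idea to image tensors inside a data loader; only the per-location sampling logic is the point here.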
In general, methods that use a Transformer as the backbone network achieve higher accuracy than those based on ResNet, and methods that use ConvNeXt show the best performance. To further extract image features and enhance the discriminative power of the model, feature attention modules are designed. By learning effective attention mechanisms, these modules adaptively weight the input images or feature maps so that the model focuses on task-relevant regions or features. Experimental results show that these modules can exploit previously overlooked feature information, extract richer image features, and make the model more discriminative. Loss functions improve the fit of the model to the data and accelerate convergence. The loss value guides the training direction of the entire network, enabling the model to learn better representations and further improving cross-view geo-localization accuracy. Commonly used loss functions include the contrastive loss and the triplet loss. As these loss functions have been improved, the way samples are compared has evolved from one-to-one to one-to-many, allowing the model to take all samples into account during training and further improving its performance. By analyzing nearly one hundred influential papers, we summarize the characteristics of cross-view geo-localization tasks and propose ideas for improvement that can inspire researchers to design new methods. We also test 10 deep learning-based cross-view geo-localization methods on 2 representative datasets, taking into account the backbone network type and input image size of each method. On the University-1652 dataset, we evaluate the accuracy metrics R@1 and AP, the number of model parameters, and the inference speed. On the CVUSA dataset, we mainly evaluate four accuracy metrics: R@1, R@5, R@10, and R@Top1%.
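The shift from one-to-one to one-to-many comparisons can be illustrated by contrasting a margin-based triplet loss, where each anchor is compared with one positive and one negative, with an InfoNCE-style loss, where each query is compared with every reference in the batch. This is a minimal pure-Python sketch under stated assumptions: cosine similarity, the margin of 0.3, and the temperature of 0.1 are illustrative defaults, and InfoNCE is used here as one representative one-to-many loss, not as the loss of any specific surveyed method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def triplet_loss(anchor, positive, negative, margin=0.3):
    """One-to-one: each anchor sees a single positive and a single negative.

    Uses cosine distance d = 1 - cos; the loss is zero once the negative
    is at least `margin` farther from the anchor than the positive.
    """
    d_pos = 1.0 - cosine(anchor, positive)
    d_neg = 1.0 - cosine(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def info_nce_loss(queries, references, temperature=0.1):
    """One-to-many: query i is contrasted with every reference in the batch.

    references[i] is the positive for queries[i]; all other references act
    as negatives, so a batch of N pairs gives N - 1 negatives per query.
    """
    n = len(queries)
    total = 0.0
    for i in range(n):
        logits = [cosine(queries[i], references[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the positive
    return total / n
```

With the triplet loss a batch of N pairs supplies only as many negatives as triplets are explicitly mined, whereas the InfoNCE-style loss uses every other sample in the batch as a negative at no extra sampling cost.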
Experimental results show that a stronger backbone network and a larger input image size both improve model performance. Building on this extensive review of state-of-the-art cross-view geo-localization methods, we also discuss open challenges and suggest several directions for future research on cross-view geo-localization.
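The R@K and AP accuracy metrics used in such benchmarks can be computed from per-query rankings roughly as follows. This is a minimal sketch assuming each query's ranked list of gallery IDs and its set of correct gallery IDs are already available; AP here is the standard non-interpolated definition, and official benchmark evaluation scripts may use a slightly different (interpolated) AP formula, so numbers need not match them exactly. Under this definition, a query with a single correct gallery image has AP equal to 1 divided by the rank of that image.

```python
def recall_at_k(ranked_ids, positives, k):
    """1.0 if any ground-truth match appears among the top-k retrieved items, else 0.0."""
    return 1.0 if any(r in positives for r in ranked_ids[:k]) else 0.0


def average_precision(ranked_ids, positives):
    """Non-interpolated AP: mean of precision at the rank of each true match."""
    hits, precision_sum = 0, 0.0
    for rank, r in enumerate(ranked_ids, start=1):
        if r in positives:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(positives) if positives else 0.0


def evaluate(rankings, ground_truth, ks=(1, 5, 10)):
    """Average R@k and AP over all queries.

    rankings:     list of ranked gallery-ID lists, one per query
    ground_truth: list of sets holding the correct gallery IDs for each query
    """
    n = len(rankings)
    pairs = list(zip(rankings, ground_truth))
    scores = {f"R@{k}": sum(recall_at_k(r, g, k) for r, g in pairs) / n for k in ks}
    scores["AP"] = sum(average_precision(r, g) for r, g in pairs) / n
    return scores


# Toy example with hypothetical gallery IDs
rankings = [["s3", "s1", "s2"], ["s2", "s3", "s1"]]
ground_truth = [{"s1"}, {"s2"}]
print(evaluate(rankings, ground_truth, ks=(1, 2)))  # {'R@1': 0.5, 'R@2': 1.0, 'AP': 0.75}
```

Papers usually report these values as percentages, i.e. multiplied by 100.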
Keywords: cross-view geo-localization; image retrieval; deep learning; attention; drone
Alcantarilla P F, Bartoli A and Davison A J. 2012. KAZE features//Proceedings of the 12th European Conference on Computer Vision. Florence, Italy: Springer: 214-227 [DOI: 10.1007/978-3-642-33783-3_16http://dx.doi.org/10.1007/978-3-642-33783-3_16]
Algamdi A M, Sanchez V and Li C T. 2020. Dronecaps: recognition of human actions in drone videos using capsule networks with binary volume comparisons//Proceedings of 2020 IEEE International Conference on Image Processing (ICIP). Abu Dhabi, United Arab Emirates: IEEE: 3174-3178 [DOI: 10.1109/ICIP40778.2020.9190864http://dx.doi.org/10.1109/ICIP40778.2020.9190864]
Arandjelovic R, Gronat P, Torii A, Pajdla T and Sivic J. 2016. NetVLAD: CNN architecture for weakly supervised place recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 5297-5307 [DOI: 10.1109/CVPR.2016.572http://dx.doi.org/10.1109/CVPR.2016.572]
Arar M, Shamir A and Bermano A H. 2022. Learned queries for efficient local attention//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 10841-10852 [DOI: 10.1109/CVPR52688.2022.01057http://dx.doi.org/10.1109/CVPR52688.2022.01057]
Bay H, Ess A, Tuytelaars T and Van Gool L. 2008. Speeded-up robust features (SURF). Computer Vision and Image Understanding, 110(3): 346-359 [DOI: 10.1016/j.cviu.2007.09.014http://dx.doi.org/10.1016/j.cviu.2007.09.014]
Bui D V, Kubo M and Sato H. 2022. A part-aware attention neural network for cross-view geo-localization between UAV and Satellite. Journal of Robotics, Networking and Artificial Life, 9(3): 275-284 [DOI: 10.57417/jrnal.9.3_275http://dx.doi.org/10.57417/jrnal.9.3_275]
Cai S D, Guo Y L, Khan S, Hu J W and Wen G J. 2019. Ground-to-aerial image geo-localization with a hard exemplar reweighting triplet loss//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 8391-8400 [DOI: 10.1109/ICCV.2019.00848http://dx.doi.org/10.1109/ICCV.2019.00848]
Cai Y H, Yao Z W, Dong Z, Gholami A, Mahoney M W and Keutzer K. 2020. Zeroq: a novel zero shot quantization framework//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 13169-13178 [DOI: 10.1109/CVPR42600.2020. 01318http://dx.doi.org/10.1109/CVPR42600.2020.01318]
Chen J N, Sun S Y, He J, Torr P, Yuille A and Bai S. 2022a. Transmix: attend to mix for vision Transformers//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 12135-12144 [DOI: 10.1109/CVPR52688. 2022.01182http://dx.doi.org/10.1109/CVPR52688.2022.01182]
Chen W H, Chen X T, Zhang J G and Huang K Q. 2017. Beyond triplet loss: a deep quadruplet network for person re-identification//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 403-412 [DOI: 10.1109/CVPR.2017.145http://dx.doi.org/10.1109/CVPR.2017.145]
Chen X C, Li Y, Yao L N, Adeli E and Zhang Y. 2021. Generative adversarial u-net for domain-free medical image augmentation [EB/OL]. [2023-12-12]. https://arxiv.org/pdf/2101.04793.pdfhttps://arxiv.org/pdf/2101.04793.pdf
Chen Y D, Wang S, Liu J J, Xu X W, de Hoog F and Huang Z. 2022b. Improved feature distillation via projector ensemble//Proceedings of the 36th International Conference on Neural Information Processing Systems. New Orleans, USA: Curran Associates Inc.: 12084-12095
Choi J and Myung H. 2020. BRM localization: UAV localization in GNSS-denied environments based on matching of numerical map and UAV images//Proceedings of 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Las Vegas, USA: IEEE: 4537-4544 [DOI: 10.1109/IROS45743.2020.9341682http://dx.doi.org/10.1109/IROS45743.2020.9341682]
Choi J, Sharma G, Chandraker M and Huang J B. 2020. Unsupervised and semi-supervised domain adaptation for action recognition from drones//Proceedings of 2020 IEEE Winter Conference on Applications of Computer Vision. Snowmass, USA: IEEE: 1717-1726 [DOI: 10.1109/WACV45572.2020.9093511http://dx.doi.org/10.1109/WACV45572.2020.9093511]
Chopra S, Hadsell R and LeCun Y. 2005. Learning a similarity metric discriminatively, with application to face verification//Proceedings of 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, USA: IEEE: 539-546 [DOI: 10.1109/CVPR.2005.202http://dx.doi.org/10.1109/CVPR.2005.202]
Cubuk E D, Zoph B, Mane D, Vasudevan V and Le Q V. 2019. AutoAugment: learning augmentation strategies from data//Proceedings of 2019 IEEE/CVF Computer Society Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 113-123 [DOI: 10.1109/CVPR.2019.00020http://dx.doi.org/10.1109/CVPR.2019.00020]
Cubuk E D, Zoph B, Shlens J and Le Q V. 2020. Randaugment: practical automated data augmentation with a reduced search space//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Seattle, USA: IEEE: 702-703 [DOI: 10.1109/CVPRW50498.2020.00359http://dx.doi.org/10.1109/CVPRW50498.2020.00359]
Dai M, Hu J H, Zhuang J D and Zheng E H. 2022. A Transformer-based feature segmentation and region alignment method for UAV-view geo-localization. IEEE Transactions on Circuits and Systems for Video Technology, 32(7): 4376-4389 [DOI: 10.1109/TCSVT.2021.3135013http://dx.doi.org/10.1109/TCSVT.2021.3135013]
Deng B L, Li G Q, Han S, Shi L P and Xie Y. 2020. Model compression and hardware acceleration for neural networks: a comprehensive survey. Proceedings of the IEEE, 108(4): 485-532 [DOI: 10.1109/JPROC.2020.2976475http://dx.doi.org/10.1109/JPROC.2020.2976475]
Deng J, Dong W, Socher R, Li L J, Li K and Li F F. 2009. ImageNet: a large-scale hierarchical image database//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 248-255 [DOI: 10.1109/CVPR.2009.5206848http://dx.doi.org/10.1109/CVPR.2009.5206848]
Deuser F, Habel K and Oswald N. 2023. Sample4geo: hard negative sampling for cross-view geo-localization [EB/OL]. [2023-12-12]. https://arxiv.org/pdf/2303.11851.pdfhttps://arxiv.org/pdf/2303.11851.pdf
Ding L R, Zhou J, Meng L X and Long Z Y. 2020. A practical cross-view image matching method between UAV and satellite for UAV-based geo-localization. Remote Sensing, 13(1): #47 [DOI: 10.3390/rs13010047http://dx.doi.org/10.3390/rs13010047]
Ding Y, Yu J, Liu B, Hu Y, Cui M X and Wu Q. 2022. MuKEA: multimodal knowledge extraction and accumulation for knowledge-based visual question answering//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 5089-5098 [DOI: 10.1109/CVPR52688.2022.00503http://dx.doi.org/10.1109/CVPR52688.2022.00503]
Dong X Y, Bao J M, Chen D D, Zhang W M, Yu N H, Yuan L, Chen D and Guo B N. 2022. CSWin Transformer: a general vision Transformer backbone with cross-shaped windows//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 12124-12134 [DOI: 10.1109/CVPR52688.2022.01181http://dx.doi.org/10.1109/CVPR52688.2022.01181]
Dong Z, Lin B J and Xie F. 2023. Optimizing remote sensing image scene classification through brain-inspired feature bias estimation and semantic representation analysis. IEEE Access, 11: 34764-34771 [DOI: 10.1109/ACCESS.2023.3264502http://dx.doi.org/10.1109/ACCESS.2023.3264502]
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X H, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J and Houlsby N. 2010. An image is worth16x16 words: Transformers for image recognition at scale [EB/OL]. [2023-12-12]. https://arxiv.org/pdf/2010.11929.pdfhttps://arxiv.org/pdf/2010.11929.pdf
Ge Y H, Xiao Y, Xu Z, Wang X R and Itti L. 2022. Contributions of shape, texture, and color in visual recognition//Proceedings of the 17th European Conference on Computer Vision. Tel Aviv, Israel: Springer: 369-386 [DOI: 10.1007/978-3-031-19775-8_22http://dx.doi.org/10.1007/978-3-031-19775-8_22]
Gong C Y, Ren T Z, Ye M and Liu Q. 2021. MaxUP: lightweight adversarial training with data augmentation improves neural network training//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 2474-2483 [DOI: 10.1109/CVPR46437.2021.00250http://dx.doi.org/10.1109/CVPR46437.2021.00250]
Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A and Bengio Y. 2014. Generative adversarial nets//Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: MIT Press: 2672-2680
Hadsell R, Chopra S and LeCun Y. 2006. Dimensionality reduction by learning an invariant mapping//Proceedings of 2006 IEEE Conference on Computer Vision and Pattern Recognition. New York, USA: IEEE: 1735-1742 [DOI: 10.1109/CVPR.2006.100http://dx.doi.org/10.1109/CVPR.2006.100]
Hao X J, Liu L, Yang R J, Yin L Z Y, Zhang L and Li X H. 2023. A review of data augmentation methods of remote sensing image target recognition. Remote Sensing, 15(3): #827 [DOI: 10.3390/RS15030827http://dx.doi.org/10.3390/RS15030827]
Harris C and Stephens M. 1988. A combined corner and edge detector//Taylor C J, ed. Proceedings of the Alvey Vision Conference. Manchester, UK: Alvety Vision Club. 1-6 [DOI: 10.5244/C.2.23http://dx.doi.org/10.5244/C.2.23]
Hermans A, Beyer L and Leibe B. 2017. In defense of the triplet loss for person re-identification [EB/OL]. [2023-12-12]. https://arxiv.org/pdf/1703.07737.pdfhttps://arxiv.org/pdf/1703.07737.pdf
Hu M, Feng J Y, Hua J S, Lai B S, Huang J Q, Gong X J and Hua X S. 2022b. Online convolutional re-parameterization//Proceedings of 2022 IEEE Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 568-577 [DOI: 10.1109/CVPR52688. 2022.00065http://dx.doi.org/10.1109/CVPR52688.2022.00065]
Hu S X, Feng M D, Nguyen R M H and Lee G H. 2018. CVM-Net: cross-view matching network for image-based ground-to-aerial geo-localization//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7258-7267 [DOI: 10.1109/CVPR.2018.00758http://dx.doi.org/10.1109/CVPR.2018.00758]
Hu S X and Lee G H. 2020. Image-based geo-localization using satellite imagery. International Journal of Computer Vision, 128(5): 1205-1219 [DOI: 10.1007/s11263-019-01186-0http://dx.doi.org/10.1007/s11263-019-01186-0]
Hu W M, Zhang Y C, Liang Y X, Yin Y F, Georgescu A, Tran A, Kruppa H, Ng S K and Zimmermann R. 2022a. Beyond geo-localization: fine-grained orientation of street-view images by cross-view matching with satellite imagery//Proceedings of the 30th ACM International Conference on Multimedia. New York, USA: Association for Computing Machinery: 6155-6164 [DOI: 10.1145/3503161.3548102http://dx.doi.org/10.1145/3503161.3548102]
Huang G S, Zhou Y, Hu X F, Zhao L Y and Zhang C L. 2023. A survey of the research progress in image geo-localization. Journal of Geo-Information Science, 25(7): 1336-1362
黄高爽, 周杨, 胡校飞, 赵璐颖, 张呈龙. 2023. 图像地理定位研究进展. 地球信息科学学报, 25(7): 1336-1362 [DOI: 10.12082/dqxxkx.2023.230073http://dx.doi.org/10.12082/dqxxkx.2023.230073]
Huang T, You S, Zhang B H, Du Y X, Wang F, Qian C and Xu C. 2022. DyRep: bootstrapping training with dynamic re-parameterization//Proceedings of 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 588-597 [DOI: 10.1109/CVPR52688.2022.00067http://dx.doi.org/10.1109/CVPR52688.2022.00067]
Kan M N, Shan S G and Chen X L. 2016. Multi-view deep network for cross-view classification//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 4847-4855 [DOI: 10.1109/CVPR.2016.524http://dx.doi.org/10.1109/CVPR.2016.524]
Ke Z X, Liu B and Huang X C. 2020. Continual learning of a mixed sequence of similar and dissimilar tasks//Proceedings of the 34th Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.: 18493-18504
Kim D K and Walter M R. 2017. Satellite image-based localization via learned embeddings//Proceedings of 2017 IEEE International Conference on Robotics and Automation. Singapore: IEEE: 2073-2080 [DOI: 10.1109/ICRA.2017.7989239http://dx.doi.org/10.1109/ICRA.2017.7989239]
Kinnari J, Verdoja F and Kyrki V. 2021. GNSS-denied geolocalization of UAVs by visual matching of onboard camera images with orthophotos//Proceedings of the 20th International Conference on Advanced Robotics (ICAR). Ljubljana, Slovenia: IEEE: 555-562 [DOI: 10.1109/ICAR53236.2021.9659333http://dx.doi.org/10.1109/ICAR53236.2021.9659333]
Lin J L, Zheng Z D, Zhong Z, Luo Z M, Li S Z, Yang Y and Sebe N. 2022. Joint representation learning and keypoint detection for cross-view geo-localization. IEEE Transactions on Image Processing, 31: 3780-3792 [DOI: 10.1109/TIP.2022.3175601http://dx.doi.org/10.1109/TIP.2022.3175601]
Lin T Y, Cui Y, Belongie S and Hays J. 2015. Learning deep representations for ground-to-aerial geolocalization//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 5007-5015 [DOI: 10.1109/CVPR.2015.7299135http://dx.doi.org/10.1109/CVPR.2015.7299135]
Liu L and Li H D. 2019. Lending orientation to neural networks for cross-view geo-localization//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 5624-5633 [DOI: 10.1109/CVPR.2019.00577http://dx.doi.org/10.1109/CVPR.2019.00577]
Liu Y A, Zhang W and Wang J. 2021b. Zero-shot adversarial quantization//Proceedings of 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Nashville, USA: IEEE: 1512-1521 [DOI: 10.1109/CVPR46437.2021.00156http://dx.doi.org/10.1109/CVPR46437.2021.00156]
Liu Z, Lin Y T, Cao Y, Hu H, Wei Y X, Zhang Z, Lin S and Guo B N. 2021a. Swin Transformer: hierarchical vision Transformer using shifted windows//Proceedings of 2021 IEEE International Conference on Computer Vision. Montreal, Canada: IEEE: 10012-10022 [DOI: 10.1109/ICCV48922.2021.00986http://dx.doi.org/10.1109/ICCV48922.2021.00986]
Liu Z, Mao H Z, Wu C Y, Feichtenhofer C, Darrell T and Xie S N. 2022. A ConvNet for the 2020s//Proceedings of 2022 IEEE Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 11976-11986 [DOI: 10.1109/CVPR52688.2022.01167http://dx.doi.org/10.1109/CVPR52688.2022.01167]
Lowe D G. 1999. Object recognition from local scale-invariant features//Proceedings of 1999 IEEE International Conference on Computer Vision. Kerkyra, Greece: IEEE: 1150-1157 [DOI: 10.1109/ICCV.1999.790410http://dx.doi.org/10.1109/ICCV.1999.790410]
Lu X F, Luo S Q and Zhu Y Y. 2022. It’s okay to be wrong: cross-view geo-localization with step-adaptive iterative refinement. IEEE Transactions on Geoscience and Remote Sensing, 60: 1-13 [DOI: 10.1109/TGRS.2022.3210195http://dx.doi.org/10.1109/TGRS.2022.3210195]
Luo H Y, Chen T X, Li X J, Li S Y, Zhang C, Zhao G S and Liu X. 2023. KeepEdge: a knowledge distillation empowered edge intelligence framework for visual assisted positioning in UAV delivery. IEEE Transactions on Mobile Computing, 22(8): 4729-4741 [DOI: 10.1109/TMC.2022.3157957http://dx.doi.org/10.1109/TMC.2022.3157957]
Mandlekar A, Nasiriany S, Wen B W, Akinola I, Narang Y, Fan L X, Zhu Y K and Fox D. 2023. MimicGen: a data generation system for scalable robot learning using human demonstrations [EB/OL]. [2023-12-12]. https://arxiv.org/pdf/2310.17596.pdfhttps://arxiv.org/pdf/2310.17596.pdf
Matas J, Chum O, Urban M and Pajdla T. 2004. Robust wide-baseline stereo from maximally stable extremal regions. Image and Vision Computing, 22(10): 761-767 [DOI: 10.1016/j.imavis.2004.02.006http://dx.doi.org/10.1016/j.imavis.2004.02.006]
Misra D, Nalamada T, Arasanipalai A U and Hou Q B. 2021. Rotate to attend: convolutional triplet attention module//Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision. Waikoloa, US: IEEE: 3139-3148 [DOI: 10.1109/WACV48630.2021.00318http://dx.doi.org/10.1109/WACV48630.2021.00318]
Naranjo-Alcazar J, Perez-Castanos S, Lopez-Garcia A, Zuccarello P, Cobos M and Ferri F J. 2021. Squeeze-excitation convolutional recurrent neural networks for audio-visual scene classification [EB/OL]. [2023-12-12]. https://arxiv.org/pdf/2107.13180.pdfhttps://arxiv.org/pdf/2107.13180.pdf
Ni R K, Goldblum M, Sharaf A, Kong K Z and Goldstein T. 2021. Data augmentation for meta-learning//Proceedings of the 38th International Conference on Machine Learning. Vienna, Austria: PMLR: 8152-8161
Pal A, Karkhanis D, Roberts M, Dooley S, Sundararajan A and Naidu S. 2023. Giraffe: adventures in expanding context lengths in LLMs [EB/OL]. [2023-12-12]. https://arxiv.org/pdf/2308.10882.pdfhttps://arxiv.org/pdf/2308.10882.pdf
Philbin J, Chum O, Isard M, Sivic J and Zisserman A. 2007. Object retrieval with large vocabularies and fast spatial matching//Proceedings of 2007 IEEE Conference on Computer Vision and Pattern Recognition. Minneapolis, USA: IEEE: 1-8 [DOI: 10.1109/CVPR.2007.383172http://dx.doi.org/10.1109/CVPR.2007.383172]
Qin H T, Gong R H, Liu X L, Shen M Z, Wei Z R, Yu F W and Song J K. 2020. Forward and backward information retention for accurate binary neural networks//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 2250-2259 [DOI: 10.1109/CVPR42600.2020.00232http://dx.doi.org/10.1109/CVPR42600.2020.00232]
Radford A, Kim J W, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J, Kruege G and Sutskever I. 2021. Learning transferable visual models from natural language supervision//Proceedings of the 38th International Conference on Machine Learning. Vienna, Austria: PMLR: 8748-8763
Rave A, Fontaine P and Kuhn H. 2023. Drone location and vehicle fleet planning with trucks and aerial drones. European Journal of Operational Research, 308(1): 113-130 [DOI: 10.1016/j.ejor.2022.10.015http://dx.doi.org/10.1016/j.ejor.2022.10.015]
Regmi K and Shah M. 2019. Bridging the domain gap for ground-to-aerial image matching//Proceedings of 2019 IEEE International Conference on Computer Vision. Seoul, Korea (South): IEEE: 470-479 [DOI: 10.1109/ICCV.2019.00056http://dx.doi.org/10.1109/ICCV.2019.00056]
Rosten E and Drummond T. 2006. Machine learning for high-speed corner detection//Proceedings of the 9th European Conference on Computer Vision. Graz, Austria: Springer: 430-443 [DOI: 10.1007/11744023\_34]
Schroff F, Kalenichenko D and Philbin J. 2015. FaceNet: a unified embedding for face recognition and clustering//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 815-823 [DOI: 10.1109/CVPR.2015.7298682http://dx.doi.org/10.1109/CVPR.2015.7298682]
Shen T R, Wei Y M, Kang L, Wan S S and Yang Y H. 2024. MCCG: a ConvNeXt-based multiple-classifier method for cross-view geo-localization. IEEE Transactions on Circuits and Systems for Video Technology, 34(3): 1456-1468 [DOI: 10.1109/TCSVT.2023.3296074http://dx.doi.org/10.1109/TCSVT.2023.3296074]
Shi Y J, Liu L, Yu X and Li H D. 2019. Spatial-aware feature aggregation for cross-view image based geo-localization//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: Curran Associates Inc.
Shi Y J, Yu X, Campbell D and Li H D. 2020a. Where am I looking at? Joint location and orientation estimation by cross-view matching//Proceedings of 2020 IEEE Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 4064-4072 [DOI: 10.1109/CVPR42600.2020.00412http://dx.doi.org/10.1109/CVPR42600.2020.00412]
Shi Y J, Yu X, Liu L, Zhang T and Li H D. 2020b. Optimal feature transport for cross-view image geo-localization//Proceedings of the 34th AAAI Conference on Artificial Intelligence. New York, USA: AAAI: 11990-11997 [DOI: 10.1609/aaai.v34i07.6875http://dx.doi.org/10.1609/aaai.v34i07.6875]
Shu J, Yuan X, Meng D Y and Xu Z B. 2023. DAC-MR: data augmentation consistency based meta-regularization for meta-learning [EB/OL]. [2023-12-12]. https://arxiv.org/pdf/2305.07892.pdfhttps://arxiv.org/pdf/2305.07892.pdf
Si Z F and Qi H G. 2023. Survey on knowledge distillation and its application. Journal of Image and Graphics, 28(9): 2817-2832
司兆峰, 齐洪钢. 2023. 知识蒸馏方法研究与应用综述. 中国图象图形学报, 28(9): 2817-2832 [DOI: 10.11834/jig.220273http://dx.doi.org/10.11834/jig.220273]
Song H S, Wang Z, Lei Y, Shi D X, Tong X C, Lei Y X and Qiu C P. 2023. Learning visual representation clusters for cross-view geo-Location. IEEE Geoscience and Remote Sensing Letters, 20: #6011805 [DOI: 10.1109/LGRS.2023. 3326005http://dx.doi.org/10.1109/LGRS.2023.3326005]
Tian X Y, Shao J, Ouyang D Q and Shen H T. 2022. UAV-satellite view synthesis for cross-view geo-localization. IEEE Transactions on Circuits and Systems for Video Technology, 32(7): 4804-4815 [DOI: 10.1109/TCSVT.2021.3121987http://dx.doi.org/10.1109/TCSVT.2021.3121987]
Toker A, Zhou Q J, Maximov M and Leal-Taixé L. 2021. Coming down to earth: satellite-to-street view synthesis for geo-localization//Proceedings of 2021 IEEE Conference on Computer Vision and Pattern Recognition. Nashville, USA: IEEE: 6488-6497 [DOI: 10.1109/CVPR46437.2021.00642http://dx.doi.org/10.1109/CVPR46437.2021.00642]
Touvron H, Cord M, Douze M, Massa F, Sablayrolles A and Jégou H. 2021. Training data-efficient image Transformers & distillation through attention//Proceedings of the 38th International Conference on Machine Learning. Vienna, Austria: PMLR: 10347-10357
Van de Ven G M, Siegelmann H T and Tolias A S. 2020. Brain-inspired replay for continual learning with artificial neural networks. Nature Communications, 11(1): #4069 [DOI: 10.1038/s41467-020-17866-2http://dx.doi.org/10.1038/s41467-020-17866-2]
Van Den Oord A, Li Y Z and Vinyals O. 2018. Representation learning with contrastive predictive coding [EB/OL]. [2023-12-12]. https://arxiv.org/pdf/1807.03748.pdfhttps://arxiv.org/pdf/1807.03748.pdf
Varior R R, Haloi M and Wang G. 2016. Gated Siamese convolutional neural network architecture for human re-identification//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 791-808 [DOI: 10.1007/978-3-319-46484-8\_48]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Kaiser Ł and Polosukhin I. 2017. Attention is all you need//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 5998-6008
Vo N N and Hays J. 2016. Localizing and orienting street views using overhead imagery//Proceedings of the 14th European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 494-509 [DOI: 10.1007/978-3-319-46448-0_30http://dx.doi.org/10.1007/978-3-319-46448-0_30]
Wang H H, Huang Z Y, Wu X D and Xing E. 2022b. Toward learning robust and invariant representations with alignment regularization and data augmentation//Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York, USA: Association for Computing Machinery: 1846-1856 [DOI: 10.1145/3534678.3539438http://dx.doi.org/10.1145/3534678.3539438]
Wang J, Zhou F, Wen S L, Liu X and Lin Y Q. 2017. Deep metric learning with angular loss//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 2593-2601 [DOI: 10.1109/ICCV.2017.283http://dx.doi.org/10.1109/ICCV.2017.283]
Wang L Y, Zhang X X, Su H and Zhu J. 2023. A comprehensive survey of continual learning: theory, method and application [EB/OL]. [2023-11-28]. https://arxiv.org/pdf/2302.00487.pdfhttps://arxiv.org/pdf/2302.00487.pdf
Wang T Y, Zheng Z D, Yan C G, Zhang J Y, Sun Y Q, Zheng B L and Yang Y. 2022d. Each part matters: local patterns facilitate cross-view geo-localization. IEEE Transactions on Circuits and Systems for Video Technology, 32(2): 867-879 [DOI: 10.1109/TCSVT.2021.3061265http://dx.doi.org/10.1109/TCSVT.2021.3061265]
Wang T Y, Zheng Z D, Zhu Z J, Gao Y H, Yang Y and Yan C G. 2022a. Learning cross-view geo-localization embeddings via dynamic weighted decorrelation regularization [EB/OL]. [2023-12-12]. https://arxiv.org/pdf/2211.05296.pdfhttps://arxiv.org/pdf/2211.05296.pdf
Wang Z F, Zhang Z Z, Lee C Y, Zhang H, Sun R X, Ren X Q, Su G L, Perot V, Dy J and Pfister T. 2022c. Learning to prompt for continual learning//Proceedings of 2022 IEEE Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 139-149 [DOI: 10.1109/CVPR52688.2022.00024http://dx.doi.org/10.1109/CVPR52688.2022.00024]
Wei H and Wang L P. 2018. Visual navigation using projection of spatial right-angle in indoor environment. IEEE Transactions on Image Processing, 27(7): 3164-3177 [DOI: 10.1109/TIP.2018.2818931http://dx.doi.org/10.1109/TIP.2018.2818931]
Wood D, Mu T T and Brown G. 2022. Bias-variance decompositions for margin losses//Proceedings of the 25th International Conference on Artificial Intelligence and Statistics. Valencia, Spain: PMLR: 1975-2001 [DOI: 10.48550/arXiv.2204.12155http://dx.doi.org/10.48550/arXiv.2204.12155]
Workman S, Souvenir R and Jacobs N. 2015. Wide-area image geolocalization with aerial reference imagery//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 3961-3969 [DOI: 10.1109/ICCV.2015.451http://dx.doi.org/10.1109/ICCV.2015.451]
Workman S and Jacobs N. 2015. On the location dependence of convolutional neural network features//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops. Boston, USA: IEEE: 70-78 [DOI: 10.1109/CVPRW. 2015.7301385http://dx.doi.org/10.1109/CVPRW.2015.7301385]
Xu F R, Zhang W, Cheng Y and Chu W. 2020. Metric learning with equidistant and equidistributed triplet-based loss for product image search//Proceedings of 2020 Web Conference. New York, USA: Association for Computing Machinery: 57-65 [DOI: 10.1145/3366423.3380094http://dx.doi.org/10.1145/3366423.3380094]
Yang H J, Lu X F and Zhu Y Y. 2021. Cross-view geo-localization with evolving Transformer//Proceedings of the 35th Conference on Neural Information Processing Systems. Red Hook, USA: Curran Associates Inc.: 29009-29020 [DOI: 10.48550/arXiv.2107.00842http://dx.doi.org/10.48550/arXiv.2107.00842]
Yang Z H, Zhan F N, Liu K H, Xu M Y and Lu S J. 2023. AI-generated images as data source: the dawn of synthetic era [EB/OL]. [2023-11-28]. https://arxiv.org/pdf/2310.01830.pdfhttps://arxiv.org/pdf/2310.01830.pdf
Yin X J, Huang B Z and Wan X J. 2023. Alcuna: large language models meet new knowledge [EB/OL]. [2023-12-12]. https://arxiv.org/pdf/2310.14820.pdfhttps://arxiv.org/pdf/2310.14820.pdf
Yue M R, Zhao J, Zhang M, Du L and Yao Z Y. 2023. Large language model cascades with mixture of thoughts representations for cost-efficient reasoning [EB/OL]. [2023-12-12]. https://arxiv.org/pdf/2310.03094.pdf
Zhang J Y, Krishna R, Awadallah A H and Wang C. 2023c. EcoAssistant: using LLM assistant more affordably and accurately [EB/OL]. [2023-11-28]. https://arxiv.org/pdf/2310.03046.pdf
Zhang S, Wang Q Z, Bian J and Xiong H Y. 2023b. TiC: exploring vision Transformer in convolution [EB/OL]. [2023-12-12]. https://arxiv.org/pdf/2310.04134.pdf
Zhang T Y, Wang Z, Huang J, Tasnim M M and Shi W. 2023e. A survey of diffusion based image generation models: issues and their solutions [EB/OL]. [2023-12-12]. https://arxiv.org/pdf/2308.13142.pdf
Zhang X H, Li X Y, Sultani W, Zhou Y and Wshah S. 2023d. Cross-view geo-localization via learning disentangled geometric layout correspondence//Proceedings of 2023 AAAI Conference on Artificial Intelligence. Washington, USA: AAAI: 3480-3488 [DOI: 10.1609/aaai.v37i3.25457]
Zhang X H, Sultani W and Wshah S. 2023a. Cross-view image sequence geo-localization//Proceedings of 2023 IEEE Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE: 2914-2923 [DOI: 10.1109/WACV56688.2023.00293]
Zhao J, Yuan Y S, Zhang P Y and Wang D. 2023. An efficient Transformer-based object-capturing video annotation method. Journal of Image and Graphics, 28(10): 3176-3190 [DOI: 10.11834/jig.220823]
Zhao J W, Zhai Q, Zhao P B, Huang R and Cheng H. 2022. Co-visual pattern augmented generative Transformer learning for automobile geo-localization [EB/OL]. [2023-12-12]. https://arxiv.org/pdf/2203.09135.pdf
Zheng Z D, Wei Y C and Yang Y. 2020. University-1652: a multi-view multi-source benchmark for drone-based geo-localization//Proceedings of the 28th ACM International Conference on Multimedia. New York, USA: Association for Computing Machinery: 1395-1403 [DOI: 10.1145/3394171.3413896]
Zheng Z D, Zheng L and Yang Y. 2017. A discriminatively learned CNN embedding for person reidentification. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 14(1): #13 [DOI: 10.1145/3159171]
Zhou B L, Lapedriza A, Xiao J X, Torralba A and Oliva A. 2014. Learning deep features for scene recognition using places database//Proceedings of the 27th International Conference on Neural Information Processing Systems. Cambridge, USA: MIT Press: 487-495
Zhou J Z, Yan Y M, Gu G H and Su N. 2023. Multi-scale geo-localization based on local similarity area distance measurement method//IGARSS 2023-2023 IEEE International Geoscience and Remote Sensing Symposium. Pasadena, USA: IEEE: 5069-5072 [DOI: 10.1109/IGARSS52108.2023.10282536]
Zhou L P, Huang G Q, Mao Y N, Wang S Z and Kaess M. 2022. EDPLVO: efficient direct point-line visual odometry//Proceedings of 2022 IEEE International Conference on Robotics and Automation. Philadelphia, USA: IEEE: 7559-7565 [DOI: 10.1109/ICRA46639.2022.9812133]
Zhu P F, Zheng J Y, Du D W, Wen L Y, Sun Y M and Hu Q H. 2021. Multi-drone-based single object tracking with agent sharing network. IEEE Transactions on Circuits and Systems for Video Technology, 31(10): 4058-4070 [DOI: 10.1109/TCSVT.2020.3045747]
Zhu R Z, Yang M Z, Yin L, Wu F and Yang Y C. 2023a. UAV’s status is worth considering: a fusion representations matching method for geo-localization. Sensors, 23(2): #720 [DOI: 10.3390/s23020720]
Zhu R Z, Yin L, Yang M Z, Wu F, Yang Y C and Hu W B. 2023c. SUES-200: a multi-height multi-scene cross-view image benchmark across drone and satellite. IEEE Transactions on Circuits and Systems for Video Technology, 33(9): 4825-4839 [DOI: 10.1109/TCSVT.2023.3249204]
Zhu S J, Shah M and Chen C. 2022. TransGeo: Transformer is all you need for cross-view image geo-localization//Proceedings of 2022 IEEE Conference on Computer Vision and Pattern Recognition. New Orleans, USA: IEEE: 1162-1171 [DOI: 10.1109/CVPR52688.2022.00123]
Zhu S J, Yang T J N and Chen C. 2021. Revisiting street-to-aerial view image geo-localization and orientation estimation//Proceedings of 2021 IEEE Winter Conference on Applications of Computer Vision. Waikoloa, USA: IEEE: 756-765 [DOI: 10.1109/WACV48630.2021.00080]
Zhu Y Y, Yang H J, Lu Y X and Huang Q. 2023b. Simple, effective and general: a new backbone for cross-view image geo-localization [EB/OL]. [2023-11-28]. https://arxiv.org/pdf/2302.01572.pdf
Zhuang J D, Chen X R Y, Dai M, Lan W B, Cai Y H and Zheng E H. 2022. A semantic guidance and Transformer-based matching method for UAVs and satellite images for UAV geo-localization. IEEE Access, 10: 34277-34287 [DOI: 10.1109/ACCESS.2022.3162693]
Zhuang J D, Dai M, Chen X R Y and Zheng E H. 2021. A faster and more effective cross-view matching method of UAV and satellite images for UAV geolocalization. Remote Sensing, 13(19): #3979 [DOI: 10.3390/rs13193979]