Review of cross-view image geolocalization methods
Vol. 29, Issue 9, Pages: 2716-2736 (2024)
Published: 16 September 2024
DOI: 10.11834/jig.230585
盛怡宁, 赵理君, 张正, 崔绍龙, 饶梦彬, 唐娉. 2024. 跨视角图像地理定位方法综述. 中国图象图形学报, 29(09):2716-2736
Sheng Yining, Zhao Lijun, Zhang Zheng, Cui Shaolong, Rao Mengbin, Tang Ping. 2024. Review of cross-view image geolocalization methods. Journal of Image and Graphics, 29(09):2716-2736
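Cross-view image geolocalization aims to establish accurate correspondence between images taken from different viewpoints and to determine their geographic location through image matching and geographic coordinate estimation. It is widely used in robot navigation, autonomous driving, and three-dimensional reconstruction. Traditional single-view image geolocalization methods are usually limited by factors such as dataset quality and scale, and their positioning accuracy is low. To overcome these limitations, researchers have in recent years proposed a series of cross-view image geolocalization methods that use image data from multiple viewpoints simultaneously and improve positioning accuracy through cross-view comparison and matching. Cross-view image matching methods can be classified in several ways. According to the type of cross-view images addressed, they can be divided into ground-satellite image-oriented methods and drone-satellite image-oriented methods. According to how image features are extracted and represented, they can be divided into methods based on hand-crafted features and methods based on features learned by deep neural networks. The latter can be further subdivided, according to whether view alignment is used and which alignment method is adopted, into three categories: cross-view image geolocalization without view alignment, cross-view image geolocalization based on traditional image transformations, and cross-view image geolocalization based on image generation. This review introduces these methods and compares their advantages and disadvantages. It also summarizes the datasets and evaluation methods commonly used for cross-view image geolocalization. Finally, it looks ahead to the application areas and future development directions of cross-view image geolocalization. Although cross-view geolocalization methods have achieved breakthroughs and progress, they still face a number of problems and challenges. This review therefore proposes possible solutions and priorities for future research, with the aim of promoting development and innovation in the field.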
Cross-view image geolocalization aims to determine the geographic location of images captured from different viewpoints, providing technical support for downstream tasks such as autonomous driving, robot navigation, and three-dimensional reconstruction. It involves matching images captured from different views, such as satellite and ground-level images, to accurately estimate their geographic coordinates. The task is difficult because the images differ in viewpoint, scale, illumination, and appearance; a method must cope with viewpoint variation and geometric transformations and must handle the large search space of candidate matching locations. Early studies on image geolocalization were mainly based on single-view images. Single-view image geolocalization obtains the location of a given image by searching an image database for a same-view reference image with prelabeled geolocation information. However, traditional single-view methods are usually limited by the quality and scale of the dataset, so their positioning accuracy is usually low. To overcome these limitations, researchers have proposed a series of cross-view image geolocalization methods that use image data from multiple perspectives and increase positioning accuracy by comparing and matching images across those perspectives. Given the complexity of geolocalization tasks and solutions, existing cross-view methods can be classified in multiple ways. This review introduces these classification schemes and representative methods of each type, and compares their advantages and disadvantages. On the one hand, the diversification of platforms and the growth of multisource data provide more choices of source data for cross-view image geolocalization.
Based on the source of the images being matched, cross-view image geolocalization methods can be classified into ground-satellite image-oriented and drone-satellite image-oriented methods. Ground-satellite geolocalization locates a query ground-view image by matching it against satellite images. Although it has broad application prospects, the large change in viewing angle produces a huge visual difference between ground-view and satellite-view images, which makes the matching task difficult. Drone-satellite geolocalization, although a relatively new task in cross-view image geolocalization, is receiving increasing attention. Compared with ground images, drone images suffer less occlusion, cover more of the scene, and are closer to the satellite perspective. The release of University-1652, a geolocalization dataset containing drone, ground, and satellite images, provides data support for related research. On the other hand, feature extraction is central to solving the image geolocalization problem. Based on how image features are extracted and represented, cross-view image geolocalization methods can be classified into those based on hand-crafted features and those based on features learned by deep neural networks. The former mainly rely on hand-crafted feature descriptors, such as scale-invariant feature transform (SIFT), speeded-up robust features (SURF), and oriented FAST and rotated BRIEF (ORB); the resulting descriptors can be compared using Euclidean or cosine distance, or fed directly into machine learning models such as support vector machines and random forests. Nevertheless, methods in this category have weak robustness, cannot be fine-tuned for specific tasks, and offer limited accuracy.
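The descriptor comparison mentioned above can be illustrated with a minimal Python sketch. It assumes that each image has already been reduced to one fixed-length descriptor vector (for example, by aggregating local descriptors); the function names, the toy vectors, and the use of NumPy are illustrative assumptions rather than part of any surveyed method.

```python
import numpy as np

def cosine_similarity_matrix(queries, references):
    """Cosine similarity between every query and every reference descriptor."""
    # L2-normalise each row so that a dot product equals the cosine similarity.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    r = references / np.linalg.norm(references, axis=1, keepdims=True)
    return q @ r.T

def rank_references(queries, references):
    """For each query, return reference indices sorted from most to least similar."""
    sim = cosine_similarity_matrix(queries, references)
    return np.argsort(-sim, axis=1)

# Toy descriptors (hypothetical values): three geotagged references, two queries.
references = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])
queries = np.array([[0.9, 0.1, 0.0],
                    [0.0, 0.2, 0.8]])

ranking = rank_references(queries, references)
top1 = ranking[:, 0]  # index of the best-matching reference for each query
```

In a retrieval setting, the location attached to the top-ranked reference would be taken as the estimate for the query; Euclidean distance on normalised descriptors yields the same ranking.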
With the rise of deep learning and the release of large annotated datasets, such as CVUSA and CVACT, deep neural networks have been applied to cross-view image geolocalization. Based on whether view alignment is incorporated and how it is implemented, methods based on features learned by deep neural networks can be subdivided into three categories: those without view alignment, those with view alignment based on traditional image transformations, and those with view alignment based on image generation. Methods without view alignment focus on end-to-end learning of sufficiently discriminative image feature representations; their networks are mainly built on convolutional neural networks and attention mechanisms. Such methods make full use of the content information in images but often ignore the spatial relationship between images of different views (such as ground and aerial views). Methods with view alignment based on traditional image transformations compensate for this shortcoming. They apply traditional image transformations to explicitly provide additional spatial information for the input images, which narrows the domain gap between cross-view images. This category includes the polar coordinate transformation and the perspective image transformation. Methods with view alignment based on image generation usually first use generative neural networks to generate image samples with realistic viewing angles and then match the generated images with real ones to infer the corresponding geographic position. The generative adversarial network is a representative model in this category.
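As an illustration of view alignment by a traditional image transformation, the following Python sketch resamples a square overhead image into a polar layout, so that rows correspond to distance from the image centre and columns to azimuth angle, roughly like a ground panorama. The sampling formula, the orientation convention (top row on the outer border, column 0 pointing to the top of the overhead image), the nearest-neighbour interpolation, and the function name are assumptions made for this sketch; published methods may use different conventions.

```python
import numpy as np

def polar_transform(aerial, out_h, out_w):
    """Resample a square overhead image into a polar (panorama-like) layout.

    Rows index radius (row 0 lies on the image border, the last row is near
    the centre); columns index azimuth, clockwise from the top of the image.
    """
    size = aerial.shape[0]
    assert aerial.shape[1] == size, "expects a square image"
    centre = (size - 1) / 2.0

    i = np.arange(out_h).reshape(-1, 1)   # output row index (radius axis)
    j = np.arange(out_w).reshape(1, -1)   # output column index (angle axis)
    radius = centre * (out_h - i) / out_h
    theta = 2.0 * np.pi * j / out_w

    # Source pixel coordinates in the overhead image for every output pixel.
    src_col = centre + radius * np.sin(theta)
    src_row = centre - radius * np.cos(theta)

    # Nearest-neighbour sampling, clipped to the valid index range.
    rows = np.clip(np.rint(src_row), 0, size - 1).astype(int)
    cols = np.clip(np.rint(src_col), 0, size - 1).astype(int)
    return aerial[rows, cols]

# Toy 5x5 "image" whose pixel values encode their own position.
aerial = np.arange(25).reshape(5, 5)
polar = polar_transform(aerial, out_h=4, out_w=8)
```

The same function also accepts a colour image of shape (size, size, channels), because the integer index arrays only address the first two axes.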
Apart from describing and categorizing methods, this review summarizes the commonly used datasets for cross-view image geolocalization and their characteristics, including CVUSA, CVACT, and VIGOR for street view-satellite image matching, University-1652 for ground-drone-satellite image matching, and SUES-200 for drone-satellite image matching. It also summarizes the metrics commonly used to evaluate model performance, including Recall@K, average precision (AP), and Hit Rate-K. Methods are compared by their performance on CVUSA, CVACT, and University-1652. Finally, this review offers a view on the application areas and future development directions of cross-view image geolocalization. Although this research field has achieved considerable breakthroughs and progress, it still faces obstacles and challenges, such as the lack of multimodal datasets, the difficulty of nonrigid scenarios, and the need for real-time, online geolocalization. Possible solutions and future research priorities are proposed to further promote development and innovation in this field. They include creating multimodal geolocalization datasets, combining multiscale and multiview information to solve the geolocalization problem in nonrigid scenes, and fusing data from other sensors to achieve real-time geolocalization.
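To make the Recall@K metric concrete, here is a minimal Python sketch. It assumes a precomputed query-by-reference similarity matrix and exactly one correct reference per query; the function name and the toy scores are hypothetical and are not results from any dataset.

```python
import numpy as np

def recall_at_k(similarity, gt_index, k):
    """Fraction of queries whose true reference is among the top-k retrieved.

    similarity: (num_queries, num_references) array, higher means more similar.
    gt_index:   (num_queries,) array with the index of the correct reference.
    """
    top_k = np.argsort(-similarity, axis=1)[:, :k]
    hits = (top_k == gt_index[:, None]).any(axis=1)
    return float(hits.mean())

# Toy similarity scores (hypothetical): three queries, three references.
similarity = np.array([[0.9, 0.1, 0.3],
                       [0.2, 0.4, 0.8],
                       [0.5, 0.6, 0.1]])
gt_index = np.array([0, 1, 0])

r1 = recall_at_k(similarity, gt_index, 1)  # only the first query is correct at rank 1
r2 = recall_at_k(similarity, gt_index, 2)  # all three queries are correct within rank 2
```

With these toy scores, Recall@1 is one third and Recall@2 is one.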
Keywords: image geolocalization; cross-view; image matching; deep learning; representation learning; perspective transformation