
Liu Heng, Yao Yu, Zeng Ling, Tao Pan (University of Chinese Academy of Sciences, Beijing 100049, China; Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041, China)

Abstract
Objective Massive image retrieval is one of the research hotspots in computer vision. A basic approach is to extract features from every image in the database, define a similarity measure on those features, and perform nearest neighbor search. The key to massive image retrieval is to design a nearest neighbor search algorithm that satisfies both storage and efficiency requirements. To improve the approximate representation accuracy of image visual features and reduce their storage space requirements, a multi-index additive quantization method is proposed. Method Linear search has high complexity, and to guarantee real-time retrieval the image descriptors must be kept in memory, which a large-scale retrieval system cannot afford. Building on the advantages of non-linear search, this paper investigates a non-exhaustive multi-index structure combined with quantization encoding. The multi-index structure partitions the original data space into multiple subspaces and assigns the data items of each subspace to different inverted lists; the residual data items in each inverted list are then encoded with the compact additive quantization method, further reducing the quantization loss with respect to the original space. At query time, a non-exhaustive search strategy inspects only a few inverted lists for near neighbors, which greatly reduces retrieval time. Moreover, the original data need not be stored during retrieval: only the indices of each data item's codewords in the additive quantization codebooks are kept, which greatly reduces memory consumption. Result To verify the effectiveness of the algorithm, experiments were conducted on three datasets, SIFT, GIST, and MNIST. Compared with recent algorithms, the recall rate improves by 4% to 15%, the average precision improves by about 12%, and the retrieval time matches that of the fastest algorithm. Conclusion The proposed multi-index additive quantization encoding algorithm effectively improves the approximate representation accuracy and storage space requirements of image visual features, and raises retrieval precision and recall on large-scale datasets. The algorithm performs nearest neighbor search on features and is applicable to massive image collections and other multimedia data.
Applying multi-index additive quantization encoding method for approximate nearest neighbor search

Liu Heng,Yao Yu,Zeng Ling,Tao Pan(University of Chinese Academy of Sciences, Beijing 100049, China;Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041, China)

Objective As the amount of image data produced every day grows, large-scale image retrieval has become one of the hot topics in computer vision. The basic idea is to extract features from all images in the database and define a similarity measure on those features to perform nearest neighbor search. The key to massive image retrieval is to design a nearest neighbor search algorithm that meets both efficiency and storage requirements. An approximate nearest neighbor search method based on multi-index additive quantization is presented to improve the approximate representation accuracy of image visual features and reduce their storage space requirements. Method If each image is described by a set of local descriptors, an exhaustive search is prohibitive because billions of descriptors must be indexed and multiple queries performed. The image descriptors should be stored in memory to ensure real-time retrieval; however, doing so creates a storage problem. Approximate nearest neighbor (ANN) algorithms, which mainly comprise index structures and quantization methods, are typically compared by the trade-off they achieve between search quality and efficiency. Exploiting the superiority of non-linear search, we employ an inverted multi-index structure to avoid exhaustive search. The multi-index structure divides the original data space into multiple subspaces. Each entry of the multi-index table corresponds to a part of the original vector space and holds an inverted list of the points that fall within that part, i.e., the vectors lying in the proximity of the corresponding codeword. The purpose of the multi-index structure is to efficiently generate, for any query vector, a short list of candidate vectors by searching only the small subset of the dataset in which the query's near neighbors most likely lie, thus ensuring a substantial speed-up over exhaustive search.
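The inverted multi-index idea above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the two-half split of the vector, the codebook sizes, the plain k-means training, and the number of probed cells are all illustrative assumptions.

```python
# Sketch of a second-order inverted multi-index: the vector space is split
# into two halves, each half gets its own codebook, and every (codeword,
# codeword) pair defines one cell with its own inverted list.
import numpy as np

rng = np.random.default_rng(0)

def kmeans(data, k, iters=10):
    """Plain Lloyd's k-means; returns a codebook of k centroids."""
    centroids = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid, then recompute means
        d = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            pts = data[labels == j]
            if len(pts):
                centroids[j] = pts.mean(0)
    return centroids

D, K = 8, 4                       # vector dim, codewords per half (toy sizes)
X = rng.standard_normal((1000, D)).astype(np.float32)

# Train one codebook per half of the vector; the cells are the K*K pairs.
C1 = kmeans(X[:, :D // 2], K)
C2 = kmeans(X[:, D // 2:], K)

# Build the inverted lists: each point is stored only in its cell (i, j).
lists = {}
for n, x in enumerate(X):
    i = ((x[:D // 2] - C1) ** 2).sum(1).argmin()
    j = ((x[D // 2:] - C2) ** 2).sum(1).argmin()
    lists.setdefault((int(i), int(j)), []).append(n)

# Query: probe only the T cells closest to the query (non-exhaustive
# search) instead of scanning all points in the dataset.
q = rng.standard_normal(D).astype(np.float32)
d1 = ((q[:D // 2] - C1) ** 2).sum(1)   # distances to first-half codewords
d2 = ((q[D // 2:] - C2) ** 2).sum(1)   # distances to second-half codewords
cell_dist = d1[:, None] + d2[None, :]  # distance from q to every cell
order = np.dstack(np.unravel_index(cell_dist.ravel().argsort(), (K, K)))[0]
T = 3
candidates = [n for i, j in order[:T]
              for n in lists.get((int(i), int(j)), [])]
print(len(candidates))  # only this many points need an exact comparison
```

Because cell distances decompose as the sum of two per-half distances, ranking all K*K cells costs only 2K codeword comparisons, which is what makes probing a few cells cheap even for large codebooks.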
As a solution to the storage problem, compact code representations of descriptors are used. Vector quantization is an effective and efficient ANN search technique: it reduces the cardinality of the data space by representing data with codewords. Among vector quantization methods, additive quantization, which approximates each vector as the sum of M codewords drawn from M different codebooks, generalizes product quantization and further improves its accuracy while largely retaining its computational efficiency. In this study, we use additive quantization to encode the residual data produced by the multi-index structure, which further reduces the quantization loss with respect to the original space. The resulting method can be regarded as a two-stage quantizer: each stage approximates the residual vector of the preceding stage with one of the centroids in its codebook and passes a new residual vector to the succeeding stage. The multi-index structure serves as the first-stage quantizer that approximates the vectors, and additive quantization serves as the second-stage quantizer that approximates the residuals. The non-exhaustive search strategy retrieves near neighbors from only a few inverted lists, which significantly reduces retrieval time. Because additive quantization is used in the retrieval process, the original data need not be stored in memory; only the indices of the codewords whose sum best approximates each data item are kept, which significantly reduces memory consumption. Result Experiments on three datasets, i.e., SIFT1M, GIST1M, and MNIST, were conducted to verify the effectiveness of the proposed algorithm. The recall rate of the proposed algorithm is approximately 4% to 15% higher, and its average precision approximately 12% higher, than those of existing algorithms.
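The compact-code idea can be illustrated with a small sketch: a vector is stored as M codeword indices whose codewords sum to an approximation of it. The codebooks here are random, the sizes are toy values, and the encoder is a greedy one-pass residual quantizer rather than the beam-search encoder that additive quantization proper uses, so this shows the storage scheme, not the paper's exact method.

```python
# Sketch of additive-quantization-style compact codes: store M small
# indices per vector instead of D floats; decoding sums the codewords.
import numpy as np

rng = np.random.default_rng(1)
D, M, K = 16, 4, 256          # vector dim, number of codebooks, codewords each
books = rng.standard_normal((M, K, D)).astype(np.float32) / M  # toy codebooks

def encode(x):
    """Greedily pick one codeword per codebook to approximate x."""
    code, r = [], x.copy()
    for m in range(M):
        k = ((r - books[m]) ** 2).sum(1).argmin()  # best codeword for residual
        code.append(int(k))
        r -= books[m][k]      # pass the new residual to the next codebook
    return code

def decode(code):
    """Reconstruct the approximation as the sum of the chosen codewords."""
    return sum(books[m][k] for m, k in enumerate(code))

x = rng.standard_normal(D).astype(np.float32)
code = encode(x)              # M one-byte indices instead of D float32 values
err = np.linalg.norm(x - decode(code))
print(len(code))              # prints 4: one index per codebook
```

With K = 256 each index fits in one byte, so a 16-dimensional float32 vector (64 bytes) is stored in 4 bytes; the reconstruction error `err` is what codebook training would minimize over the dataset.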
The search time of the proposed algorithm matches that of the fastest existing algorithm. Conclusion The approximate nearest neighbor search method based on multi-index additive quantization proposed in this study effectively improves the approximate representation accuracy of image visual features and reduces their storage space requirements. It also improves retrieval precision and recall on large-scale datasets. The proposed algorithm performs nearest neighbor search on features and is suitable for massive image collections and other multimedia data.