Query Model for Image Search Based on User Clicks and NN Features

Dmitry Krivokon (speaker), Alexey Gorodilov
Russia, Yandex
We consider the problem of improving the quality of a query-based image search engine by using user click data. The primary purpose of an image search engine (SE) is to help the user find images relevant to the text query they entered (such SEs should not be confused with content-based image retrieval [10]). Ideally, the resulting images should be sorted by relevance in descending order, so the main task of the SE is to determine the relevance (or rank [3]) of a particular image for a particular query. A great deal of information about an image's relevance to a query can be deduced from the actions a user performs while browsing the SE's results: a click on a specific result can be treated as a strong signal of that image's relevance. The abundance of this data in large SEs leaves much room for different strategies of adapting it to the ranking problem [2, 4].
We propose to use click data to construct a vector space representation of a query based on the content of the images on which users clicked while viewing the search engine's results for that query. Document [8] and query models [9] are popular means of solving classification and clustering problems; we, however, apply our technique to directly compare a query and an image to measure their "similarity" to each other. To represent the content of an image we use one of the final layers of a deep convolutional neural network [6] trained on the standard ImageNet dataset [1]. Essentially, this representation is just a 100-dimensional vector of real values. Using features of this type has become accepted practice in various image classification and recognition tasks. They are also commonly used as the basis for image search engines that find images visually similar to a "query" image [10]. Such successful applications motivated our approach.
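As a minimal sketch of this shared feature space, the snippet below compares two images by the cosine similarity of their feature vectors. The CNN itself is out of scope here, so the 100-dimensional features are simulated with random data; `image_features` is a hypothetical stand-in, not part of the described system.

```python
import numpy as np

rng = np.random.default_rng(0)

def image_features(n_images, dim=100):
    """Hypothetical stand-in for per-image CNN feature extraction:
    a real system would take these vectors from one of the final
    layers of a trained deep convolutional network."""
    v = rng.normal(size=(n_images, dim))
    # L2-normalize so cosine similarity reduces to a dot product.
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def cosine_similarity(a, b):
    """Similarity of two items living in the same feature space."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Because queries (see below) are mapped into the same space, the identical similarity function can compare a query model with an image.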
Our query model is constructed by aggregating the feature vectors of all images clicked for a particular query. Such a model, which by design resides in the same vector space as the image features used to construct it, allows the distance between a query and a specific image to be computed directly. This distance can serve as a feature in a search engine ranker [2] or be used to re-rank the top images returned by such a ranker [2, 5]. Large amounts of historical click data help mitigate the noise that is naturally present in user behavior and even in the responses of the neural network. In addition, we use not only the features of the specific images on which users clicked but also the features of their duplicates [7] found in the search engine database, which further improves the quality of the resulting model. We analyze and compare several aggregation strategies and report the performance of our approach on standard datasets using the NDCG [3] and MSE metrics.
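The pipeline above can be sketched as follows, under stated assumptions: clicked-image features are simulated random vectors, the two aggregation strategies shown (unweighted mean and click-count-weighted mean) are illustrative examples rather than the exact strategies evaluated in the work, and `ndcg` is a generic implementation of the metric [3].

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_model(clicked_feats):
    """Query model: unweighted mean of clicked-image feature vectors,
    normalized so it lives on the same unit sphere as image features."""
    m = clicked_feats.mean(axis=0)
    return m / np.linalg.norm(m)

def weighted_model(clicked_feats, click_counts):
    """Alternative aggregation: frequently clicked images
    contribute more to the query model."""
    w = np.asarray(click_counts, dtype=float)
    m = (clicked_feats * w[:, None]).sum(axis=0) / w.sum()
    return m / np.linalg.norm(m)

def query_image_distance(query_vec, image_vec):
    """Cosine distance between query model and image; usable as a
    ranker feature or as a score for re-ranking top results."""
    sim = np.dot(query_vec, image_vec) / (
        np.linalg.norm(query_vec) * np.linalg.norm(image_vec))
    return 1.0 - sim

def ndcg(relevances, k=None):
    """NDCG of a ranked list of graded relevance labels."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float((rel * discounts).sum())
    ideal = np.sort(np.asarray(relevances, dtype=float))[::-1][:k]
    idcg = float((ideal * discounts[:ideal.size]).sum())
    return dcg / idcg if idcg > 0 else 0.0
```

With equal click counts the weighted model coincides with the unweighted mean; the duplicate-expansion step from the abstract would simply add each clicked image's near-duplicates [7] to `clicked_feats` before aggregation.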
  1. Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “ImageNet classification with deep convolutional neural networks.” Advances in neural information processing systems. 2012.
  2. Jain, Vidit, and Manik Varma. “Learning to re-rank: query-dependent image re-ranking using click data.” Proceedings of the 20th international conference on World wide web. ACM, 2011.
  3. Burges, Christopher JC. “From RankNet to LambdaRank to LambdaMART: An overview.” Learning 11 (2010): 23-581.
  4. Joachims, Thorsten. “Optimizing search engines using clickthrough data.” Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2002.
  5. Mei, Tao, et al. “Multimedia search reranking: A literature survey.” ACM Computing Surveys (CSUR) 46.3 (2014): 38.
  6. LeCun, Yann, et al. “Gradient-based learning applied to document recognition.” Proceedings of the IEEE 86.11 (1998): 2278-2324.
  7. Ke, Yan, et al. “Efficient near-duplicate detection and sub-image retrieval.” ACM Multimedia. Vol. 4. No. 1. 2004.
  8. Mikolov, Tomas, et al. “Efficient estimation of word representations in vector space.” arXiv preprint arXiv:1301.3781 (2013).
  9. Luo, Cheng, et al. “Query Ambiguity Identification Based on User Behavior Information.” Information Retrieval Technology. Springer International Publishing, 2014. 36-47.
  10. Smeulders, Arnold WM, et al. “Content-based image retrieval at the end of the early years.” Pattern Analysis and Machine Intelligence, IEEE Transactions on 22.12 (2000): 1349-1380.