Large-scale distributed locality-sensitive hashing for general metric data
A distributed-memory approach for Locality-Sensitive Hashing (LSH) that generalizes to metric spaces using Voronoi diagrams and enables efficient large-scale similarity search.
Abstract Locality-Sensitive Hashing (LSH) is extremely competitive for similarity search, but works under the assumption of uniform access cost to the data, and for just a handful of dissimilarities for which locality-sensitive families are available. In this work we propose Parallel Voronoi LSH, an approach that addresses those two limitations of LSH: it makes LSH efficient for distributed-memory architectures, and it works for very general dissimilarities (in particular, it works for all metric dissimilarities). Each hash table of Voronoi LSH works by selecting a sample of the dataset to be used as seeds of a Voronoi diagram. The Voronoi cells are then used to hash the data. Because Voronoi diagrams depend only on the distance, the technique is very general. Implementing LSH in distributed-memory systems is very challenging because it lacks referential locality in its access to the data: if care is not taken, excessive message-passing ruins the index performance. Therefore, another important contribution of this work is the parallel design needed to allow the scalability of the index, which we evaluate in a dataset of a thousand million multimedia features.
Publication
Eliezer de Souza da Silva, Thiago Teixeira, George Teodoro, Eduardo Valle. Large-scale Distributed Locality-Sensitive Hashing for General Metric Data. International Conference on Similarity Search and Applications (SISAP), 2014, Los Cabos, Mexico.