Elasticsearch Plugin for Nearest Neighbor Search
Methods like word2vec and convolutional neural networks can convert many data modalities (text, images, users, items, etc.) into numerical vectors such that pairwise distances between the vectors correspond to the semantic similarity of the original data. Elasticsearch is a ubiquitous search solution, but its support for vectors is limited. This plugin fills the gap by bringing efficient exact and approximate vector search to Elasticsearch. Users can then combine traditional queries (e.g., “some product”) with vector search queries (e.g., the vector computed from an image of a product) for an enhanced search experience.
- Datatypes to efficiently store dense and sparse numerical vectors in Elasticsearch documents.
- Exact nearest neighbor queries for five similarity functions: L1, L2, Angular, Jaccard, and Hamming.
- Approximate queries using Locality Sensitive Hashing and related algorithms for all five similarity functions.
- Compose nearest neighbor queries with standard Elasticsearch queries.
- Incrementally build and update your index. Elastiknn doesn’t perform any model fitting, and a vector is just a field in a document, so you can start with 1 vector or 1 million and then create, update, and delete documents and vectors without ever rebuilding the entire index.
- Implemented with standard Elasticsearch and Lucene primitives, and executes entirely in the Elasticsearch JVM. Deployment is therefore a simple plugin installation, and both indexing and querying scale horizontally with Elasticsearch.
- If you need high-throughput nearest neighbor search for periodic batch jobs, there are several faster and simpler methods. Ann-benchmarks is a good place to find them.
- Post and discuss issues on GitHub.
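
To make the exact and approximate queries in the feature list more concrete, here is a minimal, standalone Python sketch of the underlying ideas. It is not the plugin's implementation or API: it runs outside Elasticsearch, uses textbook definitions of the five similarity functions (treating "Angular" as cosine similarity, which is an assumption), and uses MinHash as one common Locality Sensitive Hashing scheme for Jaccard similarity. All function names and parameters below are hypothetical, and the plugin's actual scoring formulas may differ.

```python
import math
import random

# --- Exact scoring: textbook definitions, not the plugin's formulas ---

def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Define similarity with a zero vector as 0 to avoid dividing by zero.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard_similarity(a, b):
    # Sparse boolean vectors are represented by the indices of their true entries.
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def hamming_similarity(a, b, dims):
    # Fraction of the `dims` positions on which the two boolean vectors agree.
    return (dims - len(set(a) ^ set(b))) / dims

def exact_nearest(query, corpus, score, k, higher_is_better=True):
    # Brute force: score every stored vector, then keep the best k document ids.
    ranked = sorted(corpus.items(),
                    key=lambda item: score(query, item[1]),
                    reverse=higher_is_better)
    return [doc_id for doc_id, _ in ranked[:k]]

# --- Approximate search: MinHash LSH for Jaccard (one possible scheme) ---

PRIME = 2_147_483_647  # modulus for the random linear hash functions

def make_tables(num_tables, hashes_per_table, seed=0):
    rng = random.Random(seed)
    return [[(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
             for _ in range(hashes_per_table)]
            for _ in range(num_tables)]

def bucket_keys(indices, tables):
    # One key per table: the tuple of min-hashes over the (non-empty) index set.
    return [(t, tuple(min((a * i + b) % PRIME for i in indices) for a, b in params))
            for t, params in enumerate(tables)]

def build_lsh_index(corpus, tables):
    index = {}
    for doc_id, indices in corpus.items():
        for key in bucket_keys(indices, tables):
            index.setdefault(key, set()).add(doc_id)
    return index

def approximate_nearest(query, corpus, index, tables, k):
    # Collect documents sharing at least one bucket, then re-rank them exactly.
    candidates = set()
    for key in bucket_keys(query, tables):
        candidates |= index.get(key, set())
    subset = {doc_id: corpus[doc_id] for doc_id in candidates}
    return exact_nearest(query, subset, jaccard_similarity, k)
```

A short usage example under the same assumptions. Because new documents only add entries to the hash buckets, an index like this can grow one vector at a time, which mirrors the incremental-indexing point above.

```python
sparse_docs = {"a": [1, 2, 3, 4], "b": [3, 4, 5, 6], "c": [10, 11, 12]}
tables = make_tables(num_tables=4, hashes_per_table=2, seed=42)
index = build_lsh_index(sparse_docs, tables)

print(exact_nearest([1, 2, 3, 4], sparse_docs, jaccard_similarity, k=2))
print(approximate_nearest([1, 2, 3, 4], sparse_docs, index, tables, k=2))
```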