
Filtered-DiskANN: Graph Algorithms for Approximate Nearest Neighbor Search with Filters

Keywords: Graph-Structured Index Construction · Approximate Nearest Neighbor Search Algorithms · Filtered Nearest Neighbor Retrieval · Label-Aware Vector Retrieval · Efficient Large-Scale Retrieval Systems
This analysis is AI-generated and may not be fully accurate. Please refer to the original paper.

TL;DR Summary

Filtered-DiskANN introduces two graph-based algorithms for efficient filtered ANNS. They build indexes considering both vector geometry and label metadata, significantly outperforming prior methods in speed and accuracy for filtered queries. This supports high-recall, large-scale retrieval, including SSD-resident indices that serve thousands of queries per second.

Abstract

As Approximate Nearest Neighbor Search (ANNS)-based dense retrieval becomes ubiquitous for search and recommendation scenarios, efficiently answering filtered ANNS queries has become a critical requirement. Filtered ANNS queries ask for the nearest neighbors of a query’s embedding from the points in the index that match the query’s labels, such as date, price range, or language. There has been little prior work on algorithms that use label metadata associated with vector data to build efficient indices for filtered ANNS queries. Consequently, current indices have high search latency or low recall, which is not practical in interactive web scenarios. We present two algorithms with native support for faster and more accurate filtered ANNS queries: one with streaming support, and another based on batch construction. Central to our algorithms is the construction of a graph-structured index which forms connections not only based on the geometry of the vector data, but also the associated label set. On real-world data with natural labels, both algorithms are an order of magnitude or more efficient for filtered queries than the current state-of-the-art algorithms. The generated indices can also be queried from an SSD and support thousands of queries per second at over 90% recall@10.

English Analysis

1. Bibliographic Information

  • Title: Filtered-DiskANN: Graph Algorithms for Approximate Nearest Neighbor Search with Filters
  • Authors: Siddharth Gollapudi, Neel Karia, Varun Sivashankar, Ravishankar Krishnaswamy, Nikit Begwani, Swapnil Raz, Yiyong Lin, Yin Zhang, Neelam Mahapatro, Premkumar Srinivasan, Amit Singh, and Harsha Vardhan Simhadri. The authors are primarily affiliated with Microsoft Research and Microsoft, with one author from Columbia University. This indicates a strong industry-research collaboration, often focused on solving large-scale, practical problems.
  • Journal/Conference: Proceedings of the ACM Web Conference 2023 (WWW '23). The Web Conference (formerly WWW) is a top-tier, highly competitive venue for research on web technologies, data science, and internet applications. Publication here signifies high quality and impact.
  • Publication Year: 2023
  • Abstract: The paper addresses the growing need for efficient Approximate Nearest Neighbor Search (ANNS) that can handle queries with filters (e.g., search for similar items within a specific price range or language). Existing methods often suffer from high latency or low accuracy (recall), which is unacceptable for interactive applications. The authors introduce two novel graph-based algorithms, FilteredVamana (streaming-friendly) and StitchedVamana (batch-built), that natively incorporate filter information into the index structure itself. By considering both the vector geometry and the associated labels during graph construction, these algorithms significantly outperform state-of-the-art methods on real-world datasets, often by an order of magnitude. The resulting indices are also efficient enough to run on SSDs, supporting thousands of queries per second with high recall.
  • Original Source Link: https://harsha-simhadri.org/pubs/Filtered-DiskANN23.pdf (Formally published paper)

2. Executive Summary

  • Background & Motivation (Why):

    • Core Problem: Modern search and recommendation systems heavily rely on finding the "nearest" items (e.g., products, documents, images) to a query in a high-dimensional vector space. This is the Approximate Nearest Neighbor Search (ANNS) problem. A critical real-world requirement is to perform this search on a subset of the data that matches certain metadata criteria, or "filters." This is known as Filtered ANNS. For example, finding visually similar dresses that are also under $50 and available in size M.
    • Importance & Gaps: As ANNS becomes standard, the lack of efficient filtering is a major bottleneck. Prior methods are generally inefficient. Post-processing (search first, then filter) fails when the filter is very specific, as it may return no valid results. Pre-processing (building a separate index for every possible filter) is not scalable. Inline-processing (filtering during the search traversal) improves on this but is often applied to less efficient index structures (like IVF) and not the state-of-the-art graph-based indices.
    • Innovation: This paper's central innovation is to integrate filter awareness directly into the graph construction process. Instead of treating filters as an afterthought at query time, the proposed algorithms build a graph where the connections (edges) are chosen based on both vector similarity and shared labels. This ensures that the graph is navigable not just in the geometric space but also within the context of any given filter.
  • Main Contributions / Findings (What):

    1. Two Novel Algorithms: The paper introduces FilteredVamana and StitchedVamana, two graph-based algorithms for filtered ANNS.
      • FilteredVamana is an incremental algorithm, suitable for streaming data and dynamic updates.
      • StitchedVamana is a batch-construction algorithm that builds separate graphs per filter and then intelligently "stitches" them together.
    2. State-of-the-Art Performance: On real-world datasets, both algorithms are shown to be an order of magnitude or more efficient (in terms of query speed for a target accuracy) than existing baselines like HNSW with post-processing, IVF with inline-processing, and other specialized methods like NHQ and Milvus. They achieve high recall (>90%) even for highly specific filters (e.g., matching only 0.0001% of the data).
    3. Real-World Validation: The effectiveness of FilteredVamana was confirmed in a live A/B test on a major sponsored search engine. It led to significant increases in user clicks (30-70%) and revenue (30-80%), especially for queries targeting smaller, less-represented groups.
    4. Scalability to SSDs: The proposed methods can be integrated into the DiskANN framework, allowing massive datasets (tens of millions of points) to be indexed and queried from cost-effective SSDs at interactive speeds.

3. Prerequisite Knowledge & Related Work

  • Foundational Concepts:
    • Approximate Nearest Neighbor Search (ANNS): Given a large dataset of high-dimensional vectors (e.g., embeddings of images or text), the goal is to find the k vectors from the dataset that are closest to a given query vector. Doing this exactly requires a linear scan, which is too slow for large datasets due to the "curse of dimensionality." ANNS algorithms trade perfect accuracy for a massive speedup by finding vectors that are very likely to be the true nearest neighbors.
    • Graph-based ANNS: A leading class of ANNS algorithms (e.g., HNSW, Vamana). They build a graph where each data point is a node. Edges connect nodes that are close to each other in the vector space. To find neighbors for a query, the algorithm performs a greedy search on this graph, starting from an entry point and iteratively moving to the neighbor closest to the query. A well-constructed graph guides the search quickly to the true nearest neighbors.
    • Filtered ANNS: The problem of performing ANNS on a subset of the data defined by a filter. A query consists of a vector $x_q$ and a filter label $f$. The goal is to find the nearest neighbors of $x_q$ only among the data points that carry the label $f$.
    • Specificity: A term defined in the paper as the fraction of the total dataset that matches a given filter. A filter with low specificity (e.g., 1%) matches a small subset of the data and is typically harder for existing methods to handle.
    • Recall@k: A primary metric for ANNS quality. It measures the fraction of the true top-k nearest neighbors that the algorithm successfully found: $\text{Recall@}k = \frac{|\text{Returned Neighbors} \cap \text{True Top-}k\text{ Neighbors}|}{k}$
  • Previous Works:
    • Post-processing: The simplest approach. Run a standard ANNS query to retrieve a large number of candidates (e.g., 1000), then filter out those that don't match the query's label. Limitation: this fails dramatically for low-specificity filters, as one might need to retrieve millions of candidates to find a few that match.
    • Pre-processing (Separate Indices): Build a separate ANNS index for each possible filter label. Limitation: this is infeasible if there are many labels or if data points can have multiple labels, as it leads to an explosion in memory and maintenance costs.
    • Inline-processing: Integrates filtering into the search process. For example, in an inverted file index (`IVF`), the data is partitioned into clusters; during a query, only points in the most promising clusters are checked, and points within those clusters that don't match the query filter are skipped. Limitation: while better than post-processing, this technique has primarily been applied to less performant index structures like `IVF` or `LSH`, not the highly efficient graph-based methods.
    • NHQ: A recent graph-based method that encodes filter labels as vectors and appends them to the data vectors. Limitation: the paper notes this is less effective when points can have multiple labels, and it is outperformed by the proposed methods.
  • Differentiation: The key differentiator of `Filtered-DiskANN` is its filter-aware index construction. Unlike previous methods that only modify the search step, these algorithms build a graph whose structure is fundamentally designed to facilitate filtered searches from the ground up. This leads to vastly superior performance because the graph provides direct, efficient paths between points that share the same labels.

4. Methodology (Core Technology & Implementation)

The paper's core technology revolves around modifying the Vamana graph construction algorithm to be filter-aware.

  • FilteredGreedySearch (Algorithm 1): The search algorithm used for both querying and index construction. It is a modification of the standard greedy search on a graph.
    • Inputs: a graph G, a set of start nodes S, a query vector x_q, the number of neighbors k, a search list size L, and a query filter F_q.
    • Procedure:
      1. Initialize the candidate list with the start nodes that match the query filter F_q.
      2. Initialize an empty set V of visited nodes.
      3. While the candidate list contains unvisited nodes:
        a. Select the unvisited candidate p* that is closest to the query x_q.
        b. Add p* to the visited set V.
        c. For each neighbor of p*, if the neighbor matches the query filter F_q and has not been visited, add it to the candidate list.
        d. If the candidate list exceeds its size limit, truncate it to keep only the L nodes closest to x_q.
      4. The search terminates when every node in the candidate list has been visited.
    • Output: the k nearest neighbors from the final candidate list (see the sketch below).
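A minimal Python sketch of this search loop, not the paper's implementation: the dict-based adjacency list `graph`, the `labels` map from node id to label set, and the brute-force distance computation are illustrative assumptions.

```python
import numpy as np

def filtered_greedy_search(graph, vectors, labels, start_nodes, x_q, F_q, k, L):
    """Greedy best-first search restricted to nodes whose label set
    intersects the query filter F_q. `graph`: node id -> neighbor ids;
    `vectors`: node id -> np.ndarray; `labels`: node id -> set of labels."""
    def dist(p):
        return float(np.linalg.norm(vectors[p] - x_q))

    # Step 1: seed the candidate list with filter-matching start nodes.
    candidates = {s for s in start_nodes if labels[s] & F_q}
    visited = set()

    # Step 3: expand the closest unvisited candidate until none remain.
    while candidates - visited:
        p_star = min(candidates - visited, key=dist)
        visited.add(p_star)
        for nbr in graph[p_star]:
            if nbr not in visited and labels[nbr] & F_q:
                candidates.add(nbr)
        # Step 3d: truncate to the L closest candidates to the query.
        if len(candidates) > L:
            candidates = set(sorted(candidates, key=dist)[:L])

    # Output: the k nearest neighbors in the final candidate list.
    return sorted(candidates, key=dist)[:k]
```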
  • FilteredRobustPrune (Algorithm 3): The most critical component. This procedure prunes edges from a node's neighborhood to maintain a fixed maximum degree R while preserving graph navigability for filtered queries.
    • Principle: An edge from a point p to a candidate neighbor p' can be pruned if some other neighbor p* of p provides a "better" path to p'. Two conditions must hold:
      1. Geometric Condition: p* is significantly closer to p' than p is. Formally, $\alpha \cdot d(p^*, p') \leq d(p, p')$, where $\alpha \ge 1$ is a diversity parameter. This is the standard Vamana pruning rule.
      2. Filter-Aware Condition (the novelty): The intermediate point p* must "cover" the filter-based path, i.e., p* must carry every label that p and p' have in common. Formally, $F_{p'} \cap F_p \subseteq F_{p^*}$.
    • Key Insight: The second condition ensures that if a search with filter f could have traversed the edge (p, p'), it can now traverse the path (p, p*, ..., p'), because the intermediate node p* also carries the label f. A sketch of this rule follows.
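A sketch of the pruning rule under the same assumed data structures; `dist` is any two-argument metric (e.g., Euclidean), and the set subset check implements the filter-aware condition.

```python
def filtered_robust_prune(p, candidates, alpha, R, vectors, labels, dist):
    """Keep at most R out-neighbors for p. A candidate c is pruned once
    some kept neighbor p_prime alpha-dominates it geometrically AND
    carries every label that p and c share (the filter-aware condition)."""
    pool = sorted(set(candidates) - {p},
                  key=lambda c: dist(vectors[p], vectors[c]))
    kept = []
    while pool and len(kept) < R:
        p_prime = pool.pop(0)  # closest remaining candidate to p
        kept.append(p_prime)
        survivors = []
        for c in pool:
            geo = alpha * dist(vectors[p_prime], vectors[c]) <= dist(vectors[p], vectors[c])
            # F_c ∩ F_p ⊆ F_{p_prime}: pruning c must not disconnect any
            # filtered path that previously ran through the edge (p, c).
            flt = (labels[c] & labels[p]) <= labels[p_prime]
            if not (geo and flt):
                survivors.append(c)
        pool = survivors
    return kept
```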
  • FilteredVamana (Algorithm 4): Builds the filtered graph incrementally.
    • Steps:
      1. Start Point Selection: For each filter label f, select a representative starting node st(f) using a load-balancing algorithm (`FindMedoid`, Algorithm 2) to avoid overloading any single node.
      2. Incremental Insertion: Process data points one by one in a random order. For each point p (see the sketch after this list):
        a. Run `FilteredGreedySearch` for p (using p's own vector and labels as the query) to find a set of candidate neighbors. The search starts from the designated start points for p's labels.
        b. Use `FilteredRobustPrune` to select the best R neighbors for p from this candidate set.
        c. Add directed edges from p to its new neighbors.
        d. Add reciprocal (backward) edges from the neighbors back to p. If adding a backward edge causes a neighbor's out-degree to exceed R, run `FilteredRobustPrune` on that neighbor.
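A compressed sketch of one insertion step, reusing the two sketches above. In the paper the candidate pool is the visited set V of the search; here the search's returned list stands in for it.

```python
def filtered_vamana_insert(p, graph, vectors, labels, start_for,
                           alpha, R, L_build, dist):
    """One FilteredVamana-style insertion. `start_for` maps each label to
    its designated start node (chosen by the FindMedoid load balancer)."""
    # (a) Search with p's own vector and labels to gather candidates.
    starts = {start_for[f] for f in labels[p]}
    candidates = filtered_greedy_search(graph, vectors, labels, starts,
                                        vectors[p], labels[p],
                                        k=L_build, L=L_build)
    # (b, c) Prune to at most R filter-aware out-neighbors and wire them.
    graph[p] = filtered_robust_prune(p, candidates, alpha, R,
                                     vectors, labels, dist)
    # (d) Reciprocal edges; re-prune any neighbor whose degree overflows R.
    for nbr in list(graph[p]):
        merged = set(graph[nbr]) | {p}
        graph[nbr] = (filtered_robust_prune(nbr, merged, alpha, R,
                                            vectors, labels, dist)
                      if len(merged) > R else list(merged))
```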
  • StitchedVamana (Algorithm 5): Builds the index in a batch process.
    • Steps:
      1. Build Per-Filter Graphs: For each filter label f in the universe of labels $\mathcal{F}$, build a standard `Vamana` graph $G_f$ using only the subset of points $P_f$ that have that label. These sub-graphs are built with smaller degree (`R_small`) and search list (`L_small`) parameters for speed.
      2. Stitch Graphs: Create a final graph G by taking the union of all nodes and all edges from all the per-filter graphs $G_f$. A node in G will have outgoing edges to all its neighbors from every sub-graph it belongs to, which can result in nodes with very high degrees.
      3. Prune Stitched Graph: For every node v in the combined graph G, run `FilteredRobustPrune` on its neighborhood to reduce its out-degree to a final, manageable bound `R_stitched`. A sketch of this pipeline follows.
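A sketch of the batch pipeline under the same illustrative data structures; `build_vamana(subset, R, L)` is a placeholder for an ordinary (unfiltered) Vamana builder returning an adjacency dict over `subset`.

```python
def stitched_vamana(points, vectors, labels, all_labels, alpha,
                    R_small, L_small, R_stitched, dist, build_vamana):
    """Batch build: a small unfiltered Vamana graph per label, the union
    ("stitch") of all their edges, then one filter-aware prune pass."""
    stitched = {p: set() for p in points}
    # Step 1: per-filter sub-graphs over each label's point subset.
    for f in all_labels:
        P_f = [p for p in points if f in labels[p]]
        G_f = build_vamana(P_f, R_small, L_small)
        # Step 2: union the sub-graph's out-edges into the stitched graph.
        for p, nbrs in G_f.items():
            stitched[p] |= set(nbrs)
    # Step 3: prune every (possibly very high-degree) node to R_stitched.
    for p in points:
        stitched[p] = filtered_robust_prune(p, stitched[p], alpha,
                                            R_stitched, vectors, labels, dist)
    return stitched
```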
5. Experimental Setup

  • Datasets: The evaluation used a mix of real-world and semi-synthetic datasets to test performance under various conditions. This is a transcribed version of Table 1 from the paper; the percentile columns (100pc.–1pc.) report the specificity of the query filters at those percentiles.

| Dataset | Dim | # Pts. | # Queries | Source Data | Filters | Filters per Pt. | Unique Filters | 100pc. | 75pc. | 50pc. | 25pc. | 1pc. |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| Turing | 100 | 2,599,968 | 996 | Text | Natural | 1.09 | 3070 | 0.127 | 1.56x10⁻⁴ | 4.15x10⁻⁵ | 1.54x10⁻⁵ | 7.7x10⁻⁶ |
| Prep | 64 | 1,000,000 | 10000 | Text | Natural | 8.84 | 47 | 0.425 | 0.136 | 0.130 | 0.127 | 0.09 |
| DANN | 64 | 3,305,317 | 32926 | Text | Natural | 3.91 | 47 | 0.735 | 0.361 | 0.183 | 0.167 | 0.150 |
| SIFT | 128 | 1,000,000 | 10000 | Image | Random | 1 | 12 | 0.083 | 0.083 | 0.083 | 0.083 | 0.082 |
| GIST | 960 | 1,000,000 | 1000 | Image | Random | 1 | 12 | 0.083 | 0.083 | 0.083 | 0.083 | 0.082 |
| msong | 420 | 992,272 | 200 | Audio | Random | 1 | 12 | 0.083 | 0.082 | 0.082 | 0.082 | 0.082 |
| audio | 192 | 53,387 | 200 | Audio | Random | 1 | 12 | 0.085 | 0.084 | 0.083 | 0.082 | 0.081 |
| paper | 200 | 2,029,997 | 10000 | Text | Random | 1 | 12 | 0.083 | 0.083 | 0.083 | 0.083 | 0.082 |

The `Turing`, `Prep`, and `DANN` datasets are real-world industrial datasets with naturally occurring labels, making them strong indicators of practical performance. The other datasets are standard benchmarks where labels were assigned randomly.

  • Evaluation Metrics:
    1. Recall@k:
      • Conceptual Definition: Measures the accuracy of the search. It is the proportion of the true k nearest neighbors (found by a brute-force search) that are present in the top-k results returned by the approximate algorithm. A recall of 1.0 (or 100%) means perfect accuracy. This paper reports Recall@10.
      • Mathematical Formula: $\mathrm{Recall}@k = \frac{|A \cap G|}{k}$
      • Symbol Explanation:
        • A: the set of k neighbors returned by the ANNS algorithm.
        • G: the ground-truth set of the actual k nearest neighbors.
        • k: the number of neighbors to retrieve.
    2. Queries Per Second (QPS):
      • Conceptual Definition: Measures the speed or throughput of the search index. It is the total number of queries processed divided by the total time taken; higher QPS is better. The paper reports QPS for runs using 48 threads.
  • Baselines:
    • `IVF Inline-Processing`: a FAISS IVF index where filtering is done "on the fly" as clusters are explored.
    • `IVF post-processing`: a standard FAISS IVF index where a large number of results are retrieved and then filtered.
    • `HNSW post-processing`: a standard FAISS HNSW index with post-processing.
    • `NHQ`: the competing graph-based filtered ANNS method.
    • `Milvus`: a popular open-source vector database that supports filtered search; several of its internal algorithms were tested.
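For concreteness, minimal helpers for the two reported metrics, assuming ground truth is precomputed by brute force; this is a single-threaded sketch, unlike the paper's 48-thread runs.

```python
import time

def recall_at_k(returned, ground_truth, k=10):
    """Recall@k = |A ∩ G| / k for one query."""
    return len(set(returned[:k]) & set(ground_truth[:k])) / k

def queries_per_second(queries, run_query):
    """Throughput: queries answered per second of wall-clock time.
    `run_query` is whatever search callable is being benchmarked."""
    start = time.perf_counter()
    for q in queries:
        run_query(q)
    return len(queries) / (time.perf_counter() - start)
```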
6. Results & Analysis

  • Core Results: The main results are presented as QPS vs. Recall@10 curves, where the ideal algorithm sits in the top-right (high recall and high QPS).

![Figure 1: Turing dataset: QPS (x-axis) vs recall@10 for various algorithms with filters of 100, 75, 50, 25 and 1 percentile specificity.](/files/papers/68edfb8a58c9cb7bcb2c7e6d/images/1.jpg)

    • Figure 1 (Turing Dataset): This figure dramatically illustrates the failure of baseline methods on filters with very low specificity. For the 1-percentile specificity filter, `FilteredVamana` and `StitchedVamana` maintain high recall at high QPS, whereas all other methods (`IVF`, `HNSW`) have near-zero recall and nearly 1000x lower QPS. This shows that when the target subset is small, only a filter-aware index can perform effectively.

![Figure 2: Prep dataset: QPS (x-axis) vs recall@10 for various algorithms with filters of 100, 75, 50, 25 and 1 percentile specificity.](/files/papers/68edfb8a58c9cb7bcb2c7e6d/images/2.jpg)

    • Figure 2 (Prep Dataset): Across all specificities, `FilteredVamana` and `StitchedVamana` form a clear Pareto frontier, outperforming all baselines. At 90% recall, `StitchedVamana` is ~6x faster than the next best baseline (`IVF Inline Processing`), and `FilteredVamana` is ~2.5x faster. Post-processing methods perform very poorly.

![Figure 3: DANN dataset: QPS (x-axis) vs recall@10 for various algorithms with filters of 100, 75, 50, 25 and 1 percentile specificity.](/files/papers/68edfb8a58c9cb7bcb2c7e6d/images/3.jpg)

    • Figure 3 (DANN Dataset): The results are consistent with the Prep dataset. `StitchedVamana` and `FilteredVamana` are significantly better. At 90% recall, `StitchedVamana` is ~7.5x faster than IVF inline-processing, and `FilteredVamana` is ~3x faster.
  • Comparing FilteredVamana and StitchedVamana:
    • Query Performance: `StitchedVamana` consistently offers better QPS for a given recall level (by a factor of ~2x). This is likely because it builds optimal sub-graphs for each filter before pruning, preserving strong local structures.
    • Build Time: `FilteredVamana` is significantly faster to build, because it builds the graph in a single pass, whereas `StitchedVamana` must build many separate graphs and then prune a very large combined graph. This is a transcribed version of Table 2 from the paper (build times in seconds):

| Alg./Data | Dann | Prep | Turing | Audio | SIFT |
| :--- | :--- | :--- | :--- | :--- | :--- |
| FilteredVamana | 159.8 | 66.6 | 103.4 | 1.3 | 44.1 |
| StitchedVamana | 469.9 | 222.6 | 295.9 | 1.6 | 24.4 |
| NHQ | NA | NA | NA | 1.1 | 24.4 |
| Milvus HNSW | 153.6 | 49.3 | NA | 5.5 | 72.0 |
| Faiss HNSW | 158.6 | 44.5 | 188.0 | 1.1 | 71.1 |

  • Robustness to Label Correlation:

![Figure 4: Shuffled experiment for Dann and Prep: QPS vs recall@10 for FilteredVamana and StitchedVamana with filters of 100 and 1 percentile specificity.](/files/papers/68edfb8a58c9cb7bcb2c7e6d/images/4.jpg)

    • Figure 4 (Shuffled Labels): This experiment tests what happens when the correlation between vector positions and labels is destroyed by shuffling the labels. `FilteredVamana` appears more robust, with its performance curve barely changing. `StitchedVamana`'s performance degrades slightly, suggesting it may implicitly leverage natural data clustering that aligns with labels.
  • Unfiltered Query Performance:

![Figure 5: QPS vs recall@10 for unfiltered search on FilteredVamana and StitchedVamana built on original labels, compared with unfiltered Vamana on the dann and Prep datasets.](/files/papers/68edfb8a58c9cb7bcb2c7e6d/images/5.jpg)

    • Figure 5 (Unfiltered Search): Even though they are designed for filtered search, both algorithms perform very well on standard unfiltered queries. Their performance is only slightly below that of a specialized, unfiltered `Vamana` index (achieving 95% recall at 80-90% of the QPS). This means a single `FilteredVamana` or `StitchedVamana` index can effectively serve both filtered and unfiltered traffic.
  • Streaming Capability: The paper argues that `FilteredVamana` is naturally suited for dynamic indices where points are added over time, as its incremental construction process can be directly applied to new points. In contrast, updating a `StitchedVamana` index would be more complex, as adding a single point could require modifying multiple sub-graphs.
  • SSD-Based Indices (`Filtered-DiskANN`):

![Figure 6: Performance of Filtered-DiskANN on a larger 28 million point DANN dataset on filters of various specificity. Left: Recall@10 vs QPS; right: Recall@10 vs SSD I/Os per query.](/files/papers/68edfb8a58c9cb7bcb2c7e6d/images/6.jpg)

    • Figure 6 (SSD Performance): This demonstrates the scalability of the approach. On a large 28 million point dataset stored on an SSD, the system still achieves thousands of queries per second with high recall (>90% for high-specificity filters, >70% for low-specificity ones), while keeping the number of expensive SSD I/O operations low.
  • Online A/B Test in Production: This is a powerful validation of real-world impact. These are transcribed versions of Tables 3 and 4 from the paper.

Table 3: Overall Performance Improvement

| Number of Different Regions | Pct. incr. in clicks | Pct. incr. in revenue |
| :--- | :--- | :--- |
| 47 | 34.61% (0.03) | 48.95% (0.009) |

Table 4: Subgroup Performance Improvement

| Region's share in Index | Pct. incr. in clicks | Pct. incr. in revenue |
| :--- | :--- | :--- |
| 3-9% (10 regions) | 25.54% | 28.61% |
| 1-2% (10 regions) | 54.07% | 46.67% |
| <1% (27 regions) | 70.67% | 79.77% |

    The results show massive, statistically significant gains in clicks and revenue over the baseline post-processing system. Crucially, Table 4 confirms the hypothesis: the biggest improvements are for the smallest regions (lowest-specificity filters), which were poorly served by the old system. `FilteredVamana` corrects this bias, leading to fairer and more relevant ad retrieval.
  • Appendix Results: The appendix provides further comparisons against `NHQ KGraph` (Figures 7 and 8) and various `Milvus` algorithms (Figures 9, 10, and 11), consistently showing that `FilteredVamana` and `StitchedVamana` offer superior QPS/recall trade-offs by a large margin.

![Figure 7: KGraph on NHQ datasets: QPS (x-axis, log scale) vs recall@10 for NHQ KGraph, FilteredVamana and StitchedVamana on the Audio, SIFT1M, Paper, Msong and GIST datasets.](/files/papers/68edfb8a58c9cb7bcb2c7e6d/images/7.jpg)

![Figure 8: KGraph and FilteredVamana: QPS (x-axis, log scale) vs recall@10 on NHQ datasets (build-normalized).](/files/papers/68edfb8a58c9cb7bcb2c7e6d/images/8.jpg)

![Milvus algorithms (HNSW, IVF FLAT, IVF SQ8, IVF PQ): Recall@10 vs query time under filters of various specificity.](/files/papers/68edfb8a58c9cb7bcb2c7e6d/images/10.jpg)

![Figure 11: QPS (x-axis) vs recall@10 for Milvus algorithms (HNSW, IVF FLAT, IVF SQ8, IVF PQ) on 4 NHQ datasets (Audio, SIFT1M, Paper, Msong).](/files/papers/68edfb8a58c9cb7bcb2c7e6d/images/11.jpg)

7. Conclusion & Reflections

  • Conclusion Summary: The paper successfully demonstrates that integrating filter awareness into the construction of graph-based ANNS indices is a highly effective strategy for filtered ANNS queries. The proposed algorithms, `FilteredVamana` and `StitchedVamana`, establish a new state-of-the-art, offering an order-of-magnitude performance improvement over existing methods.
The gains are particularly pronounced for challenging low-specificity filters, and the real-world impact is validated through significant revenue and engagement improvements in a large-scale production system.
  • Limitations & Future Work: The authors identify several areas for future work:
    • More Complex Filters: The current work focuses on single-label matching (equivalent to an OR of labels). Supporting complex SQL-like queries with conjunctions (AND), negations, and range checks remains an open challenge.
    • Larger Filter Sets: Scaling to thousands of filters is demonstrated, but performance with even larger label universes could be explored.
    • Dynamic Deletions: While `FilteredVamana` is suited for insertions, a full evaluation of the dynamic setting, including efficient point deletions, is left for future work.
  • Personal Insights & Critique:
    • Key Insight: The paper's core strength lies in its simple but powerful idea: graph navigability must be preserved not just geometrically but also per filter. The `FilteredRobustPrune` condition ($F_{p'} \cap F_p \subseteq F_{p^*}$) is an elegant and effective way to encode this principle. It fundamentally shifts the problem from a search-time modification to an index-time optimization, which proves to be the correct approach.
    • Practicality: The work is heavily grounded in practical needs. The choice of building upon Vamana/DiskANN, the consideration of SSDs, the analysis of build time vs. query time trade-offs, and especially the A/B test results, make this paper exceptionally relevant for practitioners in the fields of search, e-commerce, and recommendation systems.
    • Open Questions: The trade-off between FilteredVamana and StitchedVamana is interesting. FilteredVamana seems more general-purpose due to its streaming-friendliness and robustness, while StitchedVamana is a powerful batch solution. It would be interesting to see hybrid approaches that might combine the strengths of both. For instance, one could use StitchedVamana for a large static base and FilteredVamana for an incremental layer on top.
