Hugging Face Daily Papers · · 4 min read

Scaling Laws for Grid-Based Approximate Nearest Neighbor Search in High Dimensions

Mirrored from Hugging Face Daily Papers for archival readability. Support the source by reading on the original site.

Grid-based approaches to approximate nearest neighbor (ANN) search have been absent from modern scaling analyses. We present a systematic characterization of a multiprobe grid algorithm with respect to dataset size $N$ and dimensionality $d$. Our experiments reveal a previously unreported $d$-scaling<br>crossover on the GloVe embedding family, in which multiprobe grid search maintains an approximately constant dimensional scaling exponent while other graph-, tree-, and partitioning-based methods exhibit degrading throughput. The advantage comes with near-linear query scaling in $N$, but also with lower<br>indexing cost than competing ANN methods. Our results suggest that grid-based methods such as multiprobe grid may be competitive in rebuild-heavy or high-dimensional settings where indexing cost and dimensional robustness dictate performance. More broadly, recent work has formalized self-attention as an ANN operation. Thus, the $N$- and $d$-scaling properties of ANN algorithms may guide cost analysis of efficient transformer architectures. Code is available at: <a href=\"https://github.com/weiz345/MultiProbeANN\" rel=\"nofollow\">https://github.com/weiz345/MultiProbeANN</a></p>\n","updatedAt":"2026-07-03T19:14:21.542Z","author":{"_id":"6747a1f6f8cab58c22d81529","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/2E09TB1t6uSKaAv5ihERy.png","fullname":"Noah Flynn","name":"noaflynn","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.9008229970932007},"editors":["noaflynn"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/2E09TB1t6uSKaAv5ihERy.png"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2607.01283","authors":[{"_id":"6a48098a3daa34221c7c1eb4","name":"Matthew J Liu","hidden":false},{"_id":"6a48098a3daa34221c7c1eb5","name":"Wei Hang Zheng","hidden":false},{"_id":"6a48098a3daa34221c7c1eb6","name":"Vidhan Purohit","hidden":false},{"_id":"6a48098a3daa34221c7c1eb7","name":"Siqi Xie","hidden":false},{"_id":"6a48098a3daa34221c7c1eb8","name":"Chieh-En Li","hidden":false},{"_id":"6a48098a3daa34221c7c1eb9","name":"Jerry Li","hidden":false},{"_id":"6a48098a3daa34221c7c1eba","name":"Noah Flynn","hidden":false}],"publishedAt":"2026-07-01T00:00:00.000Z","submittedOnDailyAt":"2026-07-03T00:00:00.000Z","title":"Scaling Laws for Grid-Based Approximate Nearest Neighbor Search in High Dimensions","submittedOnDailyBy":{"_id":"6747a1f6f8cab58c22d81529","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/2E09TB1t6uSKaAv5ihERy.png","isPro":false,"fullname":"Noah Flynn","user":"noaflynn","type":"user","name":"noaflynn"},"summary":"Grid-based approaches to approximate nearest neighbor (ANN) search have been absent from modern scaling analyses. We present a systematic characterization of a multiprobe grid algorithm with respect to dataset size N and dimensionality d. Our experiments reveal a previously unreported d-scaling crossover on the GloVe embedding family, in which multiprobe grid search maintains an approximately constant dimensional scaling exponent while other graph-, tree-, and partitioning-based methods exhibit degrading throughput. The advantage comes with near-linear query scaling in N, but also with lower indexing cost than competing ANN methods. Our results suggest that grid-based methods such as multiprobe grid may be competitive in rebuild-heavy or high-dimensional settings where indexing cost and dimensional robustness dictate performance. More broadly, recent work has formalized self-attention as an ANN operation. Thus, the N- and d-scaling properties of ANN algorithms may guide cost analysis of efficient transformer architectures. Code is available at: https://github.com/weiz345/MultiProbeANN.","upvotes":0,"discussionId":"6a48098a3daa34221c7c1ebb","githubRepo":"https://github.com/weiz345/MultiProbeANN","githubRepoAddedBy":"user","ai_summary":"Grid-based multiprobe algorithms demonstrate superior dimensional scaling properties compared to graph-, tree-, and partitioning-based methods for approximate nearest neighbor search, making them competitive for high-dimensional and rebuild-heavy applications.","ai_keywords":["approximate nearest neighbor search","multiprobe grid algorithm","dataset size","dimensionality","GloVe embedding","indexing cost","self-attention","transformer architectures"],"ai_summary_model":"Qwen/Qwen2.5-Coder-32B-Instruct","githubStars":0,"organization":{"_id":"66b1baeff10262fc4fa61961","name":"UCBerkeley","fullname":"University of California, Berkeley","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/63f425c3a096536aeab42dea/bxNKEkprdm5JI1wkjmNAL.png"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[],"acceptLanguages":["en"],"organization":{"_id":"66b1baeff10262fc4fa61961","name":"UCBerkeley","fullname":"University of California, Berkeley","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/63f425c3a096536aeab42dea/bxNKEkprdm5JI1wkjmNAL.png"},"markdownContentUrl":"https://huggingface.co/buckets/huggingchat/papers-content/resolve/2607/2607.01283.md","query":{}}">
Papers
arxiv:2607.01283

Scaling Laws for Grid-Based Approximate Nearest Neighbor Search in High Dimensions

Published on Jul 1
· Submitted by
Noah Flynn
on Jul 3
Authors:
,
,
,
,
,
,

Abstract

Grid-based multiprobe algorithms demonstrate superior dimensional scaling properties compared to graph-, tree-, and partitioning-based methods for approximate nearest neighbor search, making them competitive for high-dimensional and rebuild-heavy applications.

Grid-based approaches to approximate nearest neighbor (ANN) search have been absent from modern scaling analyses. We present a systematic characterization of a multiprobe grid algorithm with respect to dataset size N and dimensionality d. Our experiments reveal a previously unreported d-scaling crossover on the GloVe embedding family, in which multiprobe grid search maintains an approximately constant dimensional scaling exponent while other graph-, tree-, and partitioning-based methods exhibit degrading throughput. The advantage comes with near-linear query scaling in N, but also with lower indexing cost than competing ANN methods. Our results suggest that grid-based methods such as multiprobe grid may be competitive in rebuild-heavy or high-dimensional settings where indexing cost and dimensional robustness dictate performance. More broadly, recent work has formalized self-attention as an ANN operation. Thus, the N- and d-scaling properties of ANN algorithms may guide cost analysis of efficient transformer architectures. Code is available at: https://github.com/weiz345/MultiProbeANN.

Community

Paper submitter about 6 hours ago

Grid-based approaches to approximate nearest neighbor (ANN) search have been absent from modern scaling analyses. We present a systematic characterization of a multiprobe grid algorithm with respect to dataset size $N$ and dimensionality $d$. Our experiments reveal a previously unreported $d$-scaling
crossover on the GloVe embedding family, in which multiprobe grid search maintains an approximately constant dimensional scaling exponent while other graph-, tree-, and partitioning-based methods exhibit degrading throughput. The advantage comes with near-linear query scaling in $N$, but also with lower
indexing cost than competing ANN methods. Our results suggest that grid-based methods such as multiprobe grid may be competitive in rebuild-heavy or high-dimensional settings where indexing cost and dimensional robustness dictate performance. More broadly, recent work has formalized self-attention as an ANN operation. Thus, the $N$- and $d$-scaling properties of ANN algorithms may guide cost analysis of efficient transformer architectures. Code is available at: https://github.com/weiz345/MultiProbeANN

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.
Tap or paste here to upload images

· Sign up or log in to comment

Get this paper in your agent:

hf papers read 2607.01283
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2607.01283 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2607.01283 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2607.01283 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.

Discussion (0)

Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.

Sign in →

No comments yet. Sign in and be the first to say something.

More from Hugging Face Daily Papers