Database Paper Browser


A Theoretical Framework for Distribution-Aware Dataset Search

Summary: Studies distribution-aware dataset search through percentile (Ptile) and preference (Pref) indexing in centralized and federated settings. It presents lower bounds against near-linear-space structures in the centralized case, and approximate structures that use Õ(N) space, Õ(N) preprocessing time, and Õ(1 + OUT) query time, with accuracy ε + 2δ. (summarized by gpt-5-nano on Feb 09 2026)

Paper ID: 1961
Venue: PODS
Year: 2025
Pagerank: 4.1945683e-05
Overall Rank: 10,341 | 28.06%
DOI: 10.1145/3725227

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

Incoming Citations (Sorted by Pagerank)

Showing 0 of 0 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 16 of 16 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
1,644 | Finding Related Tables in Data Lakes for Interactive Data Science | 2020 | SIGMOD | 0.00011041787
1,751 | Auctus: A Dataset Search Engine for Data Discovery and Augmentation | 2021 | VLDB | 0.00010683295
2,324 | RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search | 2024 | SIGMOD | 9.0326444e-05
2,752 | Composable Core-sets for Diversity and Coverage Maximization | 2014 | PODS | 8.1742326e-05
2,976 | Processing a Large Number of Continuous Preference Top-k Queries | 2012 | SIGMOD | 7.789303e-05
3,358 | Organizing Data Lakes for Navigation | 2020 | SIGMOD | 7.1784949e-05
5,024 | Towards Distribution-aware Query Answering in Data Markets | 2022 | VLDB | 5.7535043e-05
5,794 | Discovering Related Data At Scale | 2021 | VLDB | 5.3245122e-05
6,270 | MATE: Multi-Attribute Table Extraction | 2022 | VLDB | 5.1337451e-05
6,438 | RONIN: Data Lake Exploration | 2021 | VLDB | 5.0620163e-05
6,467 | Tailoring Data Source Distributions for Fairness-aware Data Integration | 2021 | VLDB | 5.0528156e-05
7,761 | Space-Time Tradeoffs for Conjunctive Queries with Access Patterns | 2023 | PODS | 4.658708e-05
7,851 | Consistent Range Approximation for Fair Predictive Modeling | 2023 | VLDB | 4.6353072e-05
8,618 | Nexus: Correlation Discovery over Collections of Spatio-Temporal Tabular Data | 2024 | SIGMOD | 4.4838259e-05
9,322 | Indexing for Keyword Search with Structured Constraints | 2023 | PODS | 4.3556432e-05
9,928 | Fainder: A Fast and Accurate Index for Distribution-Aware Dataset Search | 2024 | VLDB | 4.2511622e-05

Semantically Similar Papers