A Theoretical Framework for Distribution-Aware Dataset Search
Summary: Studies distribution-aware dataset search through percentile (Ptile) and preference (Pref) indexing in centralized and federated settings. Presents lower bounds against near-linear-space structures in the centralized case, and approximate structures that use Õ(N) space, Õ(N) preprocessing time, and Õ(1+OUT) query time with ε+2δ accuracy.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 1961
- Venue: PODS
- Year: 2025
- Pagerank: 4.1945683e-05
- Overall Rank: 10,341 | 28.06%
- DOI: 10.1145/3725227
Incoming Non-self Citations Over Time
No non-self incoming citations found for this paper in this database.
Incoming Citations (Sorted by Pagerank)
Showing 0 of 0 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
Outgoing Citations (Sorted by Pagerank)
Showing 16 of 16 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,644 | Finding Related Tables in Data Lakes for Interactive Data Science | 2020 | SIGMOD | 0.00011041787 |
| 1,751 | Auctus: A Dataset Search Engine for Data Discovery and Augmentation | 2021 | VLDB | 0.00010683295 |
| 2,324 | RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search | 2024 | SIGMOD | 9.0326444e-05 |
| 2,752 | Composable Core-sets for Diversity and Coverage Maximization | 2014 | PODS | 8.1742326e-05 |
| 2,976 | Processing a Large Number of Continuous Preference Top-k Queries | 2012 | SIGMOD | 7.789303e-05 |
| 3,358 | Organizing Data Lakes for Navigation | 2020 | SIGMOD | 7.1784949e-05 |
| 5,024 | Towards Distribution-aware Query Answering in Data Markets | 2022 | VLDB | 5.7535043e-05 |
| 5,794 | Discovering Related Data At Scale | 2021 | VLDB | 5.3245122e-05 |
| 6,270 | MATE: Multi-Attribute Table Extraction | 2022 | VLDB | 5.1337451e-05 |
| 6,438 | RONIN: Data Lake Exploration | 2021 | VLDB | 5.0620163e-05 |
| 6,467 | Tailoring Data Source Distributions for Fairness-aware Data Integration | 2021 | VLDB | 5.0528156e-05 |
| 7,761 | Space-Time Tradeoffs for Conjunctive Queries with Access Patterns | 2023 | PODS | 4.658708e-05 |
| 7,851 | Consistent Range Approximation for Fair Predictive Modeling | 2023 | VLDB | 4.6353072e-05 |
| 8,618 | Nexus: Correlation Discovery over Collections of Spatio-Temporal Tabular Data | 2024 | SIGMOD | 4.4838259e-05 |
| 9,322 | Indexing for Keyword Search with Structured Constraints | 2023 | PODS | 4.3556432e-05 |
| 9,928 | Fainder: A Fast and Accurate Index for Distribution-Aware Dataset Search | 2024 | VLDB | 4.2511622e-05 |
Semantically Similar Papers