Database Paper Browser


Fainder: A Fast and Accurate Index for Distribution-Aware Dataset Search

Summary: Fainder introduces a distribution-aware index for percentile predicates over heterogeneous histogram summaries, enabling dataset discovery based on distributional properties rather than keywords. It combines binary search with multi-step pruning on summary bounds to eliminate candidates, yielding order-of-magnitude speedups. (summarized by gpt-5-mini on Feb 09 2026)
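Illustration: the sketch below is not Fainder's index. It is a minimal example, written for this listing, of how a percentile predicate can be bounded on a single histogram summary so that a candidate is accepted or rejected without touching the raw data. It assumes half-open bins `[edges[i], edges[i+1])` and a predicate of the form "at least a fraction `p` of the values are below `t`"; the function names `fraction_below_bounds` and `classify` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def fraction_below_bounds(edges, counts, t):
    """Return (lower, upper) bounds on the fraction of values strictly below t.

    edges  -- sorted bin edges, len(edges) == len(counts) + 1
    counts -- number of values per bin; bins are assumed half-open
    """
    n = len(counts)
    total = sum(counts)
    cum = [0, *accumulate(counts)]  # cum[i] = values in the first i bins

    # Bins whose right edge is <= t lie entirely below t (binary search).
    full = max(bisect_right(edges, t) - 1, 0)
    # Bins whose left edge is < t may contain values below t.
    partial = min(bisect_left(edges, t), n)

    return cum[full] / total, cum[partial] / total


def classify(edges, counts, t, p):
    """Decide whether at least a fraction p of the values are below t."""
    lower, upper = fraction_below_bounds(edges, counts, t)
    if lower >= p:
        return "match"      # holds however values are spread inside bins
    if upper < p:
        return "no_match"   # cannot hold even in the best case
    return "uncertain"      # the histogram alone cannot decide


if __name__ == "__main__":
    edges, counts = [0, 10, 20, 30], [5, 3, 2]
    print(fraction_below_bounds(edges, counts, 15))  # (0.5, 0.8)
    print(classify(edges, counts, 15, 0.6))          # uncertain
```

Only the "uncertain" candidates would need a closer look; how Fainder organizes such bounds across many heterogeneous histograms is described in the paper itself.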

Paper ID: 13540
Venue: VLDB
Year: 2024
Pagerank: 4.2511622e-05
Overall Rank: 9,928 | 30.94%
DOI: 10.14778/3681954.3681999

Incoming Non-self Citations Over Time

Authors

Incoming Citations (Sorted by Pagerank)

Showing 4 of 4 citing papers.


Outgoing Citations (Sorted by Pagerank)

Showing 18 of 18 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
326 | Optimal Histograms with Quality Guarantees | 1998 | VLDB | 0.00027358981
610 | Goods: Organizing Google's Datasets | 2016 | SIGMOD | 0.00019232674
1,612 | Detecting Data Errors: Where are we and what needs to be done? | 2016 | VLDB | 0.00011142794
1,644 | Finding Related Tables in Data Lakes for Interactive Data Science | 2020 | SIGMOD | 0.00011041787
1,751 | Auctus: A Dataset Search Engine for Data Discovery and Augmentation | 2021 | VLDB | 0.00010683295
3,358 | Organizing Data Lakes for Navigation | 2020 | SIGMOD | 7.1784949e-05
3,520 | GitTables: A Large-Scale Corpus of Relational Tables | 2023 | SIGMOD | 7.0131061e-05
5,024 | Towards Distribution-aware Query Answering in Data Markets | 2022 | VLDB | 5.7535043e-05
5,381 | Selective Data Acquisition in the Wild for Model Charging | 2022 | VLDB | 5.5399508e-05
5,794 | Discovering Related Data At Scale | 2021 | VLDB | 5.3245122e-05
6,270 | MATE: Multi-Attribute Table Extraction | 2022 | VLDB | 5.1337451e-05
6,438 | RONIN: Data Lake Exploration | 2021 | VLDB | 5.0620163e-05
6,467 | Tailoring Data Source Distributions for Fairness-aware Data Integration | 2021 | VLDB | 5.0528156e-05
6,944 | DataPrism: Exposing Disconnect between Data and Systems | 2022 | SIGMOD | 4.8912787e-05
7,303 | DICE: Data Discovery by Example | 2021 | VLDB | 4.7684686e-05
7,851 | Consistent Range Approximation for Fair Predictive Modeling | 2023 | VLDB | 4.6353072e-05
7,868 | Solo: Data Discovery Using Natural Language Questions Via A Self-Supervised Approach | 2023 | SIGMOD | 4.6319504e-05
8,618 | Nexus: Correlation Discovery over Collections of Spatio-Temporal Tabular Data | 2024 | SIGMOD | 4.4838259e-05

Semantically Similar Papers