DICE: Data Discovery by Example
Summary: DICE performs data discovery by example across heterogeneous sources in data lakes. It synthesizes candidate join-path queries from user-supplied examples, then iteratively refines them through user validation, enabling schema-free, interactive discovery of relevant data for analysts. (summarized by gpt-5-nano on Feb 09 2026)
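The summary describes the example-driven loop only at a high level. The toy sketch below illustrates the general idea under loudly stated assumptions. It is not DICE's actual algorithm or API. The "lake" is an in-memory dict of hypothetical tables, joins are natural joins on shared column names, and refinement prunes candidates using a single valid/invalid verdict on one shown tuple. All table names, columns, and functions (`join_paths`, `synthesize`, `refine`) are hypothetical.

```python
from itertools import permutations

# Hypothetical toy "lake": table name -> list of rows (dicts). Tables are non-empty.
LAKE = {
    "employees": [
        {"emp_id": 1, "name": "Ada", "dept_id": 10},
        {"emp_id": 2, "name": "Bo", "dept_id": 20},
        {"emp_id": 3, "name": "Cy", "dept_id": 10},
    ],
    "departments": [
        {"dept_id": 10, "dept": "Research", "site_id": 100},
        {"dept_id": 20, "dept": "Sales", "site_id": 200},
    ],
    "sites": [
        {"site_id": 100, "city": "Boston"},
        {"site_id": 200, "city": "Denver"},
    ],
    "home": [  # where each person lives, not where they work
        {"name": "Ada", "city": "Boston"},
        {"name": "Bo", "city": "Boston"},
        {"name": "Cy", "city": "Austin"},
    ],
}


def natural_join(left, right):
    """Join two row lists on every column name they share."""
    if not left or not right:
        return []
    shared = set(left[0]) & set(right[0])
    return [{**l, **r} for l in left for r in right
            if all(l[c] == r[c] for c in shared)]


def join_paths(lake, max_len=3):
    """Yield (path, rows) for each connected set of up to max_len tables.

    A path is connected if every table after the first shares at least
    one column with the tables joined before it.
    """
    seen = set()
    for k in range(1, max_len + 1):
        for path in permutations(lake, k):
            key = frozenset(path)
            if key in seen:
                continue  # same table set already joined in another order
            rows = lake[path[0]]
            cols = set(rows[0])
            for name in path[1:]:
                nxt = lake[name]
                if not cols & set(nxt[0]):
                    break  # disconnected: no shared join column
                rows = natural_join(rows, nxt)
                cols |= set(nxt[0])
            else:
                seen.add(key)
                yield path, rows


def synthesize(lake, examples, max_len=3):
    """Candidate queries (join path, projected columns, output set) whose
    output contains every example tuple the user supplied."""
    width = len(examples[0])
    candidates = []
    for path, rows in join_paths(lake, max_len):
        if not rows:
            continue
        for proj in permutations(sorted(rows[0]), width):
            output = {tuple(r[c] for c in proj) for r in rows}
            if all(ex in output for ex in examples):
                candidates.append((path, proj, output))
    return candidates


def refine(candidates, shown, is_valid):
    """Keep candidates consistent with the user's verdict on one shown tuple."""
    return [c for c in candidates if (shown in c[2]) == is_valid]


# The user gives one example row, then confirms ("Bo", "Denver") is wanted,
# which rules out every path that reads the city from `home`.
cands = synthesize(LAKE, [("Ada", "Boston")])
kept = refine(cands, ("Bo", "Denver"), is_valid=True)
for path, proj, _ in kept:
    print(path, proj)
```

A single example row is ambiguous here: several join paths map "Ada" to "Boston". One validated tuple is enough to separate work city from home city. A practical system would presumably choose which tuple to show so that the verdict splits the remaining candidates, and would rank or prune join paths instead of enumerating every permutation. This sketch does neither.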
Authors
1. El Kindi Rezig
2. Anshul Bhandari
3. Anna Fariha
4. Benjamin Price
5. Allan Vanterpool
6. Vijay Gadepally
7. Michael Stonebraker
Incoming Citations (Sorted by PageRank)
Showing 7 of 7 citing papers.
| Rank | Citing Paper | Year | Venue | PageRank |
|---|---|---|---|---|
| 9,035 | Data-Driven Insight Synthesis for Multi-Dimensional Data | 2024 | VLDB | 4.4039656e-05 |
| 9,826 | Exploiting Structure in Regular Expression Queries | 2023 | SIGMOD | 4.2751057e-05 |
| 9,928 | Fainder: A Fast and Accurate Index for Distribution-Aware Dataset Search | 2024 | VLDB | 4.2511622e-05 |
| 10,197 | Qualitative Join Discovery in Data Lakes using Examples | 2026 | SIGMOD | 4.1945683e-05 |
| 10,829 | Sort it Like You Mean It: Discovering Semantically Interesting Attribute Augmentations to Sort Tables | 2025 | VLDB | 4.1945683e-05 |
| 11,379 | Fast Dataset Search with Earth Mover’s Distance | 2022 | VLDB | 4.1945683e-05 |
| 13,201 | Examples are All You Need: Iterative Data Discovery by Example in Data Lakes | 2022 | CIDR | - |
Outgoing Citations (Sorted by PageRank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | PageRank |
|---|---|---|---|---|
| 1,277 | The Data Civilizer System | 2017 | CIDR | 1.2879695e-04 |
| 1,430 | Duoquest: A Dual-Specification System for Expressive SQL Queries | 2020 | SIGMOD | 1.2031061e-04 |
| 1,459 | Query From Examples: An Iterative, Data-Driven Approach to Query Construction | 2015 | VLDB | 1.1889802e-04 |
| 1,509 | Discovering Queries based on Example Tuples | 2014 | SIGMOD | 1.1612727e-04 |
| 2,576 | S4: Top-k Spreadsheet-Style Search for Query Discovery | 2015 | SIGMOD | 8.5112408e-05 |
| 3,661 | Example-Driven Query Intent Discovery: Abductive Reasoning using Semantic Similarity | 2019 | VLDB | 6.8689912e-05 |
| 5,684 | Dagger: A Data (not code) Debugger | 2020 | CIDR | 5.3720749e-05 |