Database Paper Browser

SMOKE: Fine-grained Lineage at Interactive Speed

Summary: SMOKE is an in-memory database engine that tightly integrates lineage capture into its physical operators, minimizing capture overhead and accelerating lineage queries. It combines compact lineage representations with optimizations that apply when the lineage queries are known in advance, delivering interactive latency (under 150 ms) and improvements of several orders of magnitude over prior systems on real workloads. (summarized by gpt-5-nano on Feb 09 2026)
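To make "lineage capture inside a physical operator" concrete, here is a minimal sketch. It is not SMOKE's code. It assumes lineage is stored as plain arrays of input row ids: a per-output list for backward lineage and a per-input entry for forward lineage. The function `groupby_sum_with_lineage` and its return layout are hypothetical. The hash group-by SUM records lineage in the same pass that computes the aggregate, so it needs no second scan or re-execution.

```python
from typing import Dict, List, Sequence, Tuple


def groupby_sum_with_lineage(
    keys: Sequence[str], values: Sequence[float]
) -> Tuple[List[Tuple[str, float]], List[List[int]], List[int]]:
    """Hypothetical hash group-by SUM that captures lineage while aggregating.

    Returns (output rows, backward lineage, forward lineage):
      backward[o] = input row ids that contributed to output row o
      forward[i]  = output row id that input row i contributed to
    """
    group_of: Dict[str, int] = {}      # group key -> output row id
    sums: List[float] = []             # running SUM per output row
    backward: List[List[int]] = []     # assumed representation: rid arrays
    forward: List[int] = [0] * len(keys)
    for rid, (k, v) in enumerate(zip(keys, values)):
        oid = group_of.get(k)
        if oid is None:                # first time this key is seen
            oid = len(sums)
            group_of[k] = oid
            sums.append(0.0)
            backward.append([])
        sums[oid] += v
        backward[oid].append(rid)      # lineage written inline, same pass
        forward[rid] = oid
    # dicts preserve insertion order, which matches output row id order
    return list(zip(group_of, sums)), backward, forward


if __name__ == "__main__":
    keys = ["a", "b", "a", "c", "b"]
    vals = [1.0, 2.0, 3.0, 4.0, 5.0]
    out, backward, forward = groupby_sum_with_lineage(keys, vals)
    print(out)          # [('a', 4.0), ('b', 7.0), ('c', 4.0)]
    print(backward[1])  # input rows behind ('b', 7.0): [1, 4]
```

With lineage materialized this way, a backward lineage query is an array lookup, and a forward query is `forward[i]`. The cost is the extra appends during aggregation. How SMOKE actually represents and optimizes lineage is described in the paper itself.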

Paper ID: 11778
Venue: VLDB
Year: 2018
Pagerank: 9.1111033e-05
Overall Rank: 2,280 | 84.14%
DOI: 10.14778/3184470.3184475

Incoming Citations (Sorted by Pagerank)

Showing 25 of 25 citing papers.

Rank | Citing Paper | Year | Venue | Pagerank
1,350 | Northstar: An Interactive Data Science System | 2018 | VLDB | 0.00012431059
2,533 | DeepLens: Towards a Visual Data Management System | 2019 | CIDR | 8.5899934e-05
3,149 | Fine-Grained, Secure and Efficient Data Provenance on Blockchain Systems | 2019 | VLDB | 7.4741595e-05
4,774 | LIMA: Fine-grained Lineage Tracing and Reuse in Machine Learning Systems | 2021 | SIGMOD | 5.9316087e-05
5,691 | Putting Things into Context: Rich Explanations for Query Answers using Join Graphs | 2021 | SIGMOD | 5.3684557e-05
5,733 | Explaining Wrong Queries Using Small Examples | 2019 | SIGMOD | 5.3483446e-05
5,810 | Database Benchmarking for Supporting Real-Time Interactive Querying of Large Data | 2020 | SIGMOD | 5.3178017e-05
6,409 | Fine-Grained Lineage for Safer Notebook Interactions | 2021 | VLDB | 5.0756653e-05
6,842 | Towards Democratizing Relational Data Visualization | 2019 | SIGMOD | 4.9103931e-05
7,556 | Interactive Query Explanations Using Fine Grained Provenance | 2022 | SIGMOD | 4.7117814e-05
8,163 | Capturing and Querying Fine-grained Provenance of Preprocessing Pipelines in Data Science | 2021 | VLDB | 4.5723431e-05
8,271 | Rumble: Data Independence for Large Messy Data Sets | 2021 | VLDB | 4.5453618e-05
8,729 | OneProvenance: Efficient Extraction of Dynamic Coarse-Grained Provenance From Database Query Event Logs | 2023 | VLDB | 4.4582221e-05
8,886 | Provenance-based Data Skipping | 2022 | VLDB | 4.4279829e-05
9,202 | Compact, Tamper-Resistant Archival of Fine-Grained Provenance | 2021 | VLDB | 4.3742967e-05
9,600 | Optimizing Dataflow Systems for Scalable Interactive Visualization | 2024 | SIGMOD | 4.3177432e-05
9,706 | Distributed Numerical and Machine Learning Computations via Two-Phase Execution of Aggregated Join Trees | 2021 | VLDB | 4.2992942e-05
9,921 | ProvCite: Provenance-based Data Citation | 2019 | VLDB | 4.2549509e-05
10,024 | LPStream: Fine-grained Lazy Provenance for Stream Processing | 2026 | SIGMOD | 4.1945683e-05
10,377 | FastPDB: Towards Bag-Probabilistic Queries at Interactive Speeds | 2025 | SIGMOD | 4.1945683e-05
10,419 | Unified Lineage System: Tracking Data Provenance at Scale | 2025 | SIGMOD | 4.1945683e-05
10,886 | FaDE: More Than a Million What-ifs Per Second | 2025 | VLDB | 4.1945683e-05
11,452 | Flow Provenance in Temporal Interaction Networks | 2021 | SIGMOD | 4.1945683e-05
11,518 | A Demonstration of RELIC: A System for REtrospective Lineage InferenCe of Data Workflows | 2021 | VLDB | 4.1945683e-05
11,710 | Demonstration of Smoke: A Deep Breath of Data-Intensive Lineage Applications | 2018 | SIGMOD | 4.1945683e-05

Outgoing Citations (Sorted by Pagerank)

Showing 28 of 28 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank | Cited Paper | Year | Venue | Pagerank
60 | Efficiently Compiling Efficient Query Plans for Modern Hardware | 2011 | VLDB | 0.00064439773
192 | HoloClean: Holistic Data Repairs with Probabilistic Inference | 2017 | VLDB | 0.00035728858
214 | Scorpion: Explaining Away Outliers in Aggregate Queries | 2013 | VLDB | 0.0003363692
299 | Trio: A System for Data, Uncertainty, and Lineage | 2006 | VLDB | 0.00028525071
561 | An Annotation Management System for Relational Databases | 2004 | VLDB | 0.00020115419
1,440 | Provenance for Generalized Map and Reduce Workflows | 2011 | CIDR | 0.00011961469
1,532 | Data Management in Machine Learning: Challenges, Techniques, and Systems | 2017 | SIGMOD | 0.00011472681
1,618 | Row-wise Parallel Predicate Evaluation | 2008 | VLDB | 0.00011114015
1,625 | Data Profiling with Metanome | 2015 | VLDB | 0.00011094926
1,646 | Caravan: Provisioning for What-If Analysis | 2013 | CIDR | 0.00011036992
1,805 | M4: A Visualization-Oriented Time Series Data Aggregation | 2014 | VLDB | 0.00010493299
1,824 | DBNotes: A Post-It System for Relational Databases based on Provenance | 2005 | SIGMOD | 0.00010405194
2,027 | Titian: Data Provenance Support in Spark | 2016 | VLDB | 9.7437067e-05
2,173 | Querying Data Provenance | 2010 | SIGMOD | 9.3676609e-05
2,311 | On Improving User Response Times in Tableau | 2015 | SIGMOD | 9.0539767e-05
2,649 | Explaining Query Answers with Explanation-Ready Databases | 2016 | VLDB | 8.3719123e-05
2,764 | The Semiring Framework for Database Provenance | 2017 | PODS | 8.1574444e-05
2,892 | Data Provenance at Internet Scale: Architecture, Experiences, and the Road Ahead | 2017 | CIDR | 7.9480559e-05
3,976 | UGuide – User-Guided Discovery of FD-Detectable Errors | 2017 | SIGMOD | 6.5736462e-05
4,161 | Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe? | 2017 | SIGMOD | 6.3938006e-05
4,851 | Provenance for Natural Language Queries | 2017 | VLDB | 5.8768322e-05
5,209 | Explaining Outputs in Modern Data Analytics | 2016 | VLDB | 5.629362e-05
5,660 | Descriptive and Prescriptive Data Cleaning | 2014 | SIGMOD | 5.3847321e-05
5,867 | Combining Design and Performance in a Data Visualization Management System | 2017 | CIDR | 5.296418e-05
6,072 | Factorizing Complex Predicates in Queries to Exploit Indexes | 2003 | SIGMOD | 5.2257599e-05
6,384 | A Demonstration of DBWipes: Clean as You Query | 2012 | VLDB | 5.0880333e-05
6,777 | Revisiting Reuse in Main Memory Database Systems | 2017 | SIGMOD | 4.9288776e-05
8,593 | Wisteria: Nurturing Scalable Data Cleaning Infrastructure | 2015 | VLDB | 4.4891474e-05

Semantically Similar Papers