Database Paper Browser

Lei Cao

Author ID: 1605
ORCID: 0000-0001-9909-8607
Links: (found by gpt-5.2 on Feb 8th, 2026)
Most Frequent Institution: Massachusetts Institute of Technology
Pagerank: 0.30898998
Overall Rank: 146 | 99.32%
Paper Count: 44

Affiliation Timeline

Incoming Non-self Citations Over Time

Total yearly non-self incoming citations across all papers by this author.
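
As a rough illustration of the metric described above, the following minimal sketch counts incoming citations per year while skipping self-citations. The record layout, the function name, and the rule that a citation is a self-citation whenever the author also appears on the citing paper are all assumptions made for this example, as is bucketing by the citing paper's year; the site's actual definition may differ. Apart from 1605, the author IDs in the sample are hypothetical.

```python
from collections import Counter


def yearly_non_self_citations(citations, author_id):
    """Count incoming citations per year, skipping self-citations.

    `citations` is an iterable of (year, citing_author_ids, cited_author_ids)
    tuples, where `year` is the citing paper's year (assumed layout).
    A citation counts only if the author is on the cited paper and
    is NOT on the citing paper (assumed self-citation rule).
    """
    counts = Counter()
    for year, citing_authors, cited_authors in citations:
        if author_id in cited_authors and author_id not in citing_authors:
            counts[year] += 1
    # Return a plain dict ordered by year for easy plotting or printing.
    return dict(sorted(counts.items()))


# Hypothetical sample records; only the author ID 1605 comes from this page.
sample = [
    (2023, {7, 8}, {1605, 9}),   # counted: author cited by others
    (2023, {1605, 7}, {1605}),   # skipped: author is on the citing paper
    (2024, {3}, {1605}),         # counted
    (2024, {3}, {9}),            # skipped: author is not on the cited paper
    (2023, {4}, {1605}),         # counted
]

print(yearly_non_self_citations(sample, 1605))
```

Summing the per-year counts gives the author's total non-self incoming citations under the same assumptions.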

Publications by Paper Pagerank

Showing 44 of 44 publications.

Rank Title Year Venue Pagerank
1,541 Symphony: Towards Natural Language Query Answering over Multi-modal Data Lakes 2023 CIDR 0.00011456579
2,106 Palimpzest: Optimizing AI-Powered Analytics with Declarative Query Processing 2025 CIDR 9.5342543e-05
2,369 Aria: A Fast and Practical Deterministic OLTP Database 2020 VLDB 8.9490403e-05
2,825 Smile: A System to Support Machine Learning on EEG Data at Scale 2019 VLDB 8.0563426e-05
2,945 Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning 2023 SIGMOD 7.8377395e-05
3,171 Interactive Outlier Exploration in Big Data Streams 2014 VLDB 7.4447236e-05
3,473 AI Meets Database: AI4DB and DB4AI 2021 SIGMOD 7.062864e-05
4,456 AutoOD: Automatic Outlier Detection 2023 SIGMOD 6.1704203e-05
4,554 A Demonstration of AutoOD: A Self-Tuning Anomaly Detection System 2022 VLDB 6.0911296e-05
4,908 Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL 2024 VLDB 5.8339245e-05
5,684 Dagger: A Data (not code) Debugger 2020 CIDR 5.3720749e-05
5,768 Epoch-based Commit and Replication in Distributed OLTP Databases 2021 VLDB 5.3333911e-05
5,861 Machine Learning for Databases 2021 VLDB 5.298883e-05
6,107 Continuously Adaptive Similarity Search 2020 SIGMOD 5.2066612e-05
6,394 Pluto: Sample Selection for Robust Anomaly Detection on Polluted Log Data 2024 SIGMOD 5.0829207e-05
6,877 Extract-Transform-Load for Video Streams 2023 VLDB 4.8974054e-05
7,575 Human-in-the-loop Outlier Detection 2020 SIGMOD 4.7068909e-05
7,582 LakeCompass: An End-to-End System for Data Maintenance, Search and Analysis in Data Lakes 2024 VLDB 4.7046388e-05
8,000 Data Civilizer 2.0: A Holistic Framework for Data Preparation and Analytics 2019 VLDB 4.6092803e-05
8,116 LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes 2024 VLDB 4.581507e-05
8,117 Efficient Discovery of Sequence Outlier Patterns 2019 VLDB 4.5814937e-05
8,714 LANCET: Labeling Complex Data at Scale 2021 VLDB 4.4619818e-05
8,979 High Performance Stream Query Processing With Correlation-Aware Partitioning 2014 VLDB 4.4170433e-05
9,077 VerifAI: Verified Generative AI 2024 CIDR 4.4010762e-05
9,152 Doctopus: Budget-aware Structural Table Extraction from Unstructured Documents 2025 VLDB 4.3849295e-05
9,475 OIE: An Interpretable System for Outlier Explanation and Summarization 2025 SIGMOD 4.3341665e-05
9,492 Lingua Manga: A Generic Large Language Model Centric System for Data Curation 2023 VLDB 4.3341665e-05
9,617 Complex Event Analytics: Online Aggregation of Stream Sequence Patterns 2014 SIGMOD 4.3176634e-05
9,709 Outlier Summarization via Human Interpretable Rules 2024 VLDB 4.299267e-05
9,879 HARMONY: A Scalable Distributed Vector Database for High-Throughput Approximate Nearest Neighbor Search 2026 SIGMOD 4.2643674e-05
10,208 Scalable Clustering Over High Dimensional Vector Streams 2026 SIGMOD 4.1945683e-05
10,239 BRIEF: Bi-level Coreset Selection for Efficient Instruction Tuning in LLMs 2026 VLDB 4.1945683e-05
10,325 KEN: An Execution Engine for Unstructured Database Systems 2026 VLDB 4.1945683e-05
10,365 Agree to Disagree: Robust Anomaly Detection with Noisy Labels 2025 SIGMOD 4.1945683e-05
10,438 Doctopus: A System for Budget-aware Structural Data Extraction from Unstructured Documents 2025 SIGMOD 4.1945683e-05
10,528 Two Birds with One Stone: Efficient Deep Learning over Mislabeled Data through Subset Selection 2025 SIGMOD 4.1945683e-05
10,682 AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework 2025 VLDB 4.1945683e-05
10,752 QUEST: Query Optimization in Unstructured Document Analysis 2025 VLDB 4.1945683e-05
10,952 RITA: Group Attention is All You Need for Timeseries Analytics 2024 SIGMOD 4.1945683e-05
11,000 MisDetect: Iterative Mislabel Detection using Early Loss 2024 VLDB 4.1945683e-05
11,008 MetaStore: Analyzing Deep Learning Meta-Data at Scale 2024 VLDB 4.1945683e-05
11,514 ATLANTIC: Making Database Differentially Private and Faster with Accuracy Guarantee 2021 VLDB 4.1945683e-05
11,692 SWIFT: Mining Representative Patterns from Large Event Streams 2019 VLDB 4.1945683e-05
13,134 DocDB: A Database for Unstructured Document Analysis 2025 VLDB -

Frequent Co-authors

Co-authored at least 5 papers.

Co-author Shared Papers Rank Pagerank
Samuel R. Madden 23 1 1.3916842
Elke Rundensteiner 12 73 0.44002142
Chengliang Chai 11 211 0.24025524
Nan Tang 10 47 0.55638652
Guoren Wang 9 155 0.29864489
Ye Yuan 9 327 0.16962238
Yuhao Deng 9 1,323 0.052819564
Yizhou Yan 8 1,179 0.058129854
Guoliang Li 7 8 0.98178505
Ju Fan 6 158 0.29624962
Huayi Zhang 5 1,728 0.041893051
Yuping Wang 5 1,858 0.039791044