CAFE: Constraint-Aware Feature Extraction from Large Databases

Summary: CAFE extracts features from large databases while enforcing high-level constraints (consistency, interpretability, fairness). It maps these constraints to low-level pruning strategies and uses an inverted index to find candidate columns. An optimizer-like planner uses sample-based estimates, models dependencies between strategies, and orders the pruning steps to maximize downstream ML accuracy while respecting bounds on runtime and feature quality. (summarized by gpt-5-mini on Feb 09 2026)
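The inverted-index step mentioned in the summary can be illustrated with a generic value-to-column index. This is a minimal sketch under stated assumptions, not CAFE's actual index structure or scoring. The toy tables, the function names `build_inverted_index` and `candidate_columns`, and the ranking of columns by how many query values they contain are all hypothetical illustrations.

```python
from collections import defaultdict


def build_inverted_index(tables):
    """Map each distinct cell value to the set of (table, column) pairs containing it.

    `tables` is a hypothetical toy layout: table name -> {column name -> list of values}.
    """
    index = defaultdict(set)
    for table_name, columns in tables.items():
        for column_name, values in columns.items():
            for value in values:
                index[value].add((table_name, column_name))
    return index


def candidate_columns(index, query_values, min_overlap=1):
    """Rank (table, column) pairs by how many distinct query values they contain.

    The overlap count is an assumed, simple relevance score for illustration only.
    """
    counts = defaultdict(int)
    for value in set(query_values):
        # .get avoids inserting empty entries into the defaultdict for unseen values.
        for location in index.get(value, ()):
            counts[location] += 1
    ranked = [(loc, n) for loc, n in counts.items() if n >= min_overlap]
    # Highest overlap first; ties broken by (table, column) name for determinism.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Hypothetical example data.
tables = {
    "cities": {"name": ["Berlin", "Paris", "Rome"], "country": ["DE", "FR", "IT"]},
    "weather": {"city": ["Berlin", "Paris", "Oslo"], "temp": [5, 9, 2]},
}
index = build_inverted_index(tables)
query = ["Berlin", "Paris", "Madrid"]
print(candidate_columns(index, query))
# [(('cities', 'name'), 2), (('weather', 'city'), 2)]
```

Because each lookup touches only the columns that actually contain a query value, the index avoids scanning every column in the database. This is the general motivation for using an inverted index when searching for candidate columns.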

Paper ID: 378
Venue: CIDR
Year: 2020
Pagerank: 4.1905499e-05
Overall Rank: 11,551 | 19.72%
DOI: -

Incoming Non-self Citations Over Time

No non-self incoming citations found for this paper in this database.

Authors

1. Mahdi Esmailoghli
2. Ziawasch Abedjan

Incoming Citations (Sorted by Pagerank)

Showing 4 of 4 citing papers.

Rank	Citing Paper	Year	Venue	Pagerank
4,761	Automated Feature Engineering for Algorithmic Fairness	2021	VLDB	5.9341687e-05
6,268	MATE: Multi-Attribute Table Extraction	2022	VLDB	5.1288179e-05
10,840	Data Discovery in Data Lakes: Operations, Indexes, Systems	2025	VLDB	4.1905499e-05
11,480	Enforcing Constraints for Machine Learning Systems via Declarative Feature Selection: An Experimental Study	2021	SIGMOD	4.1905499e-05

Outgoing Citations (Sorted by Pagerank)

Showing 2 of 2 cited papers.

Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.

Rank	Cited Paper	Year	Venue	Pagerank
420	InfoGather: Entity Augmentation and Attribute Discovery By Holistic Matching with Web Tables	2012	SIGMOD	0.00023700634
1,274	The Data Civilizer System	2017	CIDR	0.00012869297

Semantically Similar Papers

Overall Rank	Paper	Year	Venue	Pagerank
332	Accelerating Machine Learning Inference with Probabilistic Predicates	2018	SIGMOD	0.00027173479
4,805	Optimization of Constrained Frequent Set Queries with 2-variable Constraints	1999	SIGMOD	5.9050441e-05
5,062	Optimizing Machine Learning Inference Queries with Correlative Proxy Models	2022	VLDB	5.7172262e-05
6,345	A Relational Framework for Classifier Engineering	2017	PODS	5.0970607e-05
9,311	On Efficient Approximate Queries over Machine Learning Models	2023	VLDB	4.3535588e-05
1,609	Exploratory Mining and Pruning Optimizations of Constrained Association Rules	1998	SIGMOD	0.00011166163
11,480	Enforcing Constraints for Machine Learning Systems via Declarative Feature Selection: An Experimental Study	2021	SIGMOD	4.1905499e-05
804	An End-to-End Learning-based Cost Estimator	2020	VLDB	0.0001643674
7,180	Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning	2023	VLDB	4.8032775e-05
9,408	CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models	2024	SIGMOD	4.3399748e-05