Why Big Data Industrial Systems Need Rules and What We Can Do About It
Summary: Examines why big-data industrial systems rely on handcrafted rules for classification and entity matching, in contrast to the models favored in academic work. Proposes a research agenda covering rule generation, evaluation, execution, and maintenance, and calls for scalable rule management that draws on learning and crowdsourcing. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Paul Suganthan G.C.
2. Chong Sun
3. Krishna Gayatri K.
4. Haojun Zhang
5. Frank Yang
6. Narasimhan Rampalli
7. Shishir Prasad
8. Esteban Arcaute
9. Ganesh Krishnan
10. Rohit Deep
11. Vijay Raghavendra
12. AnHai Doan
Incoming Citations (Sorted by Pagerank)
Showing 3 of 3 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,831 | Synthesizing Entity Matching Rules by Examples | 2018 | VLDB | 0.00010384082 |
| 5,192 | Pattern Functional Dependencies for Data Cleaning | 2020 | VLDB | 5.6375087e-05 |
| 7,185 | Certus: An Effective Entity Resolution Approach with Graph Differential Dependencies (GDDs) | 2019 | VLDB | 4.8066159e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 7 of 7 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 287 | Declarative Information Extraction Using Datalog with Embedded Extraction Predicates | 2007 | VLDB | 0.00028971272 |
| 643 | Corleone: Hands-Off Crowdsourcing for Entity Matching | 2014 | SIGMOD | 0.00018754451 |
| 1,716 | Chimera: Large-Scale Classification using Machine Learning, Rules, and Crowdsourcing | 2014 | VLDB | 0.00010795718 |
| 2,847 | Building, Maintaining, and Using Knowledge Bases: A Report from the Trenches | 2013 | SIGMOD | 8.0224023e-05 |
| 3,532 | Entity Resolution with Evolving Rules | 2010 | VLDB | 7.0020216e-05 |
| 3,989 | Mind the Gap: Large-Scale Frequent Sequence Mining | 2013 | SIGMOD | 6.5583327e-05 |
| 5,431 | Entity Extraction, Linking, Classification, and Tagging for Social Media: A Wikipedia-Based Approach | 2013 | VLDB | 5.5076946e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,456 | From Auto-tuning One Size Fits All to Self-designed and Learned Data-intensive Systems | 2019 | SIGMOD | 5.0564619e-05 |
| 13,872 | Expressing Business Rules | 2000 | SIGMOD | - |
| 6,534 | Automatic Rule Refinement for Information Extraction | 2010 | VLDB | 5.0244622e-05 |
| 8,823 | The Role of Schema Matching in Large Enterprises | 2009 | CIDR | 4.4415658e-05 |
| 11,949 | Big Data Research: Will Industry Solve all the Problems? | 2015 | VLDB | 4.1945683e-05 |
| 1,716 | Chimera: Large-Scale Classification using Machine Learning, Rules, and Crowdsourcing | 2014 | VLDB | 0.00010795718 |
| 732 | Discovering Data Quality Rules | 2008 | VLDB | 0.00017465093 |
| 1,339 | Implementing Large Production Systems in a DBMS Environment: Concepts and Algorithms | 1988 | SIGMOD | 0.00012492597 |
| 9,963 | Parallel Rule Discovery from Large Datasets by Sampling | 2022 | SIGMOD | 4.2294678e-05 |
| 7,800 | Data Management for Large Rule Systems | 1991 | VLDB | 4.6474123e-05 |