Faerie: Efficient Filtering Algorithms for Approximate Dictionary-based Entity Extraction
Summary: Faerie is a unified framework for approximate dictionary-based entity extraction that supports a range of similarity and dissimilarity measures. It shares computation across overlapping substrings of the document and uses filtering and pruning to discard non-matching candidates early; the authors report that it outperforms prior methods.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 4410
- Venue: SIGMOD
- Year: 2011
- Pagerank: 5.7177424e-05
- Overall Rank: 5,073 | 64.71%
- DOI: (not listed)
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 11 of 11 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 1,396 | Can We Beat the Prefix Filtering? An Adaptive Framework for Similarity Join and Search | 2012 | SIGMOD | 0.00012204748 |
| 2,592 | Pass-Join: A Partition-based Method for Similarity Joins | 2012 | VLDB | 8.4795761e-05 |
| 4,776 | Exploiting Evidence from Unstructured Data to Enhance Master Data Management | 2012 | VLDB | 5.9314064e-05 |
| 5,179 | SilkMoth: An Efficient Method for Finding Related Sets with Maximum Matching Constraints | 2017 | VLDB | 5.6428428e-05 |
| 7,474 | Cardinality Estimation of Approximate Substring Queries using Deep Learning | 2022 | VLDB | 4.7194345e-05 |
| 7,635 | Allign: Aligning All-Pair Near-Duplicate Passages in Long Texts | 2021 | SIGMOD | 4.6908858e-05 |
| 7,700 | Near-Duplicate Text Alignment with One Permutation Hashing | 2024 | SIGMOD | 4.6744372e-05 |
| 7,708 | Efficient Top-k Algorithms for Approximate Substring Matching | 2013 | SIGMOD | 4.6721808e-05 |
| 8,291 | TxtAlign: Efficient Near-Duplicate Text Alignment Search via Bottom-k Sketches for Plagiarism Detection | 2022 | SIGMOD | 4.5435639e-05 |
| 9,876 | Near-Duplicate Sequence Search at Scale for Large Language Model Memorization Evaluation | 2023 | SIGMOD | 4.2667743e-05 |
| 10,216 | The Case For Language Model Approximated LIKE Predicate | 2026 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 16 of 16 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 125 | Approximate String Joins in a Database (Almost) for Free | 2001 | VLDB | 0.00044847972 |
| 155 | Robust and Efficient Fuzzy Match for Online Data Cleaning | 2003 | SIGMOD | 0.00040637896 |
| 250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988 |
| 266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727 |
| 1,202 | VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams | 2007 | VLDB | 0.00013326298 |
| 1,234 | Ed-Join: An Efficient Algorithm for Similarity Joins With Edit Distance Constraints | 2008 | VLDB | 0.00013122499 |
| 1,830 | Relaxing Join and Selection Queries | 2006 | VLDB | 0.000103862 |
| 2,213 | n-Gram/2L: A Space and Time Efficient Two-Level n-Gram Inverted Index Structure | 2005 | VLDB | 9.2765152e-05 |
| 2,779 | Hashed Samples: Selectivity Estimators For Set Similarity Selection Queries | 2008 | VLDB | 8.1320575e-05 |
| 3,226 | Extending Q-Grams to Estimate Selectivity of String Matching with Low Edit Distance | 2007 | VLDB | 7.3433307e-05 |
| 3,578 | Efficient Approximate Entity Extraction with Edit Distance Constraints | 2009 | SIGMOD | 6.9503858e-05 |
| 3,868 | An Efficient Filter for Approximate Membership Checking | 2008 | SIGMOD | 6.6822543e-05 |
| 4,873 | Power-Law Based Estimation of Set Similarity Join Size | 2009 | VLDB | 5.8602304e-05 |
| 4,951 | Mining Document Collections to Facilitate Accurate Approximate Entity Matching | 2009 | VLDB | 5.8100413e-05 |
| 4,988 | Incremental Maintenance of Length Normalized Indexes for Approximate String Matching | 2009 | SIGMOD | 5.783959e-05 |
| 5,379 | Scalable Ad-hoc Entity Extraction from Text Collections | 2008 | VLDB | 5.5405989e-05 |
Semantically Similar Papers