Efficient Approximate Entity Extraction with Edit Distance Constraints
Summary: Addresses approximate dictionary matching for named entity recognition (NER) under edit-distance constraints, which tolerates typographical errors that token-based similarity measures miss. The approach combines partitioning-based neighborhood generation with prefix pruning, and its scalable document processing yields large speedups.
(summarized by gpt-5-nano on Feb 09 2026)
- Paper ID: 4162
- Venue: SIGMOD
- Year: 2009
- Pagerank: 6.9503858e-05
- Overall Rank: 3,578 | 75.11%
- DOI: (none listed)
Incoming Non-self Citations Over Time
Incoming Citations (Sorted by Pagerank)
Showing 14 of 14 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,592 | Pass-Join: A Partition-based Method for Similarity Joins | 2012 | VLDB | 8.4795761e-05 |
| 3,774 | Efficient Exact Edit Similarity Query Processing with the Asymmetric Signature Scheme | 2011 | SIGMOD | 6.7757301e-05 |
| 4,776 | Exploiting Evidence from Unstructured Data to Enhance Master Data Management | 2012 | VLDB | 5.9314064e-05 |
| 5,073 | Faerie: Efficient Filtering Algorithms for Approximate Dictionary-based Entity Extraction | 2011 | SIGMOD | 5.7177424e-05 |
| 5,151 | String Similarity Measures and Joins with Synonyms | 2013 | SIGMOD | 5.6609851e-05 |
| 5,291 | Fast Subtrajectory Similarity Search in Road Networks under Weighted Edit Distance Constraints | 2020 | VLDB | 5.5826473e-05 |
| 6,074 | Pigeonring: A Principle for Faster Thresholded Similarity Search | 2019 | VLDB | 5.2242306e-05 |
| 7,141 | Efficient Error-tolerant Query Autocompletion | 2013 | VLDB | 4.8197901e-05 |
| 7,474 | Cardinality Estimation of Approximate Substring Queries using Deep Learning | 2022 | VLDB | 4.7194345e-05 |
| 7,635 | Allign: Aligning All-Pair Near-Duplicate Passages in Long Texts | 2021 | SIGMOD | 4.6908858e-05 |
| 7,708 | Efficient Top-k Algorithms for Approximate Substring Matching | 2013 | SIGMOD | 4.6721808e-05 |
| 9,932 | Local Filtering: Improving the Performance of Approximate Queries on String Collections | 2015 | SIGMOD | 4.2500258e-05 |
| 9,933 | Efficient and Effective KNN Sequence Search with Approximate n-grams | 2014 | VLDB | 4.2500258e-05 |
| 12,086 | RCSI: Scalable similarity search in thousand(s) of genomes | 2013 | VLDB | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 15 of 15 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 125 | Approximate String Joins in a Database (Almost) for Free | 2001 | VLDB | 0.00044847972 |
| 250 | Efficient set joins on similarity predicates | 2004 | SIGMOD | 0.00030661988 |
| 266 | Efficient Exact Set-Similarity Joins | 2006 | VLDB | 0.00029718727 |
| 1,202 | VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams | 2007 | VLDB | 0.00013326298 |
| 1,234 | Ed-Join: An Efficient Algorithm for Similarity Joins With Edit Distance Constraints | 2008 | VLDB | 0.00013122499 |
| 1,533 | Example-driven Design of Efficient Record Matching Queries | 2007 | VLDB | 0.00011471971 |
| 2,193 | Cost-Based Variable-Length-Gram Selection for String Collections to Support Approximate Queries Efficiently | 2008 | SIGMOD | 9.3178557e-05 |
| 2,386 | Leveraging Aggregate Constraints For Deduplication | 2007 | SIGMOD | 8.9231895e-05 |
| 2,779 | Hashed Samples: Selectivity Estimators For Set Similarity Selection Queries | 2008 | VLDB | 8.1320575e-05 |
| 3,226 | Extending Q-Grams to Estimate Selectivity of String Matching with Low Edit Distance | 2007 | VLDB | 7.3433307e-05 |
| 3,267 | Benchmarking Declarative Approximate Selection Predicates | 2007 | SIGMOD | 7.3058429e-05 |
| 3,868 | An Efficient Filter for Approximate Membership Checking | 2008 | SIGMOD | 6.6822543e-05 |
| 4,438 | Selectivity Estimation for Fuzzy String Predicates in Large Data Sets | 2005 | VLDB | 6.1898903e-05 |
| 5,379 | Scalable Ad-hoc Entity Extraction from Text Collections | 2008 | VLDB | 5.5405989e-05 |
| 7,669 | Incorporating String Transformations in Record Matching | 2008 | SIGMOD | 4.6833751e-05 |
Semantically Similar Papers