Automatic segmentation of text into structured records
Summary: This paper addresses automatic segmentation of unformatted text into structured records. DATAMOLD learns structure from a small seed set. It extends hidden Markov models (HMMs) with cues from multiple sources (sequence, length, vocabulary, and an external dictionary) for robust address extraction, reaching 90% accuracy on Asian addresses and 99% on US addresses and outperforming rule-based information extraction (IE). (summarized by gpt-5-nano on Feb 09 2026)
Incoming Citations (Sorted by Pagerank)
Showing 16 of 16 citing papers.
Outgoing Citations (Sorted by Pagerank)
Showing 4 of 4 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 67 | The Merge/Purge Problem for Large Databases | 1995 | SIGMOD | 0.00061348205 |
| 385 | NoDoSE - A Tool for Semi-Automatically Extracting Structured and Semistructured Data from Text Documents | 1998 | SIGMOD | 0.00024795739 |
| 1,132 | Building light-weight wrappers for legacy Web data-sources using W4F | 1999 | VLDB | 0.00013777657 |
| 2,005 | Record-Boundary Discovery in Web Documents | 1999 | SIGMOD | 9.8112591e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 7,826 | The Smallest Extraction Problem | 2021 | VLDB | 4.6416742e-05 |
| 5,399 | Joint Unsupervised Structure Discovery and Information Extraction | 2011 | SIGMOD | 5.5291067e-05 |
| 7,912 | Mining Quality Phrases from Massive Text Corpora | 2015 | SIGMOD | 4.6183486e-05 |
| 587 | Extracting Structured Data from Web Pages | 2003 | SIGMOD | 0.00019648348 |
| 6,534 | Automatic Rule Refinement for Information Extraction | 2010 | VLDB | 5.0244622e-05 |
| 12,230 | ONDUX: On-Demand Unsupervised Learning for Information Extraction | 2010 | SIGMOD | 4.1945683e-05 |
| 4,092 | Structured Annotations of Web Queries | 2010 | SIGMOD | 6.4561959e-05 |
| 2,005 | Record-Boundary Discovery in Web Documents | 1999 | SIGMOD | 9.8112591e-05 |
| 11,775 | Building Structured Databases of Factual Knowledge from Massive Text Corpora | 2017 | SIGMOD | 4.1945683e-05 |
| 3,285 | Using the Structure of Web Sites for Automatic Segmentation of Tables | 2004 | SIGMOD | 7.2759001e-05 |