Learning to Extract Form Labels
Summary: Presents an ensemble of learned classifiers that identifies element–label mappings in Web forms, followed by a reconciliation step that boosts extraction accuracy. Evaluated on over 3,000 forms, the approach is more accurate and more robust to layout variability than prior heuristic approaches. (summarized by gpt-5-nano on Feb 09 2026)
Authors
1. Hoa Nguyen
2. Thanh Nguyen
3. Juliana Freire
Incoming Citations (Sorted by Pagerank)
Showing 2 of 2 citing papers.
| Rank | Citing Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 6,133 | DIADEM: Thousands of Websites to a Single Database | 2014 | VLDB | 5.1954702e-05 |
| 12,240 | Creating and Exploring Web Form Repositories | 2010 | SIGMOD | 4.1945683e-05 |
Outgoing Citations (Sorted by Pagerank)
Showing 6 of 6 cited papers.
Citations counted here include only citations to other VLDB/SIGMOD/CIDR/PODS papers in this database.
| Rank | Cited Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108 |
| 672 | An Interactive Clustering-based Approach to Integrating Source Query Interfaces on the Deep Web | 2004 | SIGMOD | 0.00018355746 |
| 902 | Statistical Schema Matching across Web Query Interfaces | 2003 | SIGMOD | 0.00015486247 |
| 1,147 | Web-scale Data Integration: You can only afford to Pay As You Go | 2007 | CIDR | 0.00013677658 |
| 2,362 | Understanding Web Query Interfaces: Best-Effort Parsing with Hidden Syntax | 2004 | SIGMOD | 8.9582251e-05 |
| 3,724 | Toward Large Scale Integration: Building a MetaQuerier over Databases on the Web | 2005 | CIDR | 6.8173288e-05 |
Semantically Similar Papers
| Overall Rank | Paper | Year | Venue | Pagerank |
|---|---|---|---|---|
| 2,633 | Schema Extraction for Tabular Data on the Web | 2013 | VLDB | 8.4063569e-05 |
| 234 | Crawling the Hidden Web | 2001 | VLDB | 0.00032018108 |
| 1,317 | Harvesting Relational Tables from Lists on the Web | 2009 | VLDB | 0.00012625853 |
| 2,362 | Understanding Web Query Interfaces: Best-Effort Parsing with Hidden Syntax | 2004 | SIGMOD | 8.9582251e-05 |
| 3,285 | Using the Structure of Web Sites for Automatic Segmentation of Tables | 2004 | SIGMOD | 7.2759001e-05 |
| 587 | Extracting Structured Data from Web Pages | 2003 | SIGMOD | 0.00019648348 |
| 11,256 | Self-Training for Label-Efficient Information Extraction from Semi-Structured Web-Pages | 2023 | VLDB | 4.1945683e-05 |
| 5,774 | A Hierarchical Approach to Model Web Query Interfaces for Web Source Integration | 2009 | VLDB | 5.3313642e-05 |
| 7,397 | A Probabilistic Approach for Automatically Filling Form-Based Web Interfaces | 2011 | VLDB | 4.7417648e-05 |
| 7,422 | Meaningful Labeling of Integrated Query Interfaces | 2006 | VLDB | 4.7343948e-05 |