Learning to Extract Form Labels

Abstract

In this paper we describe a new approach to extract element labels from Web form interfaces. Having these labels is a requirement for several techniques that attempt to retrieve and integrate information that is hidden behind form interfaces, such as hidden Web crawlers and metasearchers. However, given the wide variation in form layout, even within a well-defined domain, automatically extracting these labels is a challenging problem. Whereas previous approaches to this problem have relied on heuristics and manually specified extraction rules, our technique makes use of a learning classifier ensemble to identify element-label mappings; and it applies a reconciliation step which leverages the classifier-derived mappings to boost extraction accuracy. We present a detailed experimental evaluation using over three thousand Web forms. Our results show that our approach is effective: it obtains significantly higher accuracy and is more robust to variability in form layout than previous label extraction techniques.
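
The two stages named in the abstract, scoring candidate element-label mappings and then reconciling them, can be illustrated with a minimal sketch. This is not the authors' implementation: this record does not give the paper's classifiers or features, so the hypothetical `score` function below replaces the learned ensemble with a simple position-based heuristic, and `Box`, `score`, and `extract_labels` are names invented for this example. The reconciliation shown is a plain greedy one-to-one assignment, assumed here only for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Box:
    """A form element or a text label with its top-left page position."""
    name: str
    x: float
    y: float


def score(element: Box, label: Box) -> float:
    """Hypothetical stand-in for a learned classifier's confidence.

    Closer pairs score higher; labels to the right of or below the
    element are penalised, since labels usually sit left of or above it.
    """
    dx = element.x - label.x
    dy = element.y - label.y
    penalty = 0.0 if (dx >= 0 and dy >= 0) else 50.0
    return 1.0 / (1.0 + abs(dx) + abs(dy) + penalty)


def extract_labels(elements: list[Box], labels: list[Box]) -> dict[str, str]:
    """Score every candidate pair, then reconcile greedily.

    Reconciliation (assumed here): accept pairs from best to worst and
    skip any pair whose element or label has already been assigned.
    """
    candidates = sorted(
        ((score(e, l), e, l) for e, l in product(elements, labels)),
        key=lambda t: t[0],
        reverse=True,
    )
    mapping: dict[str, str] = {}
    used_labels: set[str] = set()
    for _, element, label in candidates:
        if element.name in mapping or label.name in used_labels:
            continue
        mapping[element.name] = label.name
        used_labels.add(label.name)
    return mapping


if __name__ == "__main__":
    elements = [Box("title_input", 100, 10), Box("author_input", 100, 40)]
    labels = [Box("Title", 20, 10), Box("Author", 20, 40)]
    print(extract_labels(elements, labels))
```

In this toy layout each input is paired with the label on its own row. A learned approach of the kind the abstract describes would replace the fixed heuristic in `score` with classifiers trained on labeled forms.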

Bibliographic record

Source
International Conference on Very Large Data Bases (VLDB 2008) | 2008 | pp. 683-693 | 11 pages
Authors
Hoa Nguyen; Thanh Nguyen; Juliana Freire
Original format
PDF
