Automatically Training Form Classifiers

Abstract

The state of the art in domain-specific Web form discovery relies on supervised methods that require substantial human effort to provide training examples, which limits their applicability in practice. This paper proposes an effective alternative that reduces this effort by automatically obtaining high-quality, domain-specific training forms. In our approach, the only user input is the domain of interest. We use a search engine and a focused crawler to locate query forms, which are then fed as training data into supervised form classifiers. We tested this approach thoroughly on thousands of real Web forms from six domains, including a representative subset of a publicly available form base. The results show that it is feasible to mitigate the demanding manual work required by some current state-of-the-art form discovery methods, at the cost of a negligible loss in effectiveness.
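The pipeline described above (domain name, then search-engine seeds, then a crawl that collects query forms, then a supervised classifier trained on them) can be illustrated with a minimal, self-contained sketch.

This is not the authors' implementation. Everything concrete here is a hypothetical assumption:

- `SEARCH_INDEX` and `WEB` stand in for a real search engine and the Web.
- A plain breadth-first crawl capped at `max_pages` replaces the focused crawler, whose link-selection strategy is not described in this abstract.
- Every form the crawl reaches is naively treated as a positive example.
- `NEGATIVE_FORMS` is an invented off-domain sample.
- Multinomial naive Bayes is swapped in as a simple supervised classifier.

```python
import math
import re
from collections import Counter, deque
from html.parser import HTMLParser

# Hypothetical stand-ins for a search engine and the Web (not from the paper).
SEARCH_INDEX = {"airfare": ["a.example"]}  # domain -> seed URLs
WEB = {  # URL -> (HTML, outgoing links)
    "a.example": ('<form>Departure city Arrival city Date <input name="d"></form>',
                  ["b.example"]),
    "b.example": ("<p>cheap flights</p><form>From To Return date passengers</form>", []),
}
# Hypothetical off-domain forms used as negative training examples.
NEGATIVE_FORMS = ["Username Password login", "Search books by author title ISBN"]


def tokenize(text):
    """Return lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class FormTextParser(HTMLParser):
    """Collect the text that appears inside each <form> element of a page."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.forms = []  # one list of text chunks per form

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.in_form = True
            self.forms.append([])

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False

    def handle_data(self, data):
        if self.in_form:
            self.forms[-1].append(data)


def collect_training_forms(domain, max_pages=10):
    """Seed a breadth-first crawl from search results; return each form's tokens."""
    queue = deque(u for u in SEARCH_INDEX.get(domain, []) if u in WEB)
    seen = set(queue)
    forms, visited = [], 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        html, links = WEB[url]
        parser = FormTextParser()
        parser.feed(html)
        parser.close()
        forms.extend(tokenize(" ".join(chunks)) for chunks in parser.forms)
        for link in links:  # enqueue unseen pages that exist in the toy Web
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return forms


def train_nb(docs_by_label):
    """Train multinomial naive Bayes: per-label token counts and log priors."""
    total_docs = sum(len(docs) for docs in docs_by_label.values())
    counts, priors, vocab = {}, {}, set()
    for label, docs in docs_by_label.items():
        counts[label] = Counter(tok for doc in docs for tok in doc)
        vocab |= set(counts[label])
        priors[label] = math.log(len(docs) / total_docs)
    return vocab, counts, priors


def classify(model, tokens):
    """Return the label with the highest Laplace-smoothed log-likelihood."""
    vocab, counts, priors = model
    best_label, best_score = None, -math.inf
    for label, c in counts.items():
        total = sum(c.values())
        score = priors[label] + sum(
            math.log((c[t] + 1) / (total + len(vocab))) for t in tokens)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

For example, `collect_training_forms("airfare")` returns the two airfare forms reachable from the seed. A model trained on those forms and the negative sample labels the form text "Departure date Return date" as `"in-domain"`. Naive Bayes was chosen only because it fits in a few lines; any supervised form classifier could consume the automatically collected examples in the same way.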