Published in: International Journal on Document Analysis and Recognition

Unsupervised information extraction from unstructured, ungrammatical data sources on the World Wide Web



Abstract

Information extraction from unstructured, ungrammatical data such as classified listings is difficult because traditional structural and grammatical extraction methods do not apply. Previous work has exploited reference sets to aid such extraction, but it did so using supervised machine learning. In this paper, we present an unsupervised approach that automatically selects the relevant reference set(s) and then uses them for unsupervised extraction. We validate our approach with experimental results showing that our unsupervised extraction is competitive with supervised machine learning approaches, including the previous supervised approach that exploits reference sets.
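The abstract describes two unsupervised steps: choosing the relevant reference set(s), then using them to extract attributes from each post. It does not say how either step works, so the following is only a rough illustration under stated assumptions, not the paper's method. It treats a reference set as a list of records mapping attribute names to values and scores a post against a record with token-level Jaccard similarity. It keeps every reference set whose mean best-match score is at least a hypothetical `ratio` of the top set's score. Each post token is then labeled with the attribute of the best-matching record that contains it. The function names, the similarity measure, the selection rule, and the toy car/hotel data are all assumptions.

```python
import re

# Hypothetical sketch: Jaccard matching and the ratio-based selection rule
# are illustrative assumptions, not the method described in the paper.

def token_list(text):
    """Lowercase alphanumeric tokens of `text`, in order."""
    return re.findall(r"[a-z0-9]+", text.lower())

def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def record_tokens(record):
    """Union of the tokens in every attribute value of a reference record."""
    return {t for value in record.values() for t in token_list(value)}

def best_match(post, records):
    """(score, record) for the reference record most similar to `post`."""
    post_toks = set(token_list(post))
    return max(((jaccard(post_toks, record_tokens(r)), r) for r in records),
               key=lambda pair: pair[0], default=(0.0, {}))

def select_reference_sets(posts, reference_sets, ratio=0.5):
    """Assumed selection rule: score each reference set by the mean
    best-match similarity over all posts, then keep every set scoring above
    zero and at least `ratio` times the top set's score."""
    if not posts:
        return []
    scores = {name: sum(best_match(p, recs)[0] for p in posts) / len(posts)
              for name, recs in reference_sets.items()}
    top = max(scores.values(), default=0.0)
    return [name for name, s in scores.items() if s > 0 and s >= ratio * top]

def extract(post, records):
    """Label post tokens with attribute names from the best-matching record:
    a token is assigned to an attribute if it appears in that attribute's
    value. Attributes with no matching token are omitted."""
    _, record = best_match(post, records)
    post_tokens = token_list(post)
    labels = {}
    for attr, value in record.items():
        value_toks = set(token_list(value))
        found = [t for t in post_tokens if t in value_toks]
        if found:
            labels[attr] = " ".join(found)
    return labels

# Toy, made-up data for illustration only.
CARS = [
    {"make": "Honda", "model": "Civic"},
    {"make": "Toyota", "model": "Camry"},
    {"make": "Honda", "model": "Accord"},
]
HOTELS = [
    {"name": "Grand Plaza", "area": "Downtown"},
    {"name": "Sea Breeze Inn", "area": "Beachfront"},
]
REFERENCE_SETS = {"cars": CARS, "hotels": HOTELS}
POSTS = [
    "93 civic hatch, runs great $1500",
    "honda accord 2001 cheap!!",
    "camry for sale low miles",
]

if __name__ == "__main__":
    chosen = select_reference_sets(POSTS, REFERENCE_SETS)
    records = [r for name in chosen for r in REFERENCE_SETS[name]]
    for post in POSTS:
        print(post, "->", extract(post, records))
```

With this toy data, no post shares a token with any hotel record, so only the car set is selected. The post "honda accord 2001 cheap!!" is labeled `{"make": "honda", "model": "accord"}`. Exact token overlap is a deliberately simple choice here. Listings with misspellings or abbreviations would likely need a more forgiving string-similarity measure.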
