Venue: ACM SIGMOD International Conference on Management of Data

ONDUX: On-Demand Unsupervised Learning for Information Extraction


Abstract

Information extraction by text segmentation (IETS) applies to cases in which the data values of interest are organized in implicit semi-structured records found in textual sources (e.g., postal addresses, bibliographic information, ads). It is an important practical problem that has been frequently addressed in the recent literature. In this paper we introduce ONDUX (On Demand Unsupervised Information Extraction), a new unsupervised probabilistic approach for IETS. Like other unsupervised IETS approaches, ONDUX relies on information available in pre-existing data to associate segments of the input string with attributes of a given domain. Unlike other approaches, it relies on very effective matching strategies instead of explicit learning strategies. The effectiveness of this matching strategy is also exploited to disambiguate the extraction of certain attributes through a reinforcement step that explores the sequencing and positioning of attribute values, learned directly and on demand from the test data with no previous human-driven training, a feature unique to ONDUX. This gives ONDUX a high degree of flexibility and results in superior effectiveness, as demonstrated by the experimental evaluation we report on textual sources from different domains, in which ONDUX is compared with a state-of-the-art IETS approach.
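The two ideas named in the abstract, matching against pre-existing data and a reinforcement step that uses sequencing learned from the test input itself, can be illustrated with a small Python sketch. This is not the ONDUX algorithm: the knowledge base contents, the comma-based segmentation, the token-overlap score, and the names `KNOWLEDGE_BASE`, `build_vocab`, `match_block`, and `label_records` are all assumptions made for illustration, and positioning information is left out entirely.

```python
from collections import Counter, defaultdict

# Hypothetical knowledge base (an assumption): attribute -> previously seen values.
KNOWLEDGE_BASE = {
    "street": ["Main Street", "Oak Avenue", "Elm Street"],
    "city": ["Springfield", "Portland", "Austin"],
    "state": ["Texas", "Oregon", "Illinois"],
}


def build_vocab(kb):
    """Map each attribute to the set of lowercase tokens seen in its values."""
    return {
        attr: {tok.lower() for value in values for tok in value.split()}
        for attr, values in kb.items()
    }


def match_block(block, vocab):
    """Return the attribute whose vocabulary covers the largest share of the
    block's tokens, or None when no token is known for any attribute."""
    tokens = block.lower().split()
    if not tokens:
        return None
    scores = {
        attr: sum(tok in words for tok in tokens) / len(tokens)
        for attr, words in vocab.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def label_records(records, kb):
    """Label comma-separated blocks of each record with an attribute name."""
    vocab = build_vocab(kb)

    # Matching step: label every block using only the knowledge base.
    labeled = [
        [(block.strip(), match_block(block.strip(), vocab)) for block in rec.split(",")]
        for rec in records
    ]

    # Learn from the input itself how often one attribute follows another.
    follows = defaultdict(Counter)
    for rec in labeled:
        for (_, prev_attr), (_, cur_attr) in zip(rec, rec[1:]):
            if prev_attr and cur_attr:
                follows[prev_attr][cur_attr] += 1

    # Reinforcement step (simplified): an unmatched block takes the most
    # frequent successor of the label assigned to the block before it.
    result = []
    for rec in labeled:
        out = []
        for block, attr in rec:
            if attr is None and out and follows[out[-1][1]]:
                attr = follows[out[-1][1]].most_common(1)[0][0]
            out.append((block, attr))
        result.append(out)
    return result


if __name__ == "__main__":
    sample = [
        "Main Street, Austin, Texas",
        "Oak Avenue, Portland, Oregon",
        "Elm Street, Dallas, Texas",
    ]
    for record in label_records(sample, KNOWLEDGE_BASE):
        print(record)
```

In the sample input, "Dallas" does not appear in the hypothetical knowledge base, so the matching step leaves it unlabeled. The sequencing counts gathered from the other two records (street followed by city) then assign it the `city` label, which is the kind of disambiguation the abstract attributes to the reinforcement step.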
