Conference: IEEE International Conference on Data Mining Workshops

Automatic Learning Common Definitional Patterns from Multi-domain Wikipedia Pages


Abstract

Automatic definition extraction has attracted wide interest in NLP and in knowledge-based applications. One primary task of definition extraction is mining patterns from definitional sentences. Existing methods for extracting definitional patterns either rely on manual extraction guided by intuition or observation, or aim to mine intricate definitional patterns automatically. The manual approach requires substantial human effort to identify definitional patterns because lexico-syntactic structures are so diverse, and it inevitably performs poorly, especially on cross-domain corpora. The automatic approach mainly targets precision in definition extraction, at the cost of lower recall. Neither is suitable for cross-domain definition extraction. To address these issues, this paper proposes a method for automatically extracting definitional patterns from multi-domain definitional sentences in Wikipedia. Our method, FIND-SS, is a modification of the FIND-S algorithm that addresses definition extraction from cross-domain corpora. FIND-SS adopts a "the more similar, the higher the priority" scheme to improve learning performance. It tolerates some noisy information and does not require any pattern seeds for pattern learning. Experimental results indicate that our approach is significantly superior to previous methods.
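The abstract does not describe how FIND-SS works internally, so the sketch below covers only the classic FIND-S concept-learning algorithm that it is said to modify. FIND-S starts from the first positive example and generalizes the hypothesis slot by slot whenever a later positive example disagrees. The fixed-length slot encoding of sentences, the `WILDCARD` symbol, and the function names `find_s` and `matches` are illustrative assumptions. The similarity-based ordering and noise handling of FIND-SS are not modeled.

```python
from typing import List, Optional, Sequence

# Assumption: "?" marks a slot that accepts any value, as in textbook FIND-S.
WILDCARD = "?"


def find_s(positives: Sequence[Sequence[str]]) -> Optional[List[str]]:
    """Return the most specific hypothesis that covers every positive example.

    Each example is a fixed-length sequence of slot values. This is a
    hypothetical simplification of a definitional sentence.
    """
    hypothesis: Optional[List[str]] = None
    for example in positives:
        if hypothesis is None:
            # The first positive example is the most specific starting point.
            hypothesis = list(example)
            continue
        if len(example) != len(hypothesis):
            raise ValueError("all examples must have the same number of slots")
        # Keep slots that agree; generalize disagreeing slots to the wildcard.
        hypothesis = [
            h if h == x else WILDCARD for h, x in zip(hypothesis, example)
        ]
    return hypothesis


def matches(hypothesis: Sequence[str], example: Sequence[str]) -> bool:
    """Check whether a hypothesis (pattern) covers an example."""
    return len(hypothesis) == len(example) and all(
        h == WILDCARD or h == x for h, x in zip(hypothesis, example)
    )


# Toy, hand-made slot sequences (not data from the paper).
toy_positives = [
    ("TERM", "is", "a", "NP"),
    ("TERM", "is", "an", "NP"),
]
pattern = find_s(toy_positives)
print(pattern)
```

Because classic FIND-S generalizes on every disagreement, a single noisy positive example can push slots to the wildcard. The abstract states that FIND-SS is designed to accommodate some noise, but the mechanism is not given here.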
