
Word Sense Disambiguation Using Inductive Logic Programming



Abstract

Identifying the correct sense of a word is necessary for many tasks in automatic natural language processing, such as machine translation, information retrieval, and speech and text processing. Automatic Word Sense Disambiguation (WSD) is difficult, and accuracies with state-of-the-art methods are substantially lower than in other areas of text understanding such as part-of-speech tagging. One shortcoming of these methods is that they largely use shallow syntactic features and do not utilize substantial sources of background knowledge, such as semantic taxonomies and dictionaries, which are now available in electronic form. Empirical results from the use of Inductive Logic Programming (ILP) have repeatedly shown the ability of ILP systems to use diverse sources of background knowledge. In this paper we investigate the use of ILP for WSD in two different ways: (a) as a stand-alone constructor of models for WSD; and (b) to build interesting features, which can then be used by standard model-builders such as SVMs. In our experiments we examine a monolingual WSD task using the 32 English verbs contained in the SENSEVAL-3 benchmark data, and a bilingual WSD task using 7 highly ambiguous verbs in machine translation from English to Portuguese. The available background knowledge comes from eight sources that provide a wide range of syntactic and semantic information. For both WSD tasks, experimental results show that ILP-constructed models and models built using ILP-generated features have higher accuracies than those obtained using a state-of-the-art feature-based technique equipped with shallow syntactic features. This suggests that the use of ILP with diverse sources of background knowledge can provide one way of making substantial progress in the field of automatic WSD.
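As a rough illustration of use (b), the sketch below shows how logical rules might be turned into binary features for a standard learner. It is not the paper's system: the fact predicates (`object`, `isa`, `next_pos`), the three rules, and the example facts are all hypothetical, and a real ILP system would induce the rules from data rather than have them written by hand.

```python
from typing import Callable, List, Set, Tuple

# One occurrence of an ambiguous verb is described by a set of ground facts.
# All predicates and values here are illustrative assumptions.
Fact = Tuple[str, ...]
Facts = Set[Fact]


def objects(facts: Facts) -> List[str]:
    """Return the arguments of all ("object", X) facts."""
    return [f[1] for f in facts if f[0] == "object" and len(f) == 2]


def rule_object_is_person(facts: Facts) -> bool:
    # Combines a syntactic fact (object) with a taxonomy fact (isa).
    return any(("isa", o, "person") in facts for o in objects(facts))


def rule_object_is_artifact(facts: Facts) -> bool:
    return any(("isa", o, "artifact") in facts for o in objects(facts))


def rule_next_is_preposition(facts: Facts) -> bool:
    # A shallow syntactic fact: part-of-speech tag of the following word.
    return ("next_pos", "IN") in facts


# Each rule stands in for one clause an ILP system might construct.
RULES: List[Callable[[Facts], bool]] = [
    rule_object_is_person,
    rule_object_is_artifact,
    rule_next_is_preposition,
]


def featurize(facts: Facts) -> List[int]:
    """Map an example to a 0/1 vector: one entry per rule that holds."""
    return [int(rule(facts)) for rule in RULES]


example_a: Facts = {("object", "brother"), ("isa", "brother", "person")}
example_b: Facts = {("object", "taxi"), ("isa", "taxi", "artifact")}
example_c: Facts = {("next_pos", "IN")}

vectors = [featurize(e) for e in (example_a, example_b, example_c)]
```

The resulting vectors could then be passed, together with sense labels, to any feature-based model-builder such as an SVM. The point of the sketch is only that each feature can combine several kinds of background knowledge in one test, which a single shallow syntactic feature cannot.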
