An all-words sense tagging method for resource-deficient languages

Abstract

All-words sense tagging is the task of determining the correct sense of every content word in a given text. Many methods have been proposed for tagging all words, rather than a small number of sample words, using language resources such as a machine-readable dictionary (MRD), a sense-tagged corpus, and WordNet. However, sense tagging methods that require such extensive resources cannot be used for resource-deficient languages. The conventional sense tagging method for resource-deficient languages, which uses only an MRD, suffers from both low recall and low precision, because it assigns a sense only when a gloss word in the dictionary exactly matches a context word. In this study, we propose an all-words sense tagging method that is particularly effective for resource-deficient languages. It requires an MRD, the essential resource for all-words sense tagging, and a raw corpus, which is easy to acquire and freely available. The proposed method finds semantically related context words based on co-occurrence information extracted from the raw corpus and uses these words to tag the senses of the target word. We evaluated the proposed algorithm on a Korean test corpus of approximately 15 million words; the results show that it can automatically tag senses to all content words with high precision. Furthermore, we show that a semantic concordancer can be built on the resulting automatically sense-tagged corpus.
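
The abstract contrasts exact gloss-to-context matching with matching through co-occurrence-related words. The Python sketch below illustrates that contrast only; it is not the authors' algorithm. The relatedness test (a raw co-occurrence count of at least `min_count` within a 5-token window), the function names, and the toy English "bank" data are all hypothetical assumptions chosen for illustration (the paper itself evaluates on Korean).

```python
from collections import Counter


def cooccurrence_counts(sentences, window=5):
    """Count how often two distinct words appear within `window` tokens of each other."""
    pair_counts = Counter()
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for v in tokens[i + 1:i + 1 + window]:
                if v != w:
                    pair_counts[frozenset((w, v))] += 1
    return pair_counts


def exact_match_score(gloss, context):
    """Baseline: number of gloss words that literally appear in the context."""
    return len(set(gloss) & set(context))


def related_match_score(gloss, context, pair_counts, min_count=2):
    """Count gloss words that match a context word exactly OR co-occur with one
    at least `min_count` times in the raw corpus (hypothetical relatedness test)."""
    score = 0
    for g in set(gloss):
        if any(g == c or pair_counts[frozenset((g, c))] >= min_count
               for c in set(context)):
            score += 1
    return score


def tag_sense(senses, context, score_fn):
    """Pick the highest-scoring sense; return None if no gloss word matched at all."""
    scores = {sense: score_fn(gloss, context) for sense, gloss in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Toy raw corpus and dictionary glosses (hypothetical illustration data).
corpus = [
    ["the", "river", "water", "was", "cold"],
    ["fish", "swim", "in", "river", "water"],
    ["river", "fish", "need", "clean", "water"],
    ["money", "in", "a", "deposit", "account"],
]
senses = {
    "bank_1": ["money", "deposit", "institution"],
    "bank_2": ["river", "land", "slope"],
}
context = ["fish", "water", "near"]

pair_counts = cooccurrence_counts(corpus)
print(tag_sense(senses, context, exact_match_score))  # None
print(tag_sense(senses, context,
                lambda g, c: related_match_score(g, c, pair_counts)))  # bank_2
```

With exact matching, no gloss word occurs in the context, so the word is left untagged, which mirrors the low-recall problem the abstract attributes to MRD-only tagging. The co-occurrence test links the gloss word "river" to the context word "fish" and selects `bank_2`. A practical system would more plausibly use an association measure such as pointwise mutual information instead of a raw count threshold; that choice is likewise an assumption here.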

Bibliographic details

  • Source
    Literary & linguistic computing | 2017, No. 3 | pp. 672-688 | 17 pages
  • Author affiliations

    Korea Univ, Dept Comp Sci & Engn, Seoul, South Korea;

    Korea Univ, Res Inst Korean Studies, Anam Dong 5 Ga, Seoul, South Korea;

    Korea Univ, Dept Comp Sci & Engn, Seoul, South Korea;

  • Original format: PDF
  • Text language: English
