Journal: Computer speech and language

Improving selection of synsets from WordNet for domain-specific word sense disambiguation


Abstract

Word Sense Disambiguation (WSD) is a fundamental task that supports Information Retrieval, Information Extraction, web search, and indexing, among other applications. The literature contains several works on the generic WSD task, but in recent years domain-specific WSD has attracted the attention of a number of researchers. In this context, this paper describes an approach to domain-specific WSD that selects the predominant sense (a synset from WordNet) of each ambiguous word. To do so, the method uses two corpora: the domain-specific test corpus, which contains the target ambiguous words, and a domain-specific auxiliary corpus, which is obtained using relevant words from the test corpus. The approach has four main stages: (1) auxiliary corpus generation; (2) extraction of related features from the auxiliary corpus; (3) extraction of test features from the test corpus; and (4) feature integration. The approach was tested on two domain-specific corpora (Sports and Finance) and on one balanced corpus, the BNC (British National Corpus). Although our WSD approach showed some limitations on the general-domain corpus, the results obtained on the domain-specific corpora, which are our main interest, were better than those reported in previous work.
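The abstract names the four stages but does not say how features are weighted or how synsets are scored, so the following is only a minimal illustrative sketch of such a pipeline, not the authors' method. Everything here is an assumption: a hypothetical toy sense inventory (`SENSE_INVENTORY`) stands in for WordNet, the auxiliary corpus is filtered from a local document pool instead of being retrieved, "relevant words" are simply the most frequent content words of the test corpus, features are document-level co-occurrence counts, and integration is a weighted sum controlled by a hypothetical parameter `alpha`. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical toy sense inventory standing in for WordNet: each synset is
# represented by a "signature" of words (e.g. from its gloss and related synsets).
SENSE_INVENTORY: Dict[str, Dict[str, Set[str]]] = {
    "bank": {
        "bank.n.01": {"sloping", "land", "river", "water", "slope"},
        "bank.n.02": {"financial", "institution", "money", "deposits", "loans", "interest"},
    },
}

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "at", "by", "for", "is", "was"}


def tokenize(text: str) -> List[str]:
    """Lowercase, strip surrounding punctuation, and drop stopwords."""
    tokens = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return [t for t in tokens if t and t not in STOPWORDS]


def relevant_words(corpus: List[str], k: int = 5) -> List[str]:
    """Assumed relevance criterion: the k most frequent content words of the test corpus."""
    counts = Counter(t for doc in corpus for t in tokenize(doc))
    return [w for w, _ in counts.most_common(k)]


def build_auxiliary_corpus(pool: List[str], seeds: List[str]) -> List[str]:
    """Stage 1 (assumed): keep pool documents that mention any seed word."""
    seed_set = set(seeds)
    return [doc for doc in pool if seed_set & set(tokenize(doc))]


def context_features(corpus: List[str], target: str) -> Counter:
    """Stages 2 and 3 (assumed): count words co-occurring with the target in a document."""
    feats: Counter = Counter()
    for doc in corpus:
        tokens = tokenize(doc)
        if target in tokens:
            feats.update(t for t in tokens if t != target)
    return feats


def integrate(related: Counter, test: Counter, alpha: float = 0.5) -> Dict[str, float]:
    """Stage 4 (assumed): weighted sum of the two normalized feature distributions."""
    def normalize(c: Counter) -> Dict[str, float]:
        total = sum(c.values())
        return {w: n / total for w, n in c.items()} if total else {}

    r, t = normalize(related), normalize(test)
    return {w: alpha * r.get(w, 0.0) + (1 - alpha) * t.get(w, 0.0) for w in set(r) | set(t)}


def predominant_sense(target: str, test_corpus: List[str], pool: List[str]) -> str:
    """Pick the synset whose signature collects the most integrated feature weight."""
    aux = build_auxiliary_corpus(pool, relevant_words(test_corpus))
    weights = integrate(context_features(aux, target), context_features(test_corpus, target))
    scores = {
        synset: sum(weights.get(w, 0.0) for w in signature)
        for synset, signature in SENSE_INVENTORY[target].items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    test_corpus = [
        "The bank raised interest rates on loans.",
        "Investors moved money out of the bank.",
    ]
    pool = [
        "Loans and deposits grew at the bank this quarter.",
        "Fish swim near the river bank.",
        "Interest on money market accounts rose.",
    ]
    print(predominant_sense("bank", test_corpus, pool))  # → bank.n.02
```

With a finance-flavoured test corpus the financial synset wins even though the auxiliary corpus also pulls in a river document, because both feature sources agree on finance words. Raising `alpha` would make the sketch lean more on the auxiliary corpus; lowering it would rely more on the test corpus alone.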
