Overcoming Data Sparseness Problem in Statistical Corpus Based Sense Disambiguation

Abstract

Data sparseness is a common problem for statistical corpus-based Word Sense Disambiguation (WSD) approaches [1]. A corpus usually needs a large amount of data to guarantee that every sense of an ambiguous word is represented. This is hard to achieve, especially for words that occur infrequently in the training corpus, and the problem is even worse for languages that lack large digitized resources. In this paper, we present an unsupervised framework that first learns the relationships between ambiguous words and their related words, revealing their most suitable senses on the basis of a proposed mathematical model and a set of bilingual resources: a non-aligned Portuguese-Chinese training corpus, a dictionary, and a sense inventory. For senses not found in the learning phase, bilingual examples from the dictionary and Singular Value Decomposition (SVD) [2] techniques are applied to overcome the sparseness problem. All senses found are converted into a set of rules and stored in the Word Sense database for later use in the disambiguation and translation process. Preliminary experimental results show that these strategies allow more senses to be learned in a sparse-data setting.
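
The abstract does not say how SVD is applied, so the following is only a minimal sketch of one common way SVD is used against sparse counts: a truncated (rank-k) reconstruction of a word-by-context co-occurrence matrix. The function name `low_rank_smooth`, the toy matrix, and the choice `k=2` are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def low_rank_smooth(counts: np.ndarray, k: int) -> np.ndarray:
    """Return the best rank-k approximation of `counts` via truncated SVD."""
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    # Keep only the k largest singular values and their singular vectors.
    return (u[:, :k] * s[:k]) @ vt[:k, :]

# Hypothetical toy data (not from the paper): rows are words, columns are
# contexts, and each cell counts how often the word was seen in that context.
# Word 0 was never observed in context 2, but word 1, which shares contexts
# 0 and 1 with word 0, was.
counts = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 4.0],
])

smoothed = low_rank_smooth(counts, k=2)
print(np.round(smoothed, 2))
```

In this toy case the unseen pair (word 0, context 2) receives a positive score in `smoothed` because word 0 shares contexts with word 1, while the unrelated word 2 still contributes nothing to it. That is the general sense in which a low-rank reconstruction can fill gaps left by sparse data.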

Bibliographic record

Source: The 10th International Conference on Enhancement and Promotion of Computational Methods in Engineering and Science (EPMESC X), 2006, pp. 1099-1104 (6 pages)
Conference location: Sanya (CN)
Authors: Francisco Oliveira; Fai Wong; Anna Ho; Yiping Li; Mingchui Dong
Original format: PDF
Language: English
Chinese Library Classification: Applications of computational mathematics
Keywords: Word Sense Disambiguation; Data Sparseness Problem; Singular Value Decomposition

Similar literature

1. Utilizing Corpus Statistics for Hindi Word Sense Disambiguation [J]. Satyendr Singh, Tanveer Siddiqui. The International Arab Journal of Information Technology, 2015, No. 6Appa
2. A Neuro-Evolutionary Corpus-Based Method for Word Sense Disambiguation [J]. Azzini Antonia, da Costa Pereira Célia, Dragoni Mauro. Intelligent Systems, IEEE, 2012, No. 6
3. Combining Knowledge- and Corpus-based Word-Sense-Disambiguation Methods [J]. Montoyo A., Palomar M., Rigau G. The Journal of Artificial Intelligence Research, 2005, No. 12
4. Overcoming Data Sparseness Problem in Statistical Corpus Based Sense Disambiguation [C]. Proceedings of the 10th International Conference on Enhancement and Promotion of Computational Methods in Engineering and Science (EPMESC X), 2006
5. Word sense disambiguation for statistical machine translation [D]. Carpuat, Marine Jacinthe. 2008
6. Wireless Transmission Method for Large Data Based on Hierarchical Compressed Sensing and Sparse Decomposition [O]. Youtian Qie, Chuangbo Hao, Ping Song. 2020
7. Learning similarity-based word sense disambiguation from sparse data [O]. Karov, Yael; Edelman, Shimon. 1996
