Language Resources and Evaluation

Constructing and utilizing wordnets using statistical methods

Abstract

Lexical databases following the wordnet paradigm capture information about words, word senses, and their relationships. A large number of existing tools and datasets are based on the original WordNet, so extending the landscape of resources aligned with WordNet offers great potential for interoperability and substantial synergies. Wordnets are being compiled for a considerable number of languages; however, most have yet to reach a comparable level of coverage. We propose a method for automatically producing such resources for new languages based on WordNet, and analyse the implications of this approach both from a linguistic perspective and by considering natural language processing tasks. Our approach takes advantage of the original WordNet in conjunction with translation dictionaries. A small set of training associations is used to learn a statistical model for predicting associations between terms and senses. The associations are represented using a variety of scores that take into account structural properties as well as semantic relatedness and corpus frequency information. Although the resulting wordnets are imperfect in terms of their quality and their coverage of language-specific phenomena, we show that they constitute a cheap and suitable alternative for many applications, both for monolingual tasks and for cross-lingual interoperability. Apart from analysing the resources directly, we conducted tests on semantic relatedness assessment and cross-lingual text classification, with very promising results.
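The abstract only outlines the approach, so the following is a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that every sense of every English translation of a foreign term is a candidate sense, scores each (term, sense) pair with three hypothetical features, and trains a plain L2-regularised logistic-regression classifier on a handful of labelled pairs. The toy tables `EN_SENSES`, `TRANSLATIONS`, and `SENSE_FREQ`, the sense labels such as `bank/river`, and the helper names `candidates`, `features`, `train`, and `score` are all illustrative inventions. The paper's actual scores (including its semantic-relatedness score, omitted here), model, and resources are not described in the abstract.

```python
import math
from typing import Dict, List, Tuple

# --- Hypothetical toy resources (NOT real WordNet or dictionary data) ---
# English word -> sense IDs, standing in for WordNet synsets.
EN_SENSES: Dict[str, List[str]] = {
    "bank": ["bank/river", "bank/finance"],
    "shore": ["bank/river", "shore/coast"],
    "bench": ["bench/seat"],
    "seat": ["bench/seat", "seat/chair"],
}
# Foreign (German) term -> English translations, as a bilingual dictionary might list them.
TRANSLATIONS: Dict[str, List[str]] = {
    "Ufer": ["bank", "shore"],
    "Bank": ["bank", "bench"],
    "Sitzbank": ["bench", "seat"],
}
# Toy relative corpus frequency per sense, used as a frequency prior.
SENSE_FREQ: Dict[str, float] = {
    "bank/river": 0.3, "bank/finance": 0.7, "shore/coast": 0.6,
    "bench/seat": 0.5, "seat/chair": 0.5,
}
# Small hypothetical training set of (term, sense, is_correct) associations.
TRAIN = [("Ufer", "bank/river", 1), ("Ufer", "bank/finance", 0),
         ("Bank", "bank/finance", 1), ("Bank", "bench/seat", 1),
         ("Bank", "bank/river", 0)]

Features = Tuple[float, float, float]


def candidates(term: str) -> List[str]:
    """Every sense of every English translation is a candidate (deduplicated, in order)."""
    seen: List[str] = []
    for en in TRANSLATIONS[term]:
        for sense in EN_SENSES[en]:
            if sense not in seen:
                seen.append(sense)
    return seen


def features(term: str, sense: str) -> Features:
    translations = TRANSLATIONS[term]
    carriers = [en for en in translations if sense in EN_SENSES[en]]
    # Structural score 1: share of the term's translations that carry this sense.
    overlap = len(carriers) / len(translations)
    # Structural score 2: 1.0 if some carrying translation is monosemous, lower if all are ambiguous.
    specificity = max(1.0 / len(EN_SENSES[en]) for en in carriers)
    # Frequency score: toy corpus prior for the sense.
    return overlap, specificity, SENSE_FREQ[sense]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(data: List[Tuple[Features, int]], lam: float = 0.1,
          lr: float = 0.5, steps: int = 3000) -> Tuple[List[float], float]:
    """Full-batch gradient descent on mean logistic loss + (lam/2)*||w||^2; bias unregularised."""
    dim, n = len(data[0][0]), len(data)
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty
        gb = 0.0
        for x, y in data:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
            for j in range(dim):
                gw[j] += (p - y) * x[j] / n
            gb += (p - y) / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def score(model: Tuple[List[float], float], term: str, sense: str) -> float:
    """Predicted probability that `sense` is a correct sense for `term`."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, features(term, sense))) + b)


if __name__ == "__main__":
    model = train([(features(t, s), y) for t, s, y in TRAIN])
    # Score candidates for a term that was not in the training set.
    for s in candidates("Sitzbank"):
        print(s, round(score(model, "Sitzbank", s), 2))
```

Note that the training pairs ("Ufer", "bank/finance") and ("Bank", "bank/finance") receive identical features but opposite labels, which illustrates why richer scores such as semantic relatedness would be needed in practice. Accepting every candidate whose probability exceeds a threshold would then yield the entries of the new wordnet.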
