Journal: Information Processing & Management

Synset expansion on translation graph for automatic wordnet construction


Abstract

Research on clustering algorithms in synonymy graphs of a single language has yielded promising results; however, this idea has not yet been explored in a multilingual setting. Moving the problem to a multilingual translation graph enables the use of additional clues and techniques that are not possible in a monolingual synonymy graph. This article explores the potential of sense induction methods in a massively multilingual translation graph. For this purpose, the performance of graph clustering methods in synset detection is investigated. In the context of translation graphs, existing Wordnets in different languages provide an important clue for synset detection that cannot be exploited in a monolingual setting. Casting the problem as an unsupervised synset expansion task, rather than a clustering or community detection task, improves the results substantially. Furthermore, instead of a greedy unsupervised expansion algorithm guided by heuristics, we devise a supervised learning algorithm that learns synset expansion patterns from the words in existing Wordnets and achieves superior results. Because the training data consists of already existing Wordnets, no manual labeling is required, unlike in previous work. To evaluate our methods, Wordnets for Slovenian, Persian, German and Russian are built from scratch and compared with their manually built Wordnets or with labeled test sets. Results reveal a clear improvement over two state-of-the-art algorithms targeting massively multilingual Wordnets, and results competitive with Wordnet construction methods targeting a single language. The system is able to produce Wordnets from scratch for 51 languages with a Wordnet base concept coverage ranging from 20% to 88%, and it expands existing Wordnets by up to 30%.
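The abstract contrasts clustering with a greedy, heuristic-guided synset expansion over a translation graph. As a rough illustration of that general idea only (not the authors' algorithm, whose heuristics are not described here), the sketch below stores the translation graph as an adjacency map over (language, word) nodes, seeds a synset with words taken from existing Wordnets, and greedily adds target-language words that are linked to synset members from at least `min_support` distinct languages. The function names `build_graph` and `expand_synset`, the `min_support` threshold, and the toy edges are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple

# A node in the translation graph: (language code, word). Hypothetical encoding.
Node = Tuple[str, str]


def build_graph(edges: Iterable[Tuple[Node, Node]]) -> DefaultDict[Node, Set[Node]]:
    """Build an undirected translation graph as an adjacency map."""
    graph: DefaultDict[Node, Set[Node]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def expand_synset(graph, seed: Set[Node], target_lang: str, min_support: int = 2) -> Set[Node]:
    """Greedily add target-language words linked to synset members
    from at least `min_support` distinct languages (a hypothetical heuristic)."""
    synset = set(seed)
    changed = True
    while changed:
        changed = False
        # Candidates: target-language neighbours of current synset members.
        candidates = {n for m in synset for n in graph[m]
                      if n[0] == target_lang and n not in synset}
        for cand in sorted(candidates):
            # Count distinct languages among the synset members this word translates.
            langs = {m[0] for m in graph[cand] if m in synset}
            if len(langs) >= min_support:
                synset.add(cand)
                changed = True
    return synset


# Toy edges for illustration only; "hrenovka" (frankfurter) is a spurious
# link reachable only through the ambiguous English "dog".
edges = [
    (("en", "dog"), ("sl", "pes")),
    (("de", "Hund"), ("sl", "pes")),
    (("fr", "chien"), ("sl", "pes")),
    (("en", "dog"), ("sl", "hrenovka")),
]
graph = build_graph(edges)
seed = {("en", "dog"), ("de", "Hund"), ("fr", "chien")}
print(sorted(expand_synset(graph, seed, "sl")))
# → [('de', 'Hund'), ('en', 'dog'), ('fr', 'chien'), ('sl', 'pes')]
```

Requiring support from several languages is what filters out `hrenovka` here: it is connected to the synset only through English, whereas `pes` is a translation of members in three languages. This kind of cross-lingual evidence is not available in a monolingual synonymy graph.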