Workshop on multilingual and cross-lingual methods in NLP 2016

Comparing Fifty Natural Languages and Twelve Genetic Languages Using Word Embedding Language Divergence (WELD) as a Quantitative Measure of Language Distance

Abstract

We introduce a new measure of distance between languages based on word embeddings, called word embedding language divergence (WELD). WELD is defined as the divergence between the unified similarity distributions of words between languages. Using this measure, we compare fifty natural languages and twelve genetic languages. Our natural language dataset is a collection of sentence-aligned parallel corpora from Bible translations in fifty languages spanning a variety of language families. Although the parallel corpora guarantee the same content in all languages, in many cases languages within the same family still cluster together. In addition to natural languages, we compare the coding regions in the genomes of twelve organisms (four plants, six animals, and two human subjects). Our results confirm a significant high-level difference between the genetic language models of humans/animals and plants. The proposed method is a step toward defining a quantitative measure of similarity between languages, with applications in language classification, genre identification, dialect identification, and evaluation of translations.
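The abstract does not spell out how the divergence is computed, so the following is only a minimal sketch of one plausible reading. It assumes the two languages share an aligned word list (row i of each embedding matrix is the same concept), that the "similarity distribution" is the normalized matrix of pairwise cosine similarities, and that the divergence is Jensen-Shannon. The function names (`similarity_distribution`, `js_divergence`, `weld_sketch`) and the rescaling of cosine values to [0, 1] are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def similarity_distribution(emb):
    """Turn an (n_words, dim) embedding matrix into a probability
    distribution over word pairs (assumed construction, not the paper's)."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)      # unit-length rows
    cos = np.clip(unit @ unit.T, -1.0, 1.0)       # pairwise cosine similarity
    nonneg = (cos + 1.0) / 2.0                    # map [-1, 1] onto [0, 1]
    return nonneg / nonneg.sum()                  # normalize to sum to 1


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two distributions
    of the same shape; the result lies in [0, 1]."""
    p = p.ravel()
    q = q.ravel()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0                              # 0 * log(0) is taken as 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def weld_sketch(emb_a, emb_b):
    """Distance between two languages given embeddings of an aligned
    word list (same row order in both matrices)."""
    return js_divergence(similarity_distribution(emb_a),
                         similarity_distribution(emb_b))
```

Because this sketch compares similarity structure within each language rather than the raw vectors, the two embedding spaces do not need to be aligned or even share a dimensionality; only the word lists must correspond.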