Journal: ACM Transactions on Asian Language Information Processing

Morphological Segmentation to Improve Crosslingual Word Embeddings for Low Resource Languages

Abstract

Crosslingual word embeddings developed from multiple parallel corpora help in understanding the relationships between languages and in improving the prediction quality of machine translation. However, for low-resource languages with complex, agglutinative morphologies, inducing good-quality crosslingual embeddings is challenging because of the many complex morphological forms and rare words. This holds even for languages that share a common linguistic structure. In this work, we show that performing a simple morphological segmentation of the corpora before generating crosslingual word embeddings, so that both roots and suffixes receive their own embeddings, greatly improves prediction quality and captures semantic similarities more effectively. To demonstrate this, we chose two related languages of the Dravidian family, Telugu and Kannada. We also tested our method on Hindi, a widely spoken North Indian language of the Indo-European family, and observed encouraging results.
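A minimal sketch of what a "simple morphological segmentation" step could look like, assuming a greedy suffix-stripping approach. The abstract does not specify the authors' segmenter, so this is an illustration only: the suffix inventory (`SUFFIXES`), the `+` marker for suffix tokens, the `min_root` threshold, and the helper names `segment_word` and `segment_corpus` are all hypothetical, and the tokens in the usage example are placeholders, not real Telugu, Kannada, or Hindi words.

```python
from typing import Iterable, List

# Hypothetical suffix inventory for illustration only; a real system would
# take suffixes from a morphological analyzer or an unsupervised segmenter.
SUFFIXES = ["lu", "lo", "ki"]


def segment_word(word: str, suffixes: Iterable[str] = SUFFIXES, min_root: int = 2) -> List[str]:
    """Split a word into [root, '+suf1', '+suf2', ...] by repeatedly stripping
    the longest matching suffix from the right (greedy, agglutinative-style).

    A suffix is only stripped if at least `min_root` characters remain.
    """
    ordered = sorted(suffixes, key=len, reverse=True)  # try longest suffixes first
    stripped: List[str] = []
    root = word
    while True:
        for suf in ordered:
            if root.endswith(suf) and len(root) - len(suf) >= min_root:
                stripped.append("+" + suf)  # '+' keeps suffix tokens distinct from words
                root = root[: -len(suf)]
                break
        else:
            break  # no suffix matched: segmentation is finished
    # Suffixes were collected right-to-left, so reverse them to restore order.
    return [root] + stripped[::-1]


def segment_corpus(sentences: Iterable[str]) -> List[List[str]]:
    """Whitespace-tokenize each sentence and replace every word by its segments."""
    return [[piece for w in sent.split() for piece in segment_word(w)] for sent in sentences]


if __name__ == "__main__":
    # Placeholder tokens, not real words of any language.
    print(segment_word("xyzlulo"))
    print(segment_corpus(["xyzlulo abki", "ab"]))
```

The segmented token lists would then replace the raw corpus as input to whatever monolingual embedding trainer and crosslingual alignment method is used, so roots and suffixes each get a vector; that downstream step is not shown here.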