Analyzing word embeddings and improving POS tagger of Tigrinya

Abstract

In this paper, we analyze word embeddings for Tigrinya, a morphologically rich language. Tigrinya is a Semitic language spoken natively in Eritrea and Ethiopia by over seven million people. The unique and complex morphology of Semitic languages, which include Arabic, Amharic, and Hebrew, is commonly known as 'root and template pattern' morphology. This morphology generates a large number of inflected forms, which often cause out-of-vocabulary (OOV) problems in language processing. The problem is even more challenging for low-resource languages such as Tigrinya, which have very few annotated resources. Given a large raw text corpus, word embedding methods learn vector representations of words that capture semantic and syntactic information. We therefore construct a new text corpus and investigate the optimal settings for generating word vectors for Tigrinya. We also use word embeddings to improve the performance of a Tigrinya part-of-speech tagger built from a small tagged corpus.
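The abstract does not say how the embeddings are fed to the tagger. As one illustration of why vectors learned from a large raw corpus can help a tagger built from a small tagged corpus, the sketch below lets a word that never appears in the tagged data borrow the tag of its nearest tagged neighbour in embedding space. This nearest-neighbour back-off is a substituted, hypothetical technique, not the authors' method. The toy English words, vectors, and tags, as well as the helper names (`build_lexicon`, `oov_rate`, `tag_word`), are all made up for illustration.

```python
from collections import Counter, defaultdict
import math

# HYPOTHETICAL toy data: a tiny tagged corpus, and a dict of word vectors that
# stands in for embeddings trained on a much larger raw (untagged) corpus.
TAGGED = [("house", "N"), ("houses", "N"), ("run", "V"), ("run", "N"),
          ("run", "V"), ("walks", "V"), ("big", "ADJ")]
EMBEDDINGS = {
    "house":  [0.9, 0.1, 0.0],
    "houses": [0.85, 0.15, 0.05],
    "home":   [0.8, 0.2, 0.0],
    "run":    [0.0, 0.9, 0.1],
    "runs":   [0.05, 0.95, 0.1],
    "walks":  [0.1, 0.8, 0.2],
    "big":    [0.1, 0.1, 0.9],
}


def build_lexicon(tagged_corpus):
    """Map each word to its most frequent tag in the small tagged corpus."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def oov_rate(tokens, vocab):
    """Fraction of tokens that are not in the given vocabulary."""
    return sum(1 for t in tokens if t not in vocab) / len(tokens)


def tag_word(word, lexicon, embeddings, default="N"):
    """Tag from the lexicon; otherwise back off to the nearest tagged word."""
    if word in lexicon:
        return lexicon[word]
    if word in embeddings:
        candidates = [w for w in lexicon if w in embeddings]
        if candidates:
            nearest = max(candidates,
                          key=lambda w: cosine(embeddings[word], embeddings[w]))
            return lexicon[nearest]
    return default  # unknown to both resources: fall back to a fixed tag


if __name__ == "__main__":
    lexicon = build_lexicon(TAGGED)
    test = ["home", "runs", "big", "xyz"]
    print(oov_rate(test, lexicon))                              # 0.75
    print(oov_rate(test, lexicon.keys() | EMBEDDINGS.keys()))   # 0.25
    print([tag_word(w, lexicon, EMBEDDINGS) for w in test])     # ['N', 'V', 'ADJ', 'N']
```

In this toy run, three of the four test tokens are unseen in the tagged data, but only one is unseen once the embedding vocabulary is counted, which is the gap the raw-corpus embeddings help close.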