Analyzing word embeddings and improving POS tagger of Tigrinya

Abstract

In this paper, we analyze word embeddings for a morphologically rich language, Tigrinya. Tigrinya is a Semitic language spoken natively in Eritrea and Ethiopia by over seven million people. The unique and complex morphology of Semitic languages, which include Arabic, Amharic, and Hebrew, is commonly known as 'root and template pattern' morphology. This morphology generates a large number of inflected forms that often cause out-of-vocabulary (OOV) problems in language processing. The problem is more severe for low-resource languages such as Tigrinya, which have very few annotated resources. Word embedding methods, given a large raw text corpus, form semantic and syntactic vector representations of words. Therefore, we construct a new text corpus and investigate the optimal settings for generating word vectors for Tigrinya. We also use word embeddings to improve the performance of a Tigrinya part-of-speech tagger created from a small tagged corpus.
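
The OOV problem mentioned in the abstract can be made concrete with a small sketch. The snippet below is not taken from the paper: the function name `oov_rate` and the toy English tokens are assumptions chosen for illustration, standing in for the many inflected forms a root-and-template language would produce. It measures the fraction of test tokens that never appear in the training data.

```python
def oov_rate(train_tokens, test_tokens):
    """Return the fraction of test tokens that do not occur in the training tokens."""
    vocab = set(train_tokens)
    if not test_tokens:
        return 0.0
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


# Hypothetical toy data (not from the paper): inflected variants of one stem.
train = "he walks she walked they walk".split()
test = "we walk she walks he walking".split()

# "we" and "walking" are unseen, so 2 of the 6 test tokens are OOV.
rate = oov_rate(train, test)
print(f"OOV rate: {rate:.2f}")
```

With a small tagged corpus, a rate like this tends to rise as morphology gets richer, which is the motivation the abstract gives for bringing in embeddings trained on a larger raw corpus.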

Bibliographic details

Source
2017 | pp. 115-118 | 4 pages
Conference location: Singapore (SG)
Authors
Yemane Tedla; Kazuhide Yamamoto
Author affiliation

Nagaoka University of Technology, Natural Language Processing Lab, Nagaoka, Japan

Original format: PDF
Keywords
Semantics; Syntactics; Morphology; Tagging; Task analysis; Education

Similar literature

1. Multilingual POS tagging by a composite deep architecture based on character-level features and on-the-fly enriched Word Embeddings [J]. Marco Pota, Fiammetta Marulli, Massimo Esposito. Knowledge-Based Systems. 2019, issue of Jan. 15
2. Evaluating word embeddings and a revised corpus for part-of-speech tagging in Portuguese [J]. Erick R Fonseca, João Luís G Rosa, Sandra Maria Aluísio. Journal of the Brazilian Computer Society. 2015, No. 1
3. Analyzing word embeddings and improving POS tagger of Tigrinya [C]. Yemane Tedla, Kazuhide Yamamoto. International Conference on Asian Language Processing. 2017
4. Improved GloVe Word Embedding Using Linear Weighting Scheme for Word Similarity Tasks [D]. Lu, Qinglan. 2021
5. BioWordVec improving biomedical word embeddings with subword information and MeSH [O]. Yijia Zhang, Qingyu Chen, Zhihao Yang. 2019
6. Text Classification Based on Convolutional Neural Networks and Word Embedding for Low-Resource Languages: Tigrinya [O]. Awet Fesseha, Shengwu Xiong, Eshete Derb Emiru. 2021
