Journal: Computer Science & Information Technology

Comparison of Turkish Word Representations Trained on Different Morphological Forms

Abstract

The growing popularity of different text representations has brought many improvements in Natural Language Processing (NLP) tasks. Without the need for supervised data, embeddings trained on large corpora provide meaningful relations that can be used in different NLP tasks. Even though training these vectors is relatively easy with recent methods, the information gained from the data depends heavily on the structure of the corpus language. Since the most widely studied languages have similar morphological structures, problems that arise for morphologically rich languages are largely disregarded in studies. For such languages, context-free word vectors ignore the morphological structure of the language. In this study, we prepared texts in morphologically different forms in a morphologically rich language, Turkish, and compared the results on different intrinsic and extrinsic tasks. To see the effect of morphological structure, we trained a word2vec model on texts in which lemmas and suffixes are treated differently. We also trained the subword model fastText and compared the embeddings on word analogy, text classification, sentiment analysis, and language modeling tasks.
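The abstract does not spell out which morphological forms were prepared, so the following is only a minimal sketch of what "treating lemmas and suffixes differently" could look like in practice. The form names (`surface`, `lemma`, `lemma_suffix_split`, `lemma_suffix_merged`), the `+` suffix marker, and the hand-written `ANALYSES` table are all assumptions made for illustration; a real pipeline would obtain the analyses from a Turkish morphological analyzer.

```python
# Hypothetical, hand-written analyses: surface form -> (lemma, [suffixes]).
# These stand in for the output of a real morphological analyzer.
ANALYSES = {
    "evlerde": ("ev", ["ler", "de"]),   # "in the houses"
    "kitaplar": ("kitap", ["lar"]),     # "books"
    "okudum": ("oku", ["du", "m"]),     # "I read" (past tense)
}


def to_form(tokens, form):
    """Rewrite a tokenized sentence in one of several illustrative forms."""
    out = []
    for tok in tokens:
        # Unknown words are kept unchanged, with no suffixes.
        lemma, suffixes = ANALYSES.get(tok, (tok, []))
        if form == "surface":
            out.append(tok)                      # original word form
        elif form == "lemma":
            out.append(lemma)                    # suffixes discarded
        elif form == "lemma_suffix_split":
            out.append(lemma)                    # each suffix is its own token
            out.extend("+" + s for s in suffixes)
        elif form == "lemma_suffix_merged":
            out.append(lemma)                    # all suffixes as one token
            if suffixes:
                out.append("+" + "".join(suffixes))
        else:
            raise ValueError(f"unknown form: {form}")
    return out


sentence = ["evlerde", "kitaplar", "okudum"]
for form in ("surface", "lemma", "lemma_suffix_split", "lemma_suffix_merged"):
    print(form, to_form(sentence, form))
```

Each variant yields a differently tokenized corpus, which could then be passed to a word-embedding trainer so that the resulting vectors can be compared on the same downstream tasks.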