Conference: International Symposium on INnovations in Intelligent SysTems and Applications
Graph-based lemmatization of Turkish words by using morphological similarity

Abstract

Lemmatization is an important preprocessing step in Natural Language Processing (NLP). In language applications such as part-of-speech tagging, spell-checking, and document clustering, selecting the right lemma together with its morphological features can yield better results. In this study, we present a new hybrid approach for lemmatizing Turkish inflected words using graph models based on morphological similarity, which have recently been gaining popularity in lemmatization. To this end, a novel similarity function for Turkish is developed to connect similar word forms. The proposed model is trained and tested on a double-checked Turkish lemmatization dataset. The empirical results are then compared with those of Zemberek, the most widely used Turkish lemmatization tool.
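The abstract does not give the similarity function or the graph construction, so the following is only a rough illustrative sketch of the general idea of connecting similar word forms in a graph. Everything in it is an assumption rather than the paper's method: the prefix-based Dice score in `similarity`, the `threshold` value, and the heuristic of taking the shortest form in each connected component as the lemma are all hypothetical stand-ins.

```python
from collections import defaultdict


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest shared prefix of two words."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def similarity(a: str, b: str) -> float:
    """Hypothetical similarity: Dice-style score over the shared prefix.

    This is NOT the similarity function from the paper; it only reflects
    the fact that Turkish inflection is largely suffixing.
    """
    return 2 * common_prefix_len(a, b) / (len(a) + len(b))


def build_graph(words, threshold=0.5):
    """Connect every pair of word forms whose similarity reaches the threshold."""
    graph = defaultdict(set)
    for i, a in enumerate(words):
        graph[a]  # make sure isolated words still appear as nodes
        for b in words[i + 1:]:
            if similarity(a, b) >= threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def lemmatize(words, threshold=0.5):
    """Map each word to the shortest form in its connected component.

    Picking the shortest form is an assumed heuristic, not a claim about
    how the paper selects lemmas.
    """
    graph = build_graph(words, threshold)
    lemma_of = {}
    seen = set()
    for start in words:
        if start in seen:
            continue
        # Depth-first search to collect one connected component.
        component = []
        stack = [start]
        seen.add(start)
        while stack:
            word = stack.pop()
            component.append(word)
            for neighbour in graph[word]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        lemma = min(component, key=lambda w: (len(w), w))
        for word in component:
            lemma_of[word] = lemma
    return lemma_of


result = lemmatize(["ev", "evler", "evde", "kitap", "kitaplar"])
```

With these toy inputs, the forms of "ev" (house) and "kitap" (book) fall into two separate components. A purely unsupervised sketch like this ignores the training step the abstract mentions, and a shortest-form heuristic would break on stem alternations, so it should be read as intuition only.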
