Source: Advances in Natural Language Processing (conference proceedings)

Turkish Language Resources: Morphological Parser, Morphological Disambiguator and Web Corpus



Abstract

In this paper, we present a set of language resources for building Turkish language processing applications: a finite-state implementation of a morphological parser, an averaged perceptron-based morphological disambiguator, and a compiled web corpus. Turkish is an agglutinative language with highly productive inflectional and derivational morphology. Our morphological parser is based on two-level morphology. It is one of the most complete parsers for Turkish and, in contrast to existing parsers, it runs independently of any external system such as PC-KIMMO. Because of the complex phonology and morphology of Turkish, parsing yields ambiguous analyses for some words. To resolve them, we developed a morphological disambiguator with an accuracy of about 98% using the averaged perceptron algorithm. We also describe our efforts to build a Turkish web corpus of about 423 million words.
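To make the disambiguation step concrete, the following is a minimal sketch of an averaged perceptron that picks one analysis from a parser's candidate list. It is not the authors' implementation: the abstract does not describe their feature set or training setup, so the feature templates, the function names (`features`, `predict`, `train`), and the toy data are assumptions made for illustration. The tag strings follow a common Turkish tagset style and are not taken from the paper. A real disambiguator would also use the surrounding words as context, which this sketch omits.

```python
from collections import defaultdict

def features(word, parse):
    """Hypothetical feature templates over a single candidate parse."""
    tags = parse.split("+")
    feats = [f"parse={parse}", f"word={word}|last={tags[-1]}"]
    feats += [f"tag={t}" for t in tags[1:]]  # skip the root, keep the tags
    return feats

def score(weights, word, parse):
    return sum(weights.get(f, 0.0) for f in features(word, parse))

def predict(weights, word, candidates):
    """Return the highest-scoring candidate parse (first one wins ties)."""
    return max(candidates, key=lambda p: score(weights, word, p))

def train(examples, epochs=5):
    """Averaged perceptron: return the mean of the weight vectors
    observed after every training step."""
    weights = defaultdict(float)
    totals = defaultdict(float)
    steps = 0
    for _ in range(epochs):
        for word, candidates, gold in examples:
            guess = predict(weights, word, candidates)
            if guess != gold:
                # Standard perceptron update: reward gold, penalize guess.
                for f in features(word, gold):
                    weights[f] += 1.0
                for f in features(word, guess):
                    weights[f] -= 1.0
            # Naive averaging: accumulate the current weights each step.
            for f, w in weights.items():
                totals[f] += w
            steps += 1
    return {f: t / steps for f, t in totals.items()}

# Toy data (illustrative only): (surface form, candidate parses, gold parse).
EXAMPLES = [
    ("evi",
     ["ev+Noun+A3sg+P3sg+Nom", "ev+Noun+A3sg+Pnon+Acc"],
     "ev+Noun+A3sg+Pnon+Acc"),
    ("yüzü",
     ["yüz+Noun+A3sg+Pnon+Acc", "yüz+Noun+A3sg+P3sg+Nom"],
     "yüz+Noun+A3sg+P3sg+Nom"),
]

model = train(EXAMPLES)
```

Averaging the weights over all steps, rather than keeping only the final vector, is what distinguishes this from a plain perceptron. The accumulation here is done naively after every step for readability; practical implementations usually update the running totals lazily.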
