A biscriptual morphological transducer for Crimean Tatar

Abstract

This paper describes a weighted finite-state morphological transducer for Crimean Tatar that can analyse and generate forms in both the Latin and Cyrillic orthographies. The transducer was developed by a team comprising a community member and language expert, a field linguist who works with the community, a Turkologist with computational linguistics expertise, and an experienced computational linguist with Turkic expertise. Handling two orthographic systems in the same transducer is challenging because they employ different strategies for spelling loan words and for encoding the full range of the language's phonemes and their interactions. We develop the core transducer using the Latin orthography and then design a separate transliteration transducer to map the surface forms to Cyrillic. To help control the non-determinism in the orthographic mapping, we use weights to prioritise forms seen in the corpus. We evaluate all components of the system, finding an accuracy above 90% for morphological analysis and near 90% for orthographic conversion. This represents the state of the art for Crimean Tatar morphological modelling and, to our knowledge, is the first biscriptual single morphological transducer for any language.
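The idea of using weights to tame a non-deterministic orthographic mapping can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use a finite-state toolkit: the rule table `RULES`, the function names, the `unseen_weight` penalty, and the corpus counts are all invented for illustration. Each Latin symbol may map to several Cyrillic candidates, every combination is enumerated, and candidates attested in a (hypothetical) corpus receive a lower weight, so they sort first, as in a tropical-semiring transducer where lower weight is better.

```python
from itertools import product

# Hypothetical toy rules, NOT the paper's transliteration rules:
# each Latin symbol maps to one or more Cyrillic candidates.
RULES = {
    "k": ["к"],
    "ö": ["о", "ё"],
    "z": ["з", "зь"],
}


def candidates(word):
    """Enumerate every Cyrillic string the toy rules allow for a Latin word."""
    options = [RULES.get(ch, [ch]) for ch in word]
    return ["".join(parts) for parts in product(*options)]


def rank(word, corpus_counts, unseen_weight=10.0):
    """Return (weight, candidate) pairs sorted so the lowest weight comes first.

    Candidates seen in the corpus get weight 1/count; unseen candidates get
    a fixed penalty (an assumption made for this sketch).
    """
    scored = []
    for cand in candidates(word):
        count = corpus_counts.get(cand, 0)
        weight = 1.0 / count if count else unseen_weight
        scored.append((weight, cand))
    return sorted(scored)


if __name__ == "__main__":
    # Made-up corpus counts for the demonstration.
    for weight, cand in rank("köz", {"козь": 3}):
        print(f"{cand}\t{weight:.2f}")
```

In a real system the same effect would be obtained by attaching weights to paths of the transliteration transducer and taking the best path, rather than by enumerating candidates explicitly.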