Building Dialectological Corpora for Turkic Languages: Mishar Dialect of Tatar

Bulat Khakimov; Farid Salimov; Dariya Ramazanova


Abstract

Corpus-based dialectology of less-resourced and functionally limited native languages is a developing field of linguistics. In this paper we discuss the challenges of annotating dialect corpora for the Turkic languages of Russia, using the Mishar dialect of Tatar as an example. We investigate the peculiarities of grammatical variability in the Mishar dialect from the standpoint of automatic annotation and describe the search functionality of the corpus. The proposed annotation methodology can be used to create multilingual integrated resources and parallel corpora of closely related languages.


Bibliographic information

Source
Procedia - Social and Behavioral Sciences | 2015, Issue 2 | 8 pages
Authors
Bulat Khakimov; Farid Salimov; Dariya Ramazanova
Original format: PDF
Chinese Library Classification: Current status and development of the social sciences
Date added to database: 2022-08-18 20:04:16

