Corpus-based dialectology of less-resourced and functionally limited native languages is a developing field of linguistics. In this paper we discuss challenges of annotating dialect corpora for Turkic languages of Russia by the example of Mishar dialect of Tatar language. Peculiarities of grammatical variability in Mishar dialect are investigated from the point of view of automatic annotation and the search functionality of the corpus is described. The proposed methodology of annotation can be used when creating multilingual integrated resources and parallel corpora of closely related languages.
展开▼