This paper studies automatic part-of-speech (POS) tagging for Uyghur, motivated by the automatic prosodic boundary segmentation performed in the text analysis module of a speech synthesis system. First, the set of POS categories and the rules for assigning them are defined according to the characteristics of the application domain. Text sentences are then selected and manually tagged, and a POS probability table and a POS lookup table are derived from the tagged sentences by statistical counting. Finally, automatic POS tagging of Uyghur is implemented with a bigram model based on a hidden Markov model (HMM). To validate the method, 10,000 sentences were selected as the training set and a further 500 sentences were used as the test set. The experimental results indicate that the approach is feasible and effective.
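The pipeline summarized above (count statistics from hand-tagged sentences, then tag new sentences with a bigram HMM) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the add-one smoothing, the Viterbi decoder, the function names `train`, `log_prob`, and `viterbi`, the tag labels, and the placeholder tokens (`p1`, `n1`, `v1`, ...) are all assumptions made for illustration, and the toy corpus stands in for the manually tagged Uyghur sentences, which are not reproduced here.

```python
import math
from collections import defaultdict

START = "<s>"  # hypothetical sentence-start pseudo-tag


def train(tagged_sentences):
    """Count tag-bigram and word-emission frequencies from tagged sentences."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag]
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word]
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    tags = sorted(emit)
    return trans, emit, tags, vocab


def log_prob(table, given, outcome, n_outcomes):
    """Add-one smoothed log P(outcome | given); smoothing is an assumption."""
    row = table.get(given, {})
    total = sum(row.values())
    return math.log((row.get(outcome, 0) + 1) / (total + n_outcomes))


def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence under the bigram HMM."""
    if not words:
        return []
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unseen words
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(trans, START, tag, n_tags)
                 + log_prob(emit, tag, words[0], n_words))
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            e = log_prob(emit, tag, words[i], n_words)
            best[i][tag] = max(
                (best[i - 1][prev][0] + log_prob(trans, prev, tag, n_tags) + e,
                 prev)
                for prev in tags
            )
    # Follow back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]


# Toy corpus with placeholder tokens (not real Uyghur data).
corpus = [
    [("p1", "PRON"), ("n1", "NOUN"), ("v1", "VERB")],
    [("p2", "PRON"), ("n1", "NOUN"), ("v2", "VERB")],
    [("p1", "PRON"), ("n2", "NOUN"), ("v2", "VERB")],
]
model = train(corpus)
print(viterbi(["p1", "n1", "v1"], *model))
```

In this sketch the two count tables play roughly the role of the probability tables described in the abstract: one holds tag-to-tag transition counts (the bigram part) and the other holds word-given-tag counts. How the paper's POS lookup table is used during tagging is not stated in the abstract, so it is not modeled here.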