Published in: Computer Applications and Software (《计算机应用与软件》)

Design and Implementation of an N-gram-Based Kazakh Text Proofreading System


Abstract

For non-word error checking in Kazakh text, we first review and summarize existing error-checking methods. Then, supported by a Kazakh lexicon of moderate size and exploiting characteristics of the language, we use a Kazakh stem segmentation program and Kazakh syllable rules to find non-word errors in the text, and a minimum edit distance algorithm to supply the most likely candidate words. For real-word error checking, we use contextual information through an N-gram language model: a trigram model of the local co-occurrence probabilities of adjacent words in the text detects real-word errors, and an edit-distance-based pattern matching method then provides correction suggestions for them. Experimental results show that the system's error detection and correction efficiency is fairly good and that the experimental scheme is feasible.
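The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several parts are assumptions. A plain lexicon lookup stands in for the stem segmentation program and syllable rules. The trigram model uses add-one (Laplace) smoothing. A known word is treated as a suspected real-word error when some lexicon word one edit away raises the summed log-probability of the trigrams covering that position by more than a hypothetical `margin`. The function names and the Latin-script tokens are made-up placeholders, not real Kazakh data.

```python
import math
from collections import Counter

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def nearest(word, lexicon, max_dist):
    """Lexicon words within max_dist edits of word, closest first (ties alphabetical)."""
    scored = sorted((edit_distance(word, w), w) for w in lexicon if w != word)
    return [w for d, w in scored if d <= max_dist]

class TrigramModel:
    """Word trigram model with add-one smoothing (an assumed smoothing choice)."""
    def __init__(self, sentences):
        self.tri, self.hist = Counter(), Counter()
        vocab = {w for s in sentences for w in s}
        self.vocab_size = len(vocab) + 1  # +1 for the end marker </s>
        for s in sentences:
            toks = ["<s>", "<s>"] + s + ["</s>"]
            for k in range(2, len(toks)):
                self.tri[tuple(toks[k - 2:k + 1])] += 1
                self.hist[tuple(toks[k - 2:k])] += 1

    def logp(self, u, v, w):
        """log P(w | u, v) with add-one smoothing."""
        return math.log((self.tri[(u, v, w)] + 1) / (self.hist[(u, v)] + self.vocab_size))

    def local_score(self, sent, i, word):
        """Sum of log-probs of the (up to three) trigrams that contain position i."""
        toks = ["<s>", "<s>"] + sent[:i] + [word] + sent[i + 1:] + ["</s>"]
        j = i + 2
        return sum(self.logp(*toks[k - 2:k + 1]) for k in range(j, min(j + 3, len(toks))))

def check(sent, lexicon, model, margin=1.0):
    """Return (position, word, error type, suggestions) for each suspected error."""
    report = []
    for i, w in enumerate(sent):
        if w not in lexicon:
            # Non-word error: suggest the closest lexicon entries by edit distance.
            report.append((i, w, "non-word", nearest(w, lexicon, 2)))
            continue
        # Real-word check: does a one-edit neighbour fit the local context much better?
        base = model.local_score(sent, i, w)
        better = [c for c in nearest(w, lexicon, 1)
                  if model.local_score(sent, i, c) - base > margin]
        if better:
            report.append((i, w, "real-word", better))
    return report

# Toy training data (placeholder tokens, not a real Kazakh corpus).
CORPUS = [s.split() for s in ["men kitap oqimyn", "men kitap oqimyn", "sen kitap oqisyn"]]
LEXICON = {w for s in CORPUS for w in s}
MODEL = TrigramModel(CORPUS)

print(check("men kitap oqymyn".split(), LEXICON, MODEL))  # non-word at position 2
print(check("men kitap oqisyn".split(), LEXICON, MODEL))  # real-word at position 2
```

In the second call every token is in the lexicon, so only the trigram context can expose "oqisyn" as a likely slip for "oqimyn". Swapping "men" for its one-edit neighbour "sen" also raises the local score slightly, but by less than `margin`, which is why a threshold of this kind is needed to keep false alarms down.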
