Journal: 计算机工程 (Computer Engineering)

Research on Methods for Handling Out-of-Vocabulary Words in Tibetan Automatic Word Segmentation

     

Abstract

Trailing elements (suffix-like components attached after a word) occur with high frequency in Tibetan. When a segmenter meets an out-of-vocabulary word, its suffix is often split off as a separate token, which lowers segmentation accuracy. To address this, a word (morpheme) + suffix merging method is adopted: a Tibetan trailing element is merged with the preceding word (morpheme) and output as a single segmentation unit. Tibetan text also contains many personal names, place names, organization names, and other out-of-vocabulary words that are cut into fragments during segmentation. For these, a fragment consolidation method is used: fragments that occur multiple times are consolidated and output as a single segmentation unit. Experimental results show that the two methods improve the accuracy of Tibetan automatic word segmentation.
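The two post-processing steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation; the abstract does not give the algorithms in detail. The function names, the suffix set, the lexicon, the `min_count` threshold, and the rule that a "fragment" is any token missing from the lexicon are all assumptions made for illustration, and placeholder Latin tokens stand in for Tibetan syllables.

```python
from collections import Counter


def merge_suffixes(tokens, suffixes):
    """Attach each suffix token to the token before it (word + suffix merging)."""
    merged = []
    for tok in tokens:
        if tok in suffixes and merged:
            merged[-1] += tok      # glue the suffix onto the previous unit
        else:
            merged.append(tok)     # ordinary token, or a suffix with nothing before it
    return merged


def fragment_runs(tokens, lexicon):
    """Yield (start, end) spans of two or more adjacent tokens not in the lexicon."""
    start = None
    # A trailing None sentinel closes a run that reaches the end of the list.
    for i, tok in enumerate(tokens + [None]):
        is_fragment = tok is not None and tok not in lexicon
        if is_fragment and start is None:
            start = i
        elif not is_fragment and start is not None:
            if i - start >= 2:
                yield start, i
            start = None


def consolidate_fragments(tokens, lexicon, min_count=2):
    """Merge fragment runs that occur at least min_count times (assumed threshold)."""
    spans = list(fragment_runs(tokens, lexicon))
    counts = Counter(tuple(tokens[s:e]) for s, e in spans)
    out, pos = [], 0
    for s, e in spans:
        out.extend(tokens[pos:s])          # copy tokens before the run unchanged
        run = tokens[s:e]
        if counts[tuple(run)] >= min_count:
            out.append("".join(run))       # repeated run: output as one unit
        else:
            out.extend(run)                # seen once: leave the fragments as they are
        pos = e
    out.extend(tokens[pos:])
    return out


# Hypothetical segmenter output; "-x" plays the role of a trailing element.
print(merge_suffixes(["ab", "-x", "cd"], {"-x"}))
print(consolidate_fragments(
    ["k", "a", "w", "go", "k", "a", "w", "to", "z", "q"], {"go", "to"}))
```

In this sketch a fragment run is merged only when the identical run recurs in the same text, which mirrors the abstract's condition that the fragments "occur multiple times". A real system would likely need a more careful definition of what counts as a fragment.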
