Research on Zhuang Word Segmentation Based on an Improved Mutual Information Algorithm and the t-Test Difference

Qin Jun; Lin Yechuan; Yi Yunfei

Abstract

Traditional Zhuang word segmentation treats the spaces between words as separators. In most cases, however, this splits apart semantic words, that is, combinations of several associated words that together express a complete and independent meaning. Building on earlier work that used mutual information (MI) to measure the degree of association between adjacent words, this paper is the first to apply the improved mutual information algorithm MIk and the t-test difference to Zhuang text segmentation. Combining the respective strengths of the two measures in evaluating the static and dynamic binding ability of adjacent words, it proposes TD-MIk, a hybrid algorithm that combines MIk with the t-test difference, and compares the segmentation performance of MIk, the t-test difference, and TD-MIk. Experiments use a text collection from the Zhuang-language edition of People's Daily Online (人民网) as the training and test corpus. The results show that all three methods extract the semantic words in the text accurately and effectively, and that the TD-MIk hybrid algorithm achieves the highest segmentation accuracy.
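The abstract names the two association measures but does not give their formulas. The Python sketch below shows one way such scores can be computed for adjacent words in space-delimited text, under stated assumptions: MIk is taken to be the MI^k family, log2(P(x,y)^k / (P(x)P(y))), where k = 1 gives ordinary pointwise MI; the t-test difference follows the form commonly used in Chinese segmentation work, Δt(x:y) = t_{v,y}(x) - t_{x,w}(y) for a context v x y w, with the variance of each conditional probability approximated as f(a,b)/f(a)². The paper's exact definitions, its thresholds, and its TD-MIk combination rule are not stated in the abstract, so the hybrid itself is not sketched. All function names, the boundary markers, the k and threshold values, and the toy corpus (placeholder tokens, not real Zhuang text) are hypothetical.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary markers

def count_ngrams(sentences):
    """Count unigrams and adjacent-word bigrams over space-delimited sentences."""
    uni, bi = Counter(), Counter()
    for s in sentences:
        toks = [BOS] + s.split() + [EOS]
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    return uni, bi

def mi_k(x, y, uni, bi, k):
    """MI^k(x, y) = log2(P(x,y)^k / (P(x) P(y))); k = 1 is plain pointwise MI.
    Assumption: MI^k form; the paper's MIk may be defined differently."""
    pxy = bi[(x, y)] / sum(bi.values())
    if pxy == 0:
        return float("-inf")  # never seen adjacent: no evidence of binding
    n = sum(uni.values())
    return math.log2(pxy ** k / ((uni[x] / n) * (uni[y] / n)))

def _cond(a, b, uni, bi):
    """Estimate p(b|a) = f(a,b)/f(a) and its variance, approximated as f(a,b)/f(a)^2."""
    fa = uni[a]
    if fa == 0:
        return 0.0, 0.0
    fab = bi[(a, b)]
    return fab / fa, fab / fa ** 2

def t_score(x, y, z, uni, bi):
    """t_{x,z}(y): > 0 if y leans right toward z, < 0 if it leans left toward x."""
    p_right, var_right = _cond(y, z, uni, bi)
    p_left, var_left = _cond(x, y, uni, bi)
    denom = math.sqrt(var_right + var_left)
    return 0.0 if denom == 0 else (p_right - p_left) / denom

def t_diff(v, x, y, w, uni, bi):
    """Delta t(x:y) = t_{v,y}(x) - t_{x,w}(y) for the context v x y w.
    Larger values suggest x and y bind into one unit; negative values suggest a break."""
    return t_score(v, x, y, uni, bi) - t_score(x, y, w, uni, bi)

def gap_t_diffs(sentence, uni, bi):
    """Delta t for every gap between adjacent words of a sentence."""
    toks = [BOS] + sentence.split() + [EOS]
    return [t_diff(toks[i - 1], toks[i], toks[i + 1], toks[i + 2], uni, bi)
            for i in range(1, len(toks) - 2)]

def segment_mik(sentence, uni, bi, k, threshold):
    """Merge adjacent words whose MI^k exceeds a (hypothetical) threshold."""
    toks = sentence.split()
    if not toks:
        return []
    words = [toks[0]]
    for a, b in zip(toks, toks[1:]):
        if mi_k(a, b, uni, bi, k) > threshold:
            words[-1] += " " + b  # keep the space: the unit is a multi-word semantic word
        else:
            words.append(b)
    return words

# Synthetic toy corpus (placeholder tokens, not real Zhuang text).
CORPUS = ["ba de ci", "ba de fo", "ga ba de", "ci ga fo", "fo ci ga"]
uni, bi = count_ngrams(CORPUS)
print(segment_mik("ga ba de ci", uni, bi, k=2, threshold=0.0))
print([round(d, 2) for d in gap_t_diffs("ga ba de ci", uni, bi)])
```

With this toy corpus, the pair `ba de` (seen three times) is merged into one unit, and its Δt value is positive while the gaps on either side of it are negative. A real system would estimate the counts from a large corpus such as the one described in the abstract and tune k and the thresholds on held-out data.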

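Bibliographic record

Source
Journal of South-Central University for Nationalities (Natural Science Edition), 2017, No. 4, pp. 100-105 (6 pages)
Authors
Qin Jun; Lin Yechuan; Yi Yunfei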
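Affiliations

College of Computer Science, South-Central University for Nationalities, Wuhan 430074

College of Computer Science, South-Central University for Nationalities, Wuhan 430074

School of Computer and Information Engineering, Hechi University, Yizhou 546300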

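Original format: PDF
Full-text language: Chinese
CLC classification: Information processing (information handling)
Keywords
Zhuang word segmentation; improved MI algorithm; t-test difference; hybrid algorithm; semantic words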

Similar documents

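1. Improving a Chinese word segmentation system through new-word discovery based on an improved mutual information algorithm [J]. Xia Tongfei, Li Zhi, Wang Chao. Electronic Components and Information Technology (电子元器件与信息技术), 2018, No. 9.
2. Research on a Chinese word segmentation algorithm based on improved forward maximum matching [J]. Wang Huixian, Long Hua. Journal of Guizhou University (Natural Science Edition) (贵州大学学报（自然科学版）), 2011, No. 5.
3. Research on an improved Chinese word segmentation algorithm based on maximum matching [J]. Zhao Yuan. Science & Technology Information (科技信息), 2010, No. 35.
4. Research on an improved Hash-based Chinese word segmentation algorithm [J]. Cai Rui. Fujian Computer (福建电脑), 2010, No. 2.
5. An improved Chinese automatic word segmentation algorithm based on Hylanda (海量) intelligent word segmentation [C]. Zhao Linying, Zhao Pengwei. 5th China Management Science and Engineering Forum (第五届中国管理科学与工程论坛), 2007.
6. Research on a Web vulnerability mining algorithm based on improved fuzz testing [A]. Lu Ziguang. 2018.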
