Journal: 《计算机应用与软件》 (Computer Applications and Software)

An Improved Forward Maximum Matching Algorithm for Chinese Word Segmentation

     

Abstract

The forward maximum matching (FMM) algorithm for word segmentation uses an initial maximum word length that is fixed in advance. A value that is too small causes long words to be lost, and a value that is too large increases the number of matching attempts. To address this problem, this paper proposes determining the length of the text window to be cut from the input dynamically, based on the lengths of the entries in the Chinese word segmentation dictionary, and improves the FMM algorithm accordingly. To support the improved algorithm efficiently, a matching dictionary structure is also designed. Compared with the ordinary FMM algorithm, the improved algorithm substantially reduces the number of matching attempts, and the analysis indicates that the speed and efficiency of Chinese word segmentation improve considerably.
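The idea of choosing the window length from the dictionary can be illustrated with a minimal sketch. The abstract does not describe the dictionary structure in detail, so the layout below is an assumption: entries are grouped by their first character, and each group records the length of its longest entry. The names `build_index` and `fmm_dynamic` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_index(words):
    """Group dictionary entries by first character (assumed layout, not the paper's).

    Each bucket stores the entries starting with that character and the
    length of the longest one, which later bounds the matching window.
    """
    index = defaultdict(lambda: {"max_len": 1, "words": set()})
    for word in words:
        if not word:
            continue
        bucket = index[word[0]]
        bucket["words"].add(word)
        bucket["max_len"] = max(bucket["max_len"], len(word))
    return dict(index)


def fmm_dynamic(text, index):
    """Forward maximum matching with a per-position window length.

    Returns the token list and the number of dictionary lookups performed.
    """
    tokens = []
    lookups = 0
    i = 0
    while i < len(text):
        bucket = index.get(text[i])
        if bucket is None:
            # No entry starts with this character: emit it as a single token.
            tokens.append(text[i])
            i += 1
            continue
        # The window is bounded by the longest entry sharing this first character.
        window = min(bucket["max_len"], len(text) - i)
        for length in range(window, 1, -1):
            lookups += 1
            candidate = text[i:i + length]
            if candidate in bucket["words"]:
                tokens.append(candidate)
                i += length
                break
        else:
            # No multi-character entry matched: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens, lookups


words = ["中华人民共和国", "中华", "人民", "共和国", "成立", "了"]
index = build_index(words)
tokens, lookups = fmm_dynamic("中华人民共和国成立了", index)
print(tokens, lookups)
```

With this layout, a position whose first character only begins short entries is never tested against a long window, which is one plausible way the matching count could drop relative to a fixed global maximum length.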
