The forward maximum matching (FMM) algorithm for Chinese word segmentation uses a fixed initial value for the maximum word length. If that value is too small, longer words are lost; if it is too large, many unnecessary matching attempts are made. To address this problem, this paper proposes determining the length of the text span to be examined dynamically, based on the lengths of the entries in the Chinese word segmentation dictionary, and improves the FMM algorithm accordingly. A dictionary structure is designed to support the improved algorithm efficiently. Compared with the standard FMM algorithm, the improved algorithm greatly reduces the number of matching attempts, and the analysis shows that the speed and efficiency of Chinese word segmentation are considerably improved.
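The abstract does not spell out the dictionary structure, so the following is only a minimal sketch of one plausible reading of the idea: the dictionary is indexed by first character, and each index entry records the length of the longest word beginning with that character, so the matching window can be chosen per position instead of being fixed globally. The names `build_dictionary` and `segment`, and the `max_len`/`words` layout, are illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict


def build_dictionary(words):
    """Index words by first character (an assumed layout, not the paper's).

    Each entry stores the set of words starting with that character and
    the length of the longest such word.
    """
    index = defaultdict(lambda: {"max_len": 1, "words": set()})
    for word in words:
        if not word:
            continue  # ignore empty entries
        entry = index[word[0]]
        entry["words"].add(word)
        entry["max_len"] = max(entry["max_len"], len(word))
    return dict(index)


def segment(text, index):
    """Forward maximum matching with a per-position window length."""
    tokens = []
    i = 0
    while i < len(text):
        entry = index.get(text[i])
        if entry is None:
            # No dictionary word starts with this character: emit it alone.
            tokens.append(text[i])
            i += 1
            continue
        # Window is bounded by the longest word with this first character
        # and by the amount of text remaining.
        window = min(entry["max_len"], len(text) - i)
        for length in range(window, 1, -1):
            candidate = text[i:i + length]
            if candidate in entry["words"]:
                tokens.append(candidate)
                i += length
                break
        else:
            # No multi-character match: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens
```

With the toy dictionary `build_dictionary(["中文", "分词", "中文分词", "词典", "算法"])`, calling `segment("中文分词算法", index)` starts with a window of 4 at "中" and a window of 2 at "算", so no attempts are spent on lengths that no dictionary entry could match at that position.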