This paper introduces a method for identifying unknown (out-of-vocabulary) Chinese words based on extracting word combinations. The method builds a bigram model over text that has first been processed by fragment segmentation, then combines mutual information with rule-based filtering to extract unknown words or phrases formed by joining several adjacent words. On the open test set, precision is 84.71% and recall is 72.13%.
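The abstract does not give the exact scoring formula, thresholds, or filtering rules, so the following is only a minimal sketch of the general idea, under stated assumptions: the input is already segmented into tokens, mutual information is taken to be pointwise mutual information (PMI) over adjacent token pairs, and the rule filter is stood in for by a simple stop-word check. The function name `extract_candidates` and the parameters `min_count`, `min_pmi`, and `stop_words` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def extract_candidates(sentences, min_count=2, min_pmi=1.0, stop_words=frozenset()):
    """Return (merged_word, pmi) pairs for adjacent tokens that co-occur strongly.

    sentences: list of token lists (output of some prior segmentation step).
    min_count, min_pmi, stop_words: illustrative assumptions, not the paper's values.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        # Count adjacent token pairs within each sentence (the bigram model).
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    results = []
    for (a, b), count in bigrams.items():
        # Frequency filter: rare pairs give unreliable PMI estimates.
        if count < min_count:
            continue
        # Stand-in rule filter: drop pairs that contain a stop word.
        if a in stop_words or b in stop_words:
            continue
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        pmi = math.log2(p_ab / (p_a * p_b))
        if pmi >= min_pmi:
            results.append((a + b, pmi))

    # Strongest associations first.
    results.sort(key=lambda item: item[1], reverse=True)
    return results


if __name__ == "__main__":
    corpus = [
        ["我", "在", "北京", "大学", "读书"],
        ["北京", "大学", "的", "学生"],
        ["他", "的", "书"],
    ]
    for word, score in extract_candidates(corpus, stop_words={"的"}):
        print(word, round(score, 2))
```

This sketch only merges pairs of tokens; a combination of three or more words, as the abstract describes, would need either repeated passes over the merged output or longer n-grams, and the paper's actual rules are likely richer than a stop-word list.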