Chinese word segmentation is an important task in Chinese information processing. Current segmentation techniques rely mainly on a common-word dictionary, but such a dictionary recognizes unknown (out-of-vocabulary) words poorly. To address this, the authors propose a double-dictionary method for recognizing unknown words: a common-word dictionary and a single-character-word dictionary are built and then used together during segmentation, which addresses the low efficiency of unknown-word recognition. Experiments show that, with the single-character-word list, the recognition accuracy for unknown words exceeds 90%.
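The abstract does not say how the two dictionaries are combined, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes forward maximum matching against the common-word dictionary, followed by a second pass that merges adjacent leftover characters into an unknown-word candidate unless a character appears in the single-character-word dictionary. The function name `segment`, the `max_len` parameter, and the sample dictionaries are all hypothetical.

```python
def segment(text, common_words, single_char_words, max_len=4):
    """Segment text with two dictionaries (illustrative sketch, assumed design)."""
    # Pass 1: forward maximum matching against the common-word dictionary.
    # When no multi-character match exists, fall back to a single character.
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in common_words:
                tokens.append(piece)
                i += size
                break

    # Pass 2: a leftover character that is not a legitimate single-character
    # word is treated as a fragment; adjacent fragments are merged into one
    # unknown-word candidate.
    result = []
    buffer = ""
    for tok in tokens:
        is_fragment = (
            len(tok) == 1
            and tok not in single_char_words
            and tok not in common_words
        )
        if is_fragment:
            buffer += tok
        else:
            if buffer:
                result.append(buffer)
                buffer = ""
            result.append(tok)
    if buffer:
        result.append(buffer)
    return result


if __name__ == "__main__":
    # Hypothetical toy dictionaries, not taken from the paper.
    common = {"我们", "喜欢", "北京"}
    singles = {"的", "在", "和"}
    print(segment("我们喜欢在鄱阳湖", common, singles))
    # -> ['我们', '喜欢', '在', '鄱阳湖']
```

In this sketch the single-character-word dictionary acts as a filter: "在" is kept as a word in its own right, while "鄱", "阳", and "湖" match neither dictionary and are grouped into the candidate "鄱阳湖".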