Journal: 计算机工程 (Computer Engineering)

A Chinese Compound Word Extraction Algorithm Based on a Word Co-occurrence Directed Graph

     

Abstract

Word segmentation systems do not include compound words in their dictionaries, so they cannot recognize them. To address this problem, this paper proposes a Chinese compound word extraction algorithm based on a word co-occurrence directed graph. The algorithm obtains word strings from the text by part-of-speech detection and generates a word co-occurrence directed graph from those word strings. Borrowing the idea of the Bellman-Ford algorithm, it then searches the graph, from multiple source vertices, for the longest paths whose weight values satisfy given conditions. The word string corresponding to each such path is taken to be a compound word. Experimental results show that the algorithm achieves a compound word extraction precision of 91.16%.
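
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the weight condition, the choice of source vertices, or a length limit, so the edge weight (adjacent co-occurrence count), the threshold `min_weight`, the cap `max_len`, the source-selection rule, and the sample word strings are all assumptions made for illustration. Only the round-by-round path extension loosely echoes the Bellman-Ford idea of processing one more edge per pass.

```python
from collections import Counter


def build_cooccurrence_graph(word_strings):
    """Build a directed graph as a Counter of edges.

    Assumption: the weight of edge (u, v) is the number of times
    word v immediately follows word u in the input word strings.
    """
    edges = Counter()
    for words in word_strings:
        for u, v in zip(words, words[1:]):
            edges[(u, v)] += 1
    return edges


def longest_paths(edges, min_weight=2, max_len=4):
    """Return maximal paths whose every edge has weight >= min_weight.

    Assumptions: sources are vertices with a qualifying outgoing edge
    and no qualifying incoming edge; paths never revisit a vertex and
    contain at most max_len words.
    """
    succ = {}
    for (u, v), w in edges.items():
        if w >= min_weight:
            succ.setdefault(u, []).append(v)
    targets = {v for vs in succ.values() for v in vs}
    frontier = [(u,) for u in succ if u not in targets]

    results = []
    # Each round extends every open path by one edge.
    for _ in range(max_len - 1):
        next_frontier = []
        for path in frontier:
            extended = [path + (v,) for v in succ.get(path[-1], [])
                        if v not in path]
            if extended:
                next_frontier.extend(extended)
            elif len(path) > 1:
                results.append(path)  # cannot grow further: maximal path
        frontier = next_frontier
    # Paths that hit the length cap are also reported.
    results.extend(p for p in frontier if len(p) > 1)
    return results


# Hypothetical word strings, assumed to be already segmented and
# filtered by part of speech.
word_strings = [
    ["机器", "学习", "算法"],
    ["机器", "学习", "算法"],
    ["机器", "学习", "模型"],
    ["深度", "学习"],
]
graph = build_cooccurrence_graph(word_strings)
paths = longest_paths(graph, min_weight=2, max_len=4)
compounds = ["".join(p) for p in paths]
print(compounds)  # ['机器学习算法']
```

In this toy input only the edges 机器→学习 (count 3) and 学习→算法 (count 2) meet the assumed threshold, so the single reported candidate is 机器学习算法. A graph in which every qualifying vertex lies on a cycle would yield no sources under this selection rule, which is one reason the sketch should be read as an illustration rather than a faithful reconstruction.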
