Journal: Computer Engineering (计算机工程)

A Comparative Study of the Segmentation Performance of Lucene-Based Chinese Analyzers

     

Abstract

The Chinese analyzers bundled with Lucene segment text poorly, and it is difficult to choose among the available third-party analyzers. To address this problem, this paper studies several Lucene-based Chinese analyzers and compares them experimentally on sentence segmentation, segmentation speed, index size and indexing time, retrieval results, and retrieval speed. The results show that, within the Lucene framework, the dictionary-based Paoding analyzer has the best overall performance, Lucene's built-in single-character (unigram) analyzer has the highest segmentation speed, and the imdict and ICTCLAS4J analyzers leave room for improvement in algorithmic efficiency.
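The contrast between single-character and dictionary-based segmentation can be illustrated with a minimal sketch. It is not taken from the paper and does not use Lucene, Paoding, or any of the analyzers named above. The function names, the toy dictionary, and the choice of forward maximum matching as the dictionary strategy are all assumptions made for illustration.

```python
import time


def unigram_segment(text):
    """Single-character segmentation: every non-space character is a token."""
    return [ch for ch in text if not ch.isspace()]


def forward_max_match(text, dictionary, max_len=4):
    """Dictionary segmentation by greedy forward maximum matching.

    At each position the longest dictionary word (up to max_len characters)
    is taken; if none matches, the single character is emitted as a token.
    """
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def time_segmenter(segment, text, repeats=1000):
    """Return the wall-clock seconds spent running segment(text) repeatedly."""
    start = time.perf_counter()
    for _ in range(repeats):
        segment(text)
    return time.perf_counter() - start


# Hypothetical toy dictionary and sample text, for illustration only.
TOY_DICTIONARY = {"中文", "分词", "性能", "比较", "研究"}
SAMPLE = "中文分词性能比较研究"

if __name__ == "__main__":
    print(unigram_segment(SAMPLE))
    print(forward_max_match(SAMPLE, TOY_DICTIONARY))
    print(time_segmenter(unigram_segment, SAMPLE))
    print(time_segmenter(lambda t: forward_max_match(t, TOY_DICTIONARY), SAMPLE))
```

Single-character splitting needs no dictionary lookups, which is consistent with the unigram analyzer being reported as the fastest, while dictionary matching produces word-level tokens at the cost of extra lookups. Timings from this toy harness say nothing about the analyzers measured in the paper.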
