Conference: International Forum on Information Technology and Applications

Comparison Probabilistic Latent Semantic Indexing Model In Chinese Information Retrieval

Abstract

As the volume of information on the Internet grows, Web mining has become a focus of information retrieval research. Web clustering groups similar Web documents according to a chosen similarity metric. However, classical clustering algorithms search the solution space without direction and ignore semantic features. This paper compares probabilistic latent semantic indexing (PLSI) models built on three indexing schemes: word segmentation, two-grams, and keyword extraction. For comparison, vector space models using the different Chinese information retrieval techniques are tested at the same time. The experimental results show that correct word segmentation clearly improves retrieval precision for the PLSI model but is not effective for the vector space model, and that the index based on keyword extraction gives the PLSI model its highest accuracy.
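To make the two-gram indexing scheme and the PLSI model named in the abstract more concrete, the following is a minimal Python sketch. It is not the authors' implementation: the function names `char_bigrams` and `plsi`, the sample documents, the number of topics, and the random initialization are all assumptions made for illustration. It indexes Chinese text by overlapping character two-grams and then fits the standard PLSI distributions P(z|d) and P(w|z) with plain EM.

```python
import math
import random
from collections import Counter


def char_bigrams(text):
    """Return overlapping character two-grams, skipping whitespace."""
    chars = [c for c in text if not c.isspace()]
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def plsi(counts, n_topics, n_iter=50, seed=0):
    """Fit P(z|d) and P(w|z) by EM; counts is a list of non-empty Counters."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in counts for w in doc})

    def normalized(n):
        # Strictly positive random values scaled to sum to 1.
        raw = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(raw)
        return [x / total for x in raw]

    p_z_d = [normalized(n_topics) for _ in counts]  # P(z|d)
    p_w_z = [dict(zip(vocab, normalized(len(vocab))))  # P(w|z)
             for _ in range(n_topics)]
    log_likelihoods = []

    for _ in range(n_iter):
        new_zd = [[0.0] * n_topics for _ in counts]
        new_wz = [dict.fromkeys(vocab, 0.0) for _ in range(n_topics)]
        ll = 0.0
        for d, doc in enumerate(counts):
            for w, n in doc.items():
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                ll += n * math.log(total)
                for z in range(n_topics):
                    resp = n * joint[z] / total
                    new_zd[d][z] += resp
                    new_wz[z][w] += resp
        # M-step: renormalize the expected counts.
        p_z_d = [[v / sum(row) for v in row] for row in new_zd]
        p_w_z = []
        for topic in new_wz:
            total = sum(topic.values())
            p_w_z.append({w: v / total for w, v in topic.items()})
        # Log-likelihood of the parameters used in this E-step.
        log_likelihoods.append(ll)
    return p_z_d, p_w_z, log_likelihoods


# Hypothetical sample documents, indexed by character two-grams.
docs = ["信息检索模型", "中文信息检索", "网页聚类算法", "网页文档聚类"]
counts = [Counter(char_bigrams(text)) for text in docs]
p_z_d, p_w_z, log_likelihoods = plsi(counts, n_topics=2, n_iter=20)
```

Replacing `char_bigrams` with a word segmenter or a keyword extractor changes only how `counts` is built, which is the kind of indexing comparison the abstract describes; the EM loop itself stays the same.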
