Source: 河南农业大学学报 (Journal of Henan Agricultural University)

Short Text Retrieval Combining Wikipedia Category Structure and Explicit Semantic Features

     

Abstract

A large volume of short text now circulates in the Web information space. Such texts are short, carry little information, have sparse features, and often use irregular grammar, so traditional information retrieval techniques cannot handle them effectively. To address this problem, this study takes semantic relatedness as its starting point and investigates short text retrieval based on Wikipedia, currently a mainstream semantic knowledge source. Using the category structure (taxonomy) information contained in Wikipedia pages, a method for explicit semantic feature selection and relatedness computation is proposed. On this basis, a short text retrieval method operating in a low-dimensional explicit semantic space is proposed, and its feasibility and effectiveness are verified experimentally. The results show that, compared with current graph-based and link-based methods, the proposed method improves MAP by 6% and 4.1%, P@30 by 10.4% and 5.8%, and R-Prec by 6.1% and 3%, respectively.
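The three evaluation measures named in the abstract (MAP, P@30, R-Prec) can be illustrated with a minimal sketch. It uses the standard textbook definitions and is not taken from the paper; the function names and the toy document identifiers are assumptions made for illustration only.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant (P@k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document
    appears, divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents."""
    if not relevant:
        return 0.0
    return precision_at_k(ranked, relevant, len(relevant))


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[str], Set[str]]]
) -> float:
    """MAP: average precision averaged over all queries."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy example (hypothetical IDs): one query, five retrieved documents,
# three relevant documents, one of which ("d6") was never retrieved.
ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d6"}
p_at_2 = precision_at_k(ranked, relevant, 2)
ap = average_precision(ranked, relevant)
r_prec = r_precision(ranked, relevant)
```

P@30 is simply `precision_at_k` with `k=30`; the toy example uses a smaller `k` only because the ranked list is short.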
