Journal: Computer Engineering and Applications (《计算机工程与应用》)

Identifying the Top n Global Outliers in a Dataset Based on Hierarchical Clustering

     

Abstract

To distinguish between positive and negative expansion terms related to the original query and to improve query expansion performance, a novel local-feedback query expansion algorithm is proposed based on the association rules q → ti and q → -tj, which applies positive and negative association rule mining to query expansion. Positive and negative association rules q → ti and q → -tj that contain only original query terms are mined automatically from the top-ranked retrieved documents to construct a positive and a negative association rule database, respectively. Positive and negative expansion terms related to the original query are extracted from these databases to build separate positive and negative expansion term databases. Terms that also appear as negative expansion terms are removed from the positive expansion term database, and the remaining positive terms are combined with the original query for query expansion. A new query expansion model and a method for computing the weights of expansion terms are presented, which make the weight of each expansion term more reasonable. The experimental results show that the proposed algorithm can not only detect false expansion terms but also improve information retrieval performance.

The presence of outliers makes data mining results inaccurate or even wrong. Existing outlier detection algorithms remain imperfect in generality, effectiveness, user-friendliness, and performance on large high-dimensional datasets. To address this, an effective global outlier detection method is proposed. The method performs agglomerative hierarchical clustering, uses the resulting dendrogram and the distance matrix to judge visually how isolated the data are, and determines the number of outliers. Outlying data points are then removed from the dendrogram top-down without supervision. Simulation experiments on several datasets show that the method effectively identifies the top n global outliers with the greatest degree of isolation, is applicable to datasets of different shapes, is efficient and user-friendly, and is suitable for outlier detection in large high-dimensional datasets.
