Journal: Journal of Taiyuan University of Science and Technology (《太原科技大学学报》)

An Outlier Mining Algorithm Based on Attribute Entropy and Weighted Cosine Similarity

Abstract

Outlier detection is an important research direction in data mining, yet most outlier mining algorithms become inefficient when applied to high-dimensional datasets. This paper presents LEAWCD, an outlier mining algorithm based on attribute entropy and weighted cosine similarity. First, the algorithm uses local attribute entropy to identify the local outlier attributes of each object within its k-neighborhood, and automatically sets an attribute weight vector according to the deviation degree of each outlier attribute. Second, it weights the cosine similarity, a measure that remains effective for high-dimensional data, and uses it to measure each object's outlier degree within its k-neighborhood, thereby detecting local outliers in high-dimensional data. Finally, experiments on celestial spectrum data provided by the National Astronomical Observatory show that LEAWCD offers strong scalability and high detection precision.
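The abstract gives the stages of the pipeline but not its formulas, so the following Python sketch is only a rough illustration of how those stages could fit together. It is not the LEAWCD algorithm itself. Everything specific in it is an assumption made for illustration: the Euclidean k-neighborhood, the equal-width histogram entropy estimator, the rule that attributes with above-average local entropy count as outlier attributes, the weight `1 + deviation`, and the score `1 - mean similarity`. All function names are hypothetical.

```python
import math


def weighted_cosine(x, y, w):
    """Cosine similarity of x and y under non-negative attribute weights w."""
    dot = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    norm_x = math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)))
    norm_y = math.sqrt(sum(wi * yi * yi for wi, yi in zip(w, y)))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_x * norm_y)


def k_neighborhood(data, i, k):
    """Indices of the k objects closest to data[i] (Euclidean distance, assumed)."""
    ranked = sorted((math.dist(data[i], data[j]), j)
                    for j in range(len(data)) if j != i)
    return [j for _, j in ranked[:k]]


def attribute_entropy(values, bins=4):
    """Shannon entropy of an equal-width histogram of values (assumed estimator)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant attribute carries no uncertainty
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / (hi - lo) * bins), bins - 1)] += 1
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def weight_vector(x, local, bins=4):
    """Assumed rule: attributes with above-average local entropy are treated as
    outlier attributes and weighted by 1 + normalized deviation of x from the
    local mean; every other attribute keeps weight 1."""
    dims = len(x)
    columns = [[p[d] for p in local] for d in range(dims)]
    entropy = [attribute_entropy(col, bins) for col in columns]
    mean_entropy = sum(entropy) / dims
    weights = []
    for d, col in enumerate(columns):
        spread = max(col) - min(col)
        if entropy[d] > mean_entropy and spread > 0:
            deviation = abs(x[d] - sum(col) / len(col)) / spread
            weights.append(1.0 + deviation)
        else:
            weights.append(1.0)
    return weights


def outlier_scores(data, k=3, bins=4):
    """Score each object as 1 minus its mean weighted cosine similarity to its
    k-neighborhood (assumed scoring rule). Needs at least two objects."""
    scores = []
    for i, x in enumerate(data):
        neighbors = [data[j] for j in k_neighborhood(data, i, k)]
        w = weight_vector(x, neighbors + [x], bins)
        mean_sim = sum(weighted_cosine(x, y, w) for y in neighbors) / len(neighbors)
        scores.append(1.0 - mean_sim)
    return scores
```

Under these assumptions a higher score marks an object whose direction differs more from its neighbors. Turning scores into detected outliers would still require a threshold or a top-n cutoff, which the abstract does not specify.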
