A Similarity Calculation Method Based on Feature Weight Quantization

Liu Ming (刘铭); Wu Chong (吴冲); Liu Yuanchao (刘远超); Sun Chengjie (孙承杰)

Abstract

With the rapid development of the information industry, the unsupervised nature of clustering has made it a highly effective tool for data analysis. Good clustering results depend on an effective and accurate similarity measure. Different features contribute differently to describing the similarity among data, so it is necessary to use prior knowledge, such as constrained data provided by users, to assess the importance of each feature and to apply that assessment in similarity calculation for more accurate results. Conventional feature weighting methods overlook two problems: (1) constrained data are very likely to be non-uniformly distributed in the feature space; (2) constrained data may contain inconsistencies. These problems prevent conventional weighting methods from obtaining accurate results, and can even make them unable to run.

This paper therefore proposes a novel constraint-based feature weight quantization method that addresses both problems: (1) the constrained data are partitioned into several equivalence classes, and a parameter called the "distribution coefficient" is computed to even out the distribution of the data; (2) the constrained data are connected into an undirected graph, and a parameter called the "confidence" (the "belief value") is computed to measure and weaken the inconsistency of the constrained data. The two parameters are then integrated into the feature weight quantization function to obtain accurate similarity results. Experimental results show that the method can use constrained data to capture the contributions of different features to similarity calculation, and that it can be applied in any clustering algorithm to improve clustering accuracy.
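The abstract does not give the formulas for the distribution coefficient or the confidence, so the following Python sketch illustrates only the surrounding machinery, under stated assumptions. It assumes the constrained data take the common form of must-link and cannot-link pairs (the abstract does not say this), treats the equivalence classes as the connected components of the undirected must-link graph, flags cannot-link pairs that contradict those classes, and uses a feature-weighted Euclidean distance as a stand-in similarity. All function names are hypothetical, and none of this is the authors' actual weighting function.

```python
from collections import defaultdict
import math


def equivalence_classes(n_points, must_link):
    """Connected components of the undirected graph whose edges are the
    (assumed) must-link pairs, found with union-find."""
    parent = list(range(n_points))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in must_link:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for i in range(n_points):
        groups[find(i)].append(i)
    return sorted(groups.values())


def inconsistent_pairs(classes, cannot_link):
    """Cannot-link pairs whose endpoints fall in the same equivalence class,
    i.e. constraints that contradict the must-link closure."""
    label = {p: k for k, members in enumerate(classes) for p in members}
    return [(a, b) for a, b in cannot_link if label[a] == label[b]]


def weighted_similarity(x, y, weights):
    """Similarity in (0, 1] derived from a feature-weighted Euclidean
    distance (an illustrative choice, not the paper's function)."""
    dist = math.sqrt(sum(w * (xi - yi) ** 2 for w, xi, yi in zip(weights, x, y)))
    return 1.0 / (1.0 + dist)


classes = equivalence_classes(5, [(0, 1), (1, 2)])
conflicts = inconsistent_pairs(classes, [(0, 2), (0, 3)])
```

In this toy run, points 0, 1 and 2 end up in one class, so the cannot-link pair (0, 2) is reported as inconsistent while (0, 3) is not. Setting a feature's weight to zero in `weighted_similarity` removes that feature's influence entirely, which is the sense in which feature weights change the similarity that a clustering algorithm sees.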

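Bibliographic details

Source
Chinese Journal of Computers (《计算机学报》) | 2015, No. 7 | pp. 1420-1433 | 14 pages
Authors
Liu Ming (刘铭); Wu Chong (吴冲); Liu Yuanchao (刘远超); Sun Chengjie (孙承杰)
Affiliations

School of Management, Harbin Institute of Technology, Harbin 150001;

School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001;

School of Management, Harbin Institute of Technology, Harbin 150001;

School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001;

School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001

Original format: PDF
Language: Chinese
CLC classification: Artificial intelligence theory
Keywords
Constrained data; feature weight quantization; distribution coefficient; confidence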
