Conference: Society of Photo-Optical Instrumentation Engineers Conference on Data Mining and Knowledge Discovery

Evaluation of similarity measures for analysis of databases on laboratory examinations



Abstract

A key task in data mining is to partition a dataset automatically. Classification methods look for partitions, defined by combinations of attribute-value pairs, that best fit the partition given by the target concepts. Clustering methods, in contrast, look for partitions that best characterise the dataset according to a similarity measure. The choice of distance or similarity measure is therefore one of the most important research topics in data mining. However, empirical comparisons of such measures have not previously been studied in the literature. In this paper, several types of similarity measures were compared in three clinical contexts: datasets composed only of categorical attributes, datasets with a mixture of categorical and numerical attributes, and datasets composed only of numerical attributes. Experimental results show that simple similarity measures perform as well as newly proposed measures.
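
The abstract does not name the measures that were compared, so the following is only an illustrative sketch of one textbook measure for each of the three attribute settings. The simple matching coefficient (categorical), a distance-derived similarity 1/(1+d) (numerical), and a Gower-style average (mixed) are assumptions chosen for illustration, not the paper's method. All function names and the `ranges` parameter are hypothetical.

```python
from math import sqrt


def simple_matching(a, b):
    """Categorical records: fraction of attributes on which a and b agree."""
    if len(a) != len(b) or not a:
        raise ValueError("records must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def euclidean_similarity(a, b):
    """Numerical records: map Euclidean distance d to a similarity 1 / (1 + d)."""
    if len(a) != len(b) or not a:
        raise ValueError("records must be non-empty and of equal length")
    d = sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + d)


def gower_similarity(a, b, ranges):
    """Mixed records: average of per-attribute similarities (Gower-style).

    ranges[i] is None for a categorical attribute, otherwise the value
    range (max - min) of numerical attribute i over the dataset.
    This parameter layout is an assumption made for this sketch.
    """
    if not (len(a) == len(b) == len(ranges)) or not a:
        raise ValueError("records and ranges must be non-empty and of equal length")
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical attribute: exact match scores 1, mismatch scores 0.
            total += 1.0 if x == y else 0.0
        elif r == 0:
            # Constant numerical attribute: all values are identical.
            total += 1.0
        else:
            # Numerical attribute: 1 minus the range-normalised difference.
            total += 1.0 - abs(x - y) / r
    return total / len(a)
```

For example, `simple_matching(["A", "pos", "high"], ["A", "neg", "high"])` compares two hypothetical categorical records that agree on two of three attributes. A clustering method would evaluate such a function on every pair of records, so the choice among them directly shapes the resulting partition.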
