SIAM International Conference on Data Mining

Similarity Measures for Categorical Data: A Comparative Evaluation

Abstract

Measuring similarity or distance between two entities is a key step in several data mining and knowledge discovery tasks. The notion of similarity for continuous data is relatively well understood, but for categorical data the similarity computation is not straightforward. Several data-driven similarity measures have been proposed in the literature to compute the similarity between two categorical data instances, but their relative performance has not been evaluated. In this paper, we study the performance of a variety of similarity measures in the context of a specific data mining task: outlier detection. Results on a variety of data sets show that while no single measure dominates the others for all types of problems, some measures achieve consistently high performance.
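
To make the setting concrete, here is a minimal Python sketch that is not taken from the paper. It contrasts the standard overlap (simple matching) measure with one illustrative data-driven variant in which a match on a rare value counts more, then uses each measure in a simple nearest-neighbor outlier score. The function names, the `1 - p**2` match weight, the scoring rule, and the toy data are all assumptions made for illustration. They are not necessarily the measures or the detection procedure evaluated by the authors.

```python
from collections import Counter


def overlap_similarity(x, y):
    """Fraction of attributes on which x and y take the same value."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def value_frequencies(rows):
    """Per-attribute relative frequency of each value in the data set."""
    n = len(rows)
    return [{v: c / n for v, c in Counter(col).items()} for col in zip(*rows)]


def frequency_weighted_similarity(x, y, freqs):
    """Illustrative data-driven variant (an assumption, not the paper's
    definition): a match on value v contributes 1 - p(v)**2, so agreeing
    on a rare value counts more than agreeing on a common one."""
    total = 0.0
    for a, b, p in zip(x, y, freqs):
        if a == b:
            total += 1.0 - p[a] ** 2
    return total / len(x)


def outlier_scores(rows, similarity, k=2):
    """Score each row as 1 minus its mean similarity to its k most
    similar other rows; a higher score means more outlying."""
    scores = []
    for i, x in enumerate(rows):
        sims = sorted(
            (similarity(x, y) for j, y in enumerate(rows) if j != i),
            reverse=True,
        )
        scores.append(1.0 - sum(sims[:k]) / k)
    return scores


# Hypothetical toy data: three categorical attributes per instance.
rows = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("red", "small", "square"),
    ("blue", "large", "flat"),
]

freqs = value_frequencies(rows)
overlap_scores = outlier_scores(rows, overlap_similarity)
weighted_scores = outlier_scores(
    rows, lambda x, y: frequency_weighted_similarity(x, y, freqs)
)

for row, s_overlap, s_weighted in zip(rows, overlap_scores, weighted_scores):
    print(row, round(s_overlap, 3), round(s_weighted, 3))
```

On this toy data both measures give the last row the highest outlier score, but they scale the remaining rows differently. That sensitivity to the choice of measure is the kind of effect a comparative evaluation has to quantify on real data sets.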