Conference: International Conference on Data Mining

Comparing dissimilarity measures for probabilistic symbolic objects

Abstract

Symbolic data analysis generalizes some standard statistical data mining methods, such as those developed for classification and clustering tasks, to the case of symbolic objects (SOs). These objects, informally defined as "aggregated data" because they synthesize information concerning a group of individuals of a population, ensure the confidentiality of the original data; nevertheless, they pose new problems, which find a solution in symbolic data analysis. A by-product of working with aggregated data is the possibility of dealing with data from complex questionnaires, where multiple answers are possible or constraints among different answers exist. Comparing SOs is an important step of symbolic data analysis. It can be useful to cluster SOs, to discriminate between them, or even to order them according to their degree of generalization. This paper presents a comparative study aimed at evaluating the degree of dissimilarity between the objects of a restricted class of symbolic data, namely Probabilistic Symbolic Objects. To define a ground truth for the empirical evaluation, a data set with understandable and explainable properties has been selected. In the experiment, only two of the seven dissimilarity measures we studied seem to show more stable behaviour.
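
The abstract does not name the seven measures it compares, so the following is only an illustrative sketch of what comparing two Probabilistic Symbolic Objects can look like. It assumes each object is a mapping from a variable name to a discrete probability distribution, and it uses the average per-variable total variation distance as the dissimilarity. Both the representation and the choice of measure are assumptions made for this example, not the paper's method; the names `PSO`, `total_variation`, and `pso_dissimilarity` are hypothetical.

```python
from typing import Dict

# Assumed representation: a probabilistic symbolic object (PSO) maps each
# variable name to a discrete probability distribution over its categories.
Distribution = Dict[str, float]
PSO = Dict[str, Distribution]


def total_variation(p: Distribution, q: Distribution) -> float:
    """Total variation distance between two discrete distributions, in [0, 1]."""
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


def pso_dissimilarity(a: PSO, b: PSO) -> float:
    """Average per-variable total variation distance (illustrative choice only)."""
    if set(a) != set(b):
        raise ValueError("both objects must describe the same variables")
    if not a:
        raise ValueError("objects must describe at least one variable")
    return sum(total_variation(a[v], b[v]) for v in a) / len(a)


# Two hypothetical aggregated groups of questionnaire respondents.
group_a: PSO = {
    "answer": {"yes": 0.7, "no": 0.3},
    "region": {"north": 0.5, "south": 0.5},
}
group_b: PSO = {
    "answer": {"yes": 0.2, "no": 0.8},
    "region": {"north": 0.5, "south": 0.5},
}

score = pso_dissimilarity(group_a, group_b)
```

Here the two groups differ only on the `answer` variable, so `score` comes out at roughly 0.25, and comparing an object with itself gives 0. A different per-variable divergence or aggregation rule would rank pairs of objects differently, which is the kind of behaviour a comparative study of such measures examines.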