Conference: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 10)

Direct Mining of Discriminative Patterns for Classifying Uncertain Data

Abstract

Classification is one of the most essential tasks in data mining. Unlike other methods, associative classification tries to find all frequent patterns in the input categorical data that satisfy a user-specified minimum support and/or other discrimination measures such as minimum confidence or information gain. Those patterns are later used either as rules for a rule-based classifier or as training features for a support vector machine (SVM) classifier, after a feature selection procedure that usually tries to cover as many input instances as possible with the most discriminative patterns. Several algorithms have also been proposed to mine the most discriminative patterns directly, without costly feature selection. Previous empirical results show that associative classification can provide better classification accuracy on many datasets.

Recently, many studies have been conducted on uncertain data, where the fields of uncertain attributes no longer have certain values. Instead, probability distribution functions are adopted to represent the possible values and their corresponding probabilities. The uncertainty is usually caused by noise, measurement limits, or other factors. Several algorithms have recently been proposed to solve the classification problem on uncertain data, for example by extending traditional rule-based classifiers and decision trees to work on uncertain data. In this paper, we propose a novel algorithm, uHARMONY, which mines discriminative patterns directly and effectively from uncertain data as classification features/rules, to help train either an SVM or a rule-based classifier. Since patterns are discovered directly from the input database, the feature selection step, which usually takes a great amount of time, can be avoided completely. An effective method for computing the expected confidence of the mined patterns, used as the measure of discrimination, is also proposed.

Empirical results show that, using an SVM classifier, uHARMONY significantly outperforms state-of-the-art uncertain data classification algorithms, with average accuracy improvements of 4% to 10% on 30 categorical datasets under varying degrees of uncertainty and numbers of uncertain attributes.
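To make the notion of expected confidence on uncertain categorical data concrete, the following is a minimal Python sketch. It is not the uHARMONY algorithm or the computation method proposed in the paper. It assumes that each uncertain attribute is given as a discrete probability distribution over its values, that attributes and instances are independent, and that the confidence of a pattern is taken as 0 in any possible world where no instance contains the pattern. The function names (`containment_prob`, `count_distribution`, `expected_support`, `expected_confidence`) and the data layout are hypothetical and chosen only for illustration.

```python
from math import prod


def containment_prob(instance, pattern):
    """Probability that an uncertain instance contains every (attribute, value)
    item of the pattern, assuming the attributes are independent."""
    return prod(instance.get(attr, {}).get(value, 0.0)
                for attr, value in pattern.items())


def count_distribution(probs):
    """Distribution of the number of independent events that occur:
    dist[k] = P(exactly k events occur)."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # this instance does not contain the pattern
            nxt[k + 1] += mass * p       # this instance contains the pattern
        dist = nxt
    return dist


def expected_support(data, pattern):
    """Expected number of instances that contain the pattern."""
    return sum(containment_prob(x, pattern) for x, _ in data)


def expected_confidence(data, pattern, target):
    """Expectation over possible worlds of
    (#instances of class `target` containing the pattern) / (#instances containing it).
    Assumption: the ratio counts as 0 when no instance contains the pattern."""
    in_class = [containment_prob(x, pattern) for x, y in data if y == target]
    out_class = [containment_prob(x, pattern) for x, y in data if y != target]
    dist_a = count_distribution(in_class)
    dist_b = count_distribution(out_class)
    total = 0.0
    for a, pa in enumerate(dist_a):
        if a == 0:
            continue  # numerator is 0, so these worlds contribute nothing
        for b, pb in enumerate(dist_b):
            total += pa * pb * a / (a + b)
    return total


# Hypothetical toy data: each instance maps an attribute to a value distribution.
data = [
    ({"color": {"red": 0.5, "blue": 0.5}}, "pos"),
    ({"color": {"red": 0.5, "blue": 0.5}}, "neg"),
]
print(expected_support(data, {"color": "red"}))
print(expected_confidence(data, {"color": "red"}, "pos"))
```

Note that the expected confidence here is the expectation of a ratio, which generally differs from the ratio of the expected class support to the expected support. The sketch runs in time quadratic in the number of instances, so it is suited to illustration rather than to large databases.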
