Direct Mining of Discriminative Patterns for Classifying Uncertain Data

Abstract

Classification is one of the most essential tasks in data mining. Unlike other methods, associative classification tries to find all frequent patterns in the input categorical data that satisfy a user-specified minimum support and/or other discrimination measures such as minimum confidence or information gain. Those patterns are later used either as rules for a rule-based classifier or as training features for a support vector machine (SVM) classifier, after a feature selection procedure that usually tries to cover as many of the input instances as possible with the most discriminative patterns. Several algorithms have also been proposed to mine the most discriminative patterns directly, without costly feature selection. Previous empirical results show that associative classification can provide better classification accuracy on many datasets.

Recently, many studies have been conducted on uncertain data, where the fields of uncertain attributes no longer have certain values. Instead, probability distribution functions are adopted to represent the possible values and their corresponding probabilities. The uncertainty is usually caused by noise, measurement limits, or other possible factors. Several algorithms have recently been proposed to solve the classification problem on uncertain data, for example by extending the traditional rule-based classifier and decision tree to work on uncertain data. In this paper, we propose a novel algorithm, uHARMONY, which mines discriminative patterns directly and effectively from uncertain data as classification features/rules, to help train either an SVM or a rule-based classifier. Since patterns are discovered directly from the input database, feature selection, which usually takes a great amount of time, can be avoided completely. An effective method for computing the expected confidence of the mined patterns, which is used as the measure of discrimination, is also proposed.

Empirical results show that, using an SVM classifier, uHARMONY significantly outperforms the state-of-the-art uncertain data classification algorithms, with average accuracy improvements of 4% to 10% on 30 categorical datasets under varying degrees of uncertainty and numbers of uncertain attributes.
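To make the notion of expected confidence concrete, the following is a minimal Python sketch and not the uHARMONY algorithm itself. It assumes that uncertain attributes are independent, that instances are independent of each other, and that confidence is taken as 0 in a possible world where the pattern has zero support. The function names and the dictionary-based data layout are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def pattern_probability(instance, pattern):
    """Probability that an uncertain instance contains the pattern.

    instance: {attribute: {value: probability}} (assumed layout)
    pattern:  {attribute: value}
    Attributes are assumed independent, so probabilities multiply.
    """
    prob = 1.0
    for attribute, value in pattern.items():
        prob *= instance.get(attribute, {}).get(value, 0.0)
    return prob


def support_distribution(probs):
    """dist[k] = P(exactly k of the instances contain the pattern)."""
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # instance does not contain the pattern
            nxt[k + 1] += mass * p       # instance contains the pattern
        dist = nxt
    return dist


def expected_confidence(probs, labels, target):
    """E[sup_target / sup] over all possible worlds (0 when sup == 0).

    Uses linearity of expectation: each target-class instance contributes
    p_i * E[1 / (1 + support among the other instances)].
    """
    total = 0.0
    for i, (p, label) in enumerate(zip(probs, labels)):
        if label != target or p == 0.0:
            continue
        others = probs[:i] + probs[i + 1:]
        dist = support_distribution(others)
        total += p * sum(mass / (k + 1) for k, mass in enumerate(dist))
    return total


def brute_force_confidence(probs, labels, target):
    """Same quantity by enumerating every possible world (exponential)."""
    total = 0.0
    for world in product([0, 1], repeat=len(probs)):
        weight = 1.0
        for present, p in zip(world, probs):
            weight *= p if present else 1.0 - p
        support = sum(world)
        if support:
            hits = sum(1 for present, label in zip(world, labels)
                       if present and label == target)
            total += weight * hits / support
    return total
```

The brute-force version is included only to show the possible-world semantics; the dynamic-programming version avoids enumerating the exponentially many worlds. On certain data (all probabilities 0 or 1) the expected confidence reduces to the ordinary confidence of the pattern.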

Bibliographic record

Source
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 10), 2011, pp. 861-870 (10 pages)
Authors
Chuancong Gao; Jianyong Wang
Original format: PDF
Chinese Library Classification: TP311.13
Keywords
Associative Classification; Uncertain Data; Frequent Pattern Mining; Expected Confidence
