A comparative study of clustering and classification algorithms.

Abstract

Clustering and classification are two of the most common data mining tasks, used frequently for data categorization and analysis in both industry and academia. Clustering is the process of organizing unlabeled objects into groups whose members are similar in some way. It is a kind of unsupervised learning: no category labels are used when grouping objects. In semi-supervised clustering, some prior knowledge is available, either as labeled data or as pairwise constraints on some of the objects. Classification is a kind of supervised learning: it is a procedure for assigning class labels. A classifier is constructed from labeled training data using a chosen classification algorithm and is then used to predict the class labels of the test data.

In this dissertation, the results of a comprehensive comparative study of three kinds of clustering algorithms are presented: Co-Clustering, Consensus-based Clustering, and Semi-supervised Clustering. Their performance was compared and analyzed through experiments on UCI datasets and on artificial datasets with different data substructures. A method was proposed to combine a Co-Clustering algorithm with a Semi-supervised Clustering algorithm. A comprehensive comparative study was also conducted on three kinds of classification algorithms: the Logistic Regression Classifier, the Support Vector Machine, and the Decision Tree. Experiments on different artificial datasets and UCI datasets were carried out to analyze and compare their classification performance. A method using a controlled False Discovery Rate (FDR) was proposed for selecting important features in the Logistic Regression Classifier. A detailed proof shows that the FDR can be controlled by controlling the related p-value. Further experiments compared classification performance using the proposed feature selection algorithm.

Keywords: Classification, Clustering, Semi-supervised Clustering, Feature Selection.
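The abstract states that FDR control in feature selection can be achieved by controlling per-feature p-values. The sketch below illustrates that general idea with the standard Benjamini-Hochberg step-up procedure. It is offered only as a hedged example and is not the dissertation's own method, whose details are not given here. The function name `benjamini_hochberg` and the example p-values are hypothetical. The sketch assumes each feature's p-value has already been computed, for instance from a Wald test on a fitted logistic regression coefficient.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[int]:
    """Return indices of features selected by the Benjamini-Hochberg step-up rule.

    Hypothetical helper for illustration. When the per-feature tests are
    independent (or positively dependent), selecting the returned features
    keeps the expected false discovery rate at or below q.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Sort feature indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up: find the largest 1-based rank k with p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    # Select every feature ranked at or below the cutoff, in original index order.
    return sorted(order[:cutoff_rank])
```

For example, `benjamini_hochberg([0.001, 0.045, 0.02, 0.2, 0.008], q=0.05)` returns `[0, 2, 4]`. The three smallest p-values fall under their rank-scaled thresholds (0.01, 0.02, 0.03), but 0.045 exceeds 0.04. Each feature's threshold depends on its rank, so the rule adapts to how many features look significant rather than applying one fixed p-value cutoff.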

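Bibliographic record

Author: Huang, Shuqing.
Affiliation: Tulane University School of Science and Engineering.
Degree-granting institution: Tulane University School of Science and Engineering.
Subject: Computer Science.
Degree: Ph.D.
Year: 2007
Pages: 117
Original format: PDF
Text language: English
Chinese Library Classification: Automation technology, computer technology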

Similar literature

1. A comparative study on classification of sleep stage based on EEG signals using feature selection and classification algorithms [J]. Baha Sen, Musa Peker, Abdullah Cavuşoğlu. Journal of Medical Systems, 2014, no. 3.

2. A Comparative Study Based on Rough Set and Classification Via Clustering Approaches to Handle Incomplete Data to Predict Learning Styles [J]. Hemant Rana. International Journal of Decision Support System Technology, 2017, no. 2.

3. Comparative study on projected clustering methods for hyperspectral imagery classification [J]. Mehta Anand, Dikshit Onkar. Geocarto International, 2016, no. 3a4.

4. A Performance Comparative Analysis Between Rule-Induction Algorithms and Clustering-Based Constructive Rule-Induction Algorithms. Application to Rheumatoid Arthritis [C]. J.A. Sanandres-Ledesma, Victor Maojo, Jose Crespo. International Symposium on Biological and Medical Data Analysis (ISBMDA 2004), 18-19 November 2004, Barcelona, Spain, 2004.

5. A comprehensive cluster validity framework for clustering algorithms [D]. Azem, Zeyad M., 2003.

6. Comparative Study of Hydrochemical Classification Based on Different Hierarchical Cluster Analysis Methods [O]. Jianwei Bu, Wei Liu, Zhao Pan, 2020.

7. Unsupervised Land-Use Classification of Multispectral Satellite Images. A Comparison of Conventional and Fuzzy-Logic Based Clustering Algorithms [O]. Tanja Duda, Morton Canty, Dieter Klaus, 2007.

8. Application of Cluster Analysis to Aerometric Data. Volume I. Part 1: Clustering, Validation, and Classification of Data. Part 2: Investigation and Report of Cluster Analysis [R]. Crutcher, H. L., Nelson, C., Fairbairn, B., 1980.
