
Application of feature selection methods for automated clustering analysis: a review on synthetic datasets



Abstract

The effective modelling of high-dimensional data with hundreds to thousands of features remains a challenging task in the field of machine learning. The process is manually intensive: skilled data scientists must apply exploratory data analysis techniques and statistical methods to pre-process datasets before machine learning methods can analyse them meaningfully. However, the massive growth of data has created a need for fully automated data analysis methods. One of the key challenges is the accurate selection of the relevant features, which can be buried in high-dimensional data alongside irrelevant, noisy features; the aim is to choose a subset of the complete set of input features that predicts the output with accuracy comparable to, or higher than, that of the complete input set. Kohonen’s self-organising neural network map (SOM) has been used in various ways for this task, for example in the weighted self-organising map (WSOM) approach, and this method is reviewed for its efficacy. The study demonstrates that the WSOM approach can produce different results on different runs on a given dataset because of the inappropriate use of the steepest descent optimisation method to minimise the weighted SOM’s cost function. An alternative feature weighting approach based on analysis of the SOM after training is presented. The proposed approach allows the SOM to converge before the input relevance is analysed, unlike the WSOM, which applies weighting to the inputs during training; this distorts the SOM’s cost function and produces multiple local minima, so the SOM does not consistently converge to the same state. We demonstrate the superiority of the proposed method over the WSOM and a standard SOM in feature selection with improved clustering analysis.
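The general idea of analysing input relevance only after an unweighted SOM has been trained can be sketched as follows. This is a minimal illustration, not the method from the paper: the abstract does not specify the relevance measure, so the `feature_relevance` score used here (the variance of each weight component across the trained map units) is an assumption chosen for simplicity, and the function names, grid size, and decay schedules are likewise hypothetical. The score is scale sensitive, so it only makes sense when the features are on comparable scales.

```python
import numpy as np


def train_som(data, grid=(5, 5), epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a plain (unweighted) SOM online; returns weights of shape (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of every map unit, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise unit weights from randomly chosen samples (copying the data).
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            x = data[i]
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)               # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.5   # shrinking neighbourhood radius
            # Best-matching unit: the unit whose weights are closest to the sample.
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood on the grid
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def feature_relevance(weights):
    """Assumed heuristic: spread of each feature across the trained units, normalised to sum to 1."""
    spread = weights.var(axis=0)
    return spread / spread.sum()


if __name__ == "__main__":
    # Toy data: two clusters separated along features 0 and 1,
    # plus two low-amplitude noise features that carry no cluster structure.
    rng = np.random.default_rng(1)
    labels = rng.integers(0, 2, 200)
    informative = labels[:, None] + rng.normal(0, 0.05, (200, 2))
    noise = rng.normal(0, 0.1, (200, 2))
    data = np.hstack([informative, noise])

    som = train_som(data)          # training uses no feature weights
    print(feature_relevance(som))  # relevance is computed only afterwards
```

Because the training step never sees the relevance scores, repeating it with the same seed gives the same map, and the scores cannot distort the SOM's cost function; a feature-selection step could then discard the lowest-scoring inputs and retrain.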


Bibliographic details

Journal: Springer Open Choice
Authors: Aliyu Usman Ahmad; Andrew Starkey
Year (volume), issue: -1 (29), 7
Year: -1
Pages: 317–328
Total pages: 12
Original format: PDF
Chinese Library Classification: Surgery
Keywords: Clustering; Self-organising neural network map; Feature selection; Automation

Similar literature


1. Application of feature selection methods for automated clustering analysis: a review on synthetic datasets [J]. Aliyu Usman Ahmad, Andrew Starkey. Neural Computing & Applications. 2018, issue 7
2. A review of microarray datasets and applied feature selection methods [J]. V. Bolón-Canedo, N. Sánchez-Maroño, A. Alonso-Betanzos. Information Sciences: An International Journal. 2014, issue Null
3. ParticleMDI: particle Monte Carlo methods for the cluster analysis of multiple datasets with applications to cancer subtype identification [J]. Advances in Data Analysis and Classification. 2020, issue 2
4. Feature selection and Ensemble Hierarchical Cluster-based Under-sampling approach for extremely imbalanced datasets: Application to gene classification [C]. Soltani Sima, Sadri Javad, Torshizi Hassan Ahmadi. International eConference on Computer and Knowledge Engineering (ICCKE). 2011
5. Feature selection methods for support vector machines for two or more classes, with applications to the analysis of Alzheimer's disease and its onset with MRI brain image processing [D]. Aksu, Yaman. 2010
6. Performance comparison of linear and non-linear feature selection methods for the analysis of large survey datasets [O]. Olga Krakovska, Gregory Christie, Andrew Sixsmith. -1
7. Application of feature selection methods for automated clustering analysis: a review on synthetic datasets [O]. Ahmad, Aliyu Usman; Starkey, Andrew. 2017
8. Computerized Pattern Recognition Applications to Chemical Analysis. Development of Interactive Feature Selection Methods for the K-Nearest Neighbor Technique [R]. Pichler, Marty A.; Perone, Sam P. 1974
