Source: Springer Open Choice (U.S. National Institutes of Health literature collection)
Application of feature selection methods for automated clustering analysis: a review on synthetic datasets

Abstract

The effective modelling of high-dimensional data with hundreds to thousands of features remains a challenging task in machine learning. It is manually intensive: skilled data scientists must apply exploratory data analysis techniques and statistical methods to pre-process datasets before machine learning methods can analyse them meaningfully. However, the massive growth of data has created a need for fully automated data analysis methods. One of the key challenges is the accurate selection of relevant features, which can be buried in high-dimensional data along with irrelevant, noisy features. The goal is to choose a subset of the complete set of input features that predicts the output with accuracy comparable to, or higher than, that of the complete input set. Kohonen’s self-organising neural network map (SOM) has been utilised in various ways for this task, for example in the weighted self-organising map (WSOM) approach, and this study reviews that method's efficacy. The study demonstrates that the WSOM approach can produce different results on different runs on a given dataset because it inappropriately uses the steepest descent optimisation method to minimise the weighted SOM’s cost function. An alternative feature weighting approach based on analysis of the SOM after training is presented. The proposed approach allows the SOM to converge before the input relevance is analysed. The WSOM, by contrast, applies weighting to the inputs during training, which distorts the SOM’s cost function and creates multiple local minima, so the SOM does not consistently converge to the same state. We demonstrate the superiority of the proposed method over the WSOM and a standard SOM in feature selection, with improved clustering analysis.
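The idea of weighting features only after the SOM has been trained can be illustrated with a minimal NumPy sketch. The abstract does not say how the paper measures input relevance, so everything here is an assumption made for illustration: the small online SOM in `train_som`, the hypothetical `feature_relevance` score (the variance of each feature across the trained codebook vectors, normalised to sum to 1), and the synthetic dataset with two informative and three noise features. It is not the authors' method.

```python
import numpy as np


def train_som(X, grid=(5, 5), n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small online SOM; return the codebook, shape (n_units, n_features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of each unit, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise the codebook vectors from randomly chosen samples.
    W = X[rng.integers(0, len(X), size=rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                     # learning rate decays to 0
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac  # neighbourhood shrinks to 0.5
        x = X[rng.integers(0, len(X))]
        # Best-matching unit: the codebook vector closest to the sample.
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))
        # Gaussian neighbourhood around the BMU on the map grid.
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        W += lr * h[:, None] * (x - W)
    return W


def feature_relevance(W):
    """Assumed relevance score (not from the paper): per-feature variance of
    the trained codebook across map units, normalised to sum to 1."""
    var = W.var(axis=0)
    return var / var.sum()


# Synthetic data: features 0-1 carry four well-separated clusters,
# features 2-4 are unit-variance Gaussian noise.
rng = np.random.default_rng(42)
centres = np.array([[3, 3], [3, -3], [-3, 3], [-3, -3]], dtype=float)
labels = rng.integers(0, 4, size=400)
informative = centres[labels] + 0.5 * rng.standard_normal((400, 2))
noise = rng.standard_normal((400, 3))
X = np.hstack([informative, noise])

W = train_som(X)                  # training uses unweighted inputs
relevance = feature_relevance(W)  # relevance is computed only afterwards
print(np.round(relevance, 3))
```

Because the inputs are never reweighted during training, the SOM update rule is left unchanged and the relevance analysis is a separate step on the trained map. The variance score assumes the features are on comparable scales; a different post-training measure could be substituted without touching `train_som`.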