
So you think you can PLS-DA?


Abstract

Partial Least-Squares Discriminant Analysis (PLS-DA) is a popular machine learning tool that is gaining increasing attention as a feature selector and classifier. To understand its strengths and weaknesses, we performed a series of experiments with synthetic data and compared its performance to that of its close relative, Principal Component Analysis (PCA), from which it was originally derived. We demonstrate that even though PCA ignores the class labels of the samples, this unsupervised tool can be remarkably effective as a feature selector. In some cases, it outperforms PLS-DA, which receives the class labels as part of its input.

Our experiments range from examining the signal-to-noise ratio in the feature selection task to considering many practical distributions and models encountered when analyzing bioinformatics and clinical data. Other methods were also evaluated.

Our work sheds light on the kinds of relationships and data models with which PLS-DA can be effective, both as a feature selector and as a classifier. In particular, we claim that when classes are determined by linear or non-linear relationships, PLS-DA provides almost no insight into the data. However, it is effective when the classes have a clustered distribution on the signal features, even when these features are hidden among a large number of noise features. PLS-DA retains strong performance even when the classes are contained in n-orthotopes (i.e., rectangular boxes in the subspace of the signal features).

Finally, we analyzed a data set of 396 vaginal microbiome samples for which the ground truth for feature selection was available. Again, the results highlighted the strengths and weaknesses of PLS-DA in comparison with PCA.
