PLoS Computational Biology

Multivariate classification of neuroimaging data with nested subclasses: Biased accuracy and implications for hypothesis testing



Abstract

Biological data sets are typically characterized by high dimensionality and small effect sizes. A powerful method for detecting systematic differences between experimental conditions in such multivariate data sets is multivariate pattern analysis (MVPA), particularly pattern classification. However, in virtually all applications, the data from the classes that correspond to the conditions of interest are not homogeneous but contain subclasses. Such subclasses can arise, for example, from individual subjects who contribute multiple data points, or from correlations of items within classes. We show here that in multivariate data with subclasses nested within their class structure, these subclasses introduce systematic information that improves classifiability beyond what is expected from the size of the class difference. We analytically prove that this subclass bias systematically inflates the correct classification rates (CCRs) of linear classifiers, depending on the number of subclasses as well as on the portion of variance induced by the subclasses. In simulations, we demonstrate that subclass bias is highest when the between-class effect size is low and the subclass variance is high. This bias can be reduced by increasing the total number of subclasses. However, we can account for the subclass bias by using permutation tests that explicitly consider the subclass structure of the data. We illustrate our results in several experiments that recorded human EEG activity, demonstrating that parametric statistical tests as well as typical trial-wise permutation tests fail to determine the significance of classification outcomes correctly.
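The subclass bias and the two kinds of permutation test described above can be illustrated with a small simulation. This is a minimal NumPy sketch under stated assumptions, not the authors' simulation code: subclass offsets are assumed Gaussian, a nearest-class-mean classifier stands in for "a linear classifier", and the names `simulate`, `cv_ccr`, `permutation_p` and all parameter values are hypothetical. Under the null (no class difference), trial-wise cross-validation lets each subclass appear in both training and test sets, so the CCR tends to exceed 0.5. Shuffling labels across single trials destroys the subclass structure and yields a null distribution near chance, whereas shuffling labels between whole subclasses keeps that structure, and with it the bias, in the null distribution.

```python
import numpy as np


def simulate(rng, n_sub=4, n_trials=20, n_feat=20, sub_sd=0.5, noise_sd=1.0, effect=0.0):
    """Two classes, each with `n_sub` nested subclasses (e.g. subjects or items).

    Every subclass gets its own random mean offset (the subclass variance);
    `effect` is a true class difference added to class 1 (0.0 = null hypothesis).
    Returns trials X, class labels y and subclass ids g. (Hypothetical setup.)
    """
    X, y, g = [], [], []
    for c in (0, 1):
        for s in range(n_sub):
            offset = rng.normal(0.0, sub_sd, n_feat) + c * effect
            X.append(offset + rng.normal(0.0, noise_sd, (n_trials, n_feat)))
            y += [c] * n_trials
            g += [c * n_sub + s] * n_trials
    return np.vstack(X), np.array(y), np.array(g)


def cv_ccr(X, y, fold):
    """Cross-validated correct classification rate of a nearest-class-mean
    classifier (a simple linear classifier); `fold` gives each trial's fold."""
    correct = 0
    for f in np.unique(fold):
        test, train = fold == f, fold != f
        m0 = X[train & (y == 0)].mean(axis=0)
        m1 = X[train & (y == 1)].mean(axis=0)
        # assign each test trial to the closer training class mean
        d0 = ((X[test] - m0) ** 2).sum(axis=1)
        d1 = ((X[test] - m1) ** 2).sum(axis=1)
        correct += np.sum((d1 < d0).astype(int) == y[test])
    return correct / len(y)


def permutation_p(X, y, g, fold, rng, n_perm=200, subclass_wise=True):
    """Permutation p-value for the observed CCR.

    subclass_wise=True shuffles class labels between whole subclasses, so the
    nested structure stays intact in every permuted data set;
    subclass_wise=False shuffles labels across single trials."""
    observed = cv_ccr(X, y, fold)
    subs = np.unique(g)
    sub_label = np.array([y[g == s][0] for s in subs])
    idx = np.searchsorted(subs, g)  # position of each trial's subclass in `subs`
    null = np.empty(n_perm)
    for i in range(n_perm):
        if subclass_wise:
            y_perm = rng.permutation(sub_label)[idx]
        else:
            y_perm = rng.permutation(y)
        null[i] = cv_ccr(X, y_perm, fold)
    p = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y, g = simulate(rng)  # null data: no true class difference
    fold = rng.permutation(len(y)) % 5  # trial-wise 5-fold split
    for wise in (False, True):
        ccr, p = permutation_p(X, y, g, fold, rng, subclass_wise=wise)
        print(f"subclass_wise={wise}: CCR={ccr:.3f}, p={p:.3f}")
```

With only a few subclasses per class, the subclass-wise test has few distinct relabelings (here C(8, 4) = 70, two of which reproduce the observed CCR exactly), so the smallest attainable p-value is limited; this is the price of a null distribution that reflects the nested structure.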
