International Conference on Computational Statistics

Multiple Nested Reductions of Single Data Modes as a Tool to Deal with Large Data Sets


Abstract

The increased accessibility and concerted use of novel measurement technologies give rise to a data tsunami, with matrices that comprise both a high number of variables and a high number of objects. As an example, one may think of transcriptomics data pertaining to the expression of a large number of genes in a large number of samples or tissues (as included in various compendia). The analysis of such data typically implies ill-conditioned optimization problems, as well as major challenges on both a computational and an interpretational level. In the present paper, we develop a generic method to deal with these problems. This method was originally proposed in brief by Van Mechelen and Schepers (2007). It implies that single data modes (i.e., the set of objects or the set of variables under study) are subjected to multiple (discrete and/or dimensional) nested reductions. We first formally introduce the generic multiple nested reductions method. Next, we show how a few recently proposed modeling approaches fit within the framework of this method. Subsequently, we briefly introduce a novel instantiation of the generic method, which simultaneously includes a two-mode partitioning of the objects and variables under study (Van Mechelen et al., 2004) and a low-dimensional, principal component-type dimensional reduction of the two-mode cluster centroids. We illustrate this novel instantiation with an application to transcriptomics data for normal and tumorous colon tissues. In the discussion, we highlight multiple nested mode reductions as a key feature of the novel method. Furthermore, we contrast the novel method with other approaches that imply different reductions for different modes, and with approaches that imply a hybrid dimensional/discrete reduction of a single mode. Finally, we show in which way the multiple reductions method allows a researcher to deal with the challenges implied by the analysis of large data sets as outlined above.
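To make the idea of nested reductions concrete, the following is a minimal sketch and not the authors' algorithm. The abstract describes a model in which the partitioning and the dimensional reduction are fitted simultaneously; the sketch instead applies them sequentially, which is a simplifying assumption: plain k-means partitions the objects and the variables, the block centroids are computed, and a truncated SVD (without centering) stands in for the principal component-type reduction of the centroid matrix. The function names `kmeans_labels` and `nested_reduction` and all parameters are hypothetical.

```python
import numpy as np


def kmeans_labels(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns one label per row."""
    rng = np.random.default_rng(seed)
    # Start from k distinct rows (fancy indexing returns a copy of those rows).
    centers = X[rng.choice(X.shape[0], size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels


def nested_reduction(X, n_row_clusters, n_col_clusters, n_components):
    """Sequential stand-in for two nested reductions of each data mode.

    Assumption: the two steps are run one after the other here, whereas
    the abstract describes a simultaneous fit.
    """
    X = np.asarray(X, dtype=float)

    # Reduction 1 (discrete): partition objects (rows) and variables (columns).
    row_labels = kmeans_labels(X, n_row_clusters)
    col_labels = kmeans_labels(X.T, n_col_clusters)

    # Two-mode cluster centroids: mean of each row-cluster x column-cluster block.
    centroids = np.zeros((n_row_clusters, n_col_clusters))
    for p in range(n_row_clusters):
        for q in range(n_col_clusters):
            block = X[np.ix_(row_labels == p, col_labels == q)]
            if block.size:  # empty blocks keep a centroid of 0
                centroids[p, q] = block.mean()

    # Reduction 2 (dimensional): rank-limited approximation of the centroids.
    U, s, Vt = np.linalg.svd(centroids, full_matrices=False)
    r = n_components
    low_rank = (U[:, :r] * s[:r]) @ Vt[:r]
    return row_labels, col_labels, centroids, low_rank


if __name__ == "__main__":
    # Synthetic data only; no real transcriptomics data are used here.
    data = np.random.default_rng(1).normal(size=(40, 30))
    rows, cols, cent, approx = nested_reduction(data, 3, 4, 2)
    print(cent.shape, approx.shape)
```

In this sketch the full data matrix is first summarized by a small centroid matrix, and that matrix is in turn summarized by a few components, which is the sense in which the second reduction is nested within the first.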
