New Advancements of Scalable Statistical Methods for Learning Latent Structures in Big Data.

Abstract

Constant advances in technology have caused an explosion of data in recent years. Accordingly, modern statistical and machine learning methods must be adapted to deal with complex and heterogeneous data types. This is particularly true for the analysis of biological data. For example, DNA sequence data can be viewed as categorical variables, with each nucleotide taking one of four categories. Gene expression data, depending on the quantitative technology, can be continuous numbers or counts. With the advancement of high-throughput technology, such data have become available in unprecedented abundance. Efficient statistical approaches are therefore crucial in this big data era.

Previous statistical methods for big data often aim to find low-dimensional structures in the observed data. For example, a factor analysis model assumes a latent multivariate Gaussian vector. Under this assumption, the factor model produces a low-rank estimate of the covariance of the observed variables. Another example is the latent Dirichlet allocation model for documents, which assumes that the mixture proportions of topics are represented by a Dirichlet-distributed variable. This dissertation proposes several novel extensions of these earlier statistical methods, developed to address challenges in big data. The new methods are applied in multiple real-world applications, including construction of condition-specific gene co-expression networks, estimation of shared topics among newsgroups, analysis of promoter sequences, analysis of political-economics risk data, and estimation of population structure from genotype data.
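The low-rank covariance structure implied by a factor model can be illustrated with a small simulation. The sketch below is not the dissertation's method. It assumes the textbook factor analysis form x = Λz + ε with z ~ N(0, I) and independent noise, so that the implied covariance is ΛΛᵀ plus a diagonal term. All names and sizes (`loadings`, `noise_variances`, `p`, `k`, `n`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def factor_model_covariance(loadings, noise_variances):
    """Covariance implied by x = loadings @ z + eps,
    with z ~ N(0, I) and eps ~ N(0, diag(noise_variances))."""
    return loadings @ loadings.T + np.diag(noise_variances)


# Hypothetical sizes: p observed variables, k latent factors, n samples.
rng = np.random.default_rng(0)
p, k, n = 6, 2, 100_000

loadings = rng.normal(size=(p, k))   # assumed factor loadings (Lambda)
noise_variances = np.full(p, 0.1)    # assumed per-variable noise variances

# Simulate observations from the assumed generative model.
z = rng.normal(size=(n, k))
eps = rng.normal(size=(n, p)) * np.sqrt(noise_variances)
x = z @ loadings.T + eps

implied = factor_model_covariance(loadings, noise_variances)
empirical = np.cov(x, rowvar=False)

# The structured part of the covariance has rank at most k.
low_rank_part = loadings @ loadings.T
print("rank of low-rank part:", np.linalg.matrix_rank(low_rank_part))
print("max |empirical - implied|:", np.max(np.abs(empirical - implied)))
```

With k much smaller than p, the p-by-p covariance is described by roughly p·k loadings plus p noise variances instead of p(p+1)/2 free entries, which is what makes this kind of model attractive for high-dimensional data.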


Bibliographic record

Author: Zhao, Shiwen
Author affiliation: Duke University
Degree-granting institution: Duke University
Subject: Statistics; Mathematics; Bioinformatics
Degree: Ph.D.
Year: 2016
Pages: 203 p.
Total pages: 203
Original format: PDF
Language of text: English

