
Analysis of High-dimensional and Left-censored Data with Applications in Lipidomics and Genomics

Abstract

Recently, new kinds of high-throughput measurement techniques have emerged, enabling biological research to focus on the fundamental building blocks of living organisms, such as genes, proteins, and lipids. Modern data analysis techniques have emerged alongside the new type of data these techniques produce, referred to as omics data. Much of this research focuses on finding biomarkers for detecting abnormalities in a person's health status and on learning unobservable network structures that represent functional associations in biological regulatory systems. Omics data have specific characteristics: left-censored observations caused by the limitations of the measurement instruments, missing data, non-normal observations, and very high dimensionality. The interest often lies in the connections between the large number of variables. This thesis has two major aims. The first is to provide efficient methodology for dealing with various types of missing or censored omics data, which can be used for visualisation and biomarker discovery based on, for example, regularised regression techniques. A maximum likelihood based covariance estimation method for data with censored values is developed, and the algorithms are described in detail. The second major aim is to develop novel approaches for detecting interactions that display functional associations in large-scale observations. For more complicated data connections, a technique based on partial least squares regression is investigated. The technique is applied to network construction as well as to differential network analyses, both on multiply imputed censored data and on next-generation sequencing count data.

Bibliographic record

  • Author

    Pesonen Maiju;

  • Affiliation
  • Year 2016
  • Total pages
  • Original format PDF
  • Language en
  • Chinese Library Classification
