Integrated analysis of partial sampling techniques in bioinformatics.

Abstract

With the development of microarrays and, more recently, next-generation sequencing technologies, researchers in genomics have been able to conduct large-scale, high-throughput experiments at the DNA level to investigate the abundance of different gene transcripts in the cell and to identify structural variants in individual genomes. The biological data from such experiments are usually signal intensities or sequence contents of DNA fragments, which can be viewed as partially observed samples from a pool of complete objects (e.g. short DNA fragments from a mixture of full-length transcript sequences). Moreover, these partial samples can be obtained via different technologies, each with its own characteristic error rate, sampling bias and per-sample cost. This thesis describes methods for the integrated analysis of such samples in different problems: computational frameworks and solutions are established to quantitatively parameterize statistical models, and efficient algorithms are designed to estimate the variance of the method's accuracy. Both simulation and analytical methods are developed to find the optimal low-cost integration of different sampling techniques in each experimental design. The specific problems considered include 1) systematically selecting unlabeled DNA regions for validation to train a predictive model, 2) integrated analysis of fragmented DNA sequences to estimate the distribution of full-length gene transcripts, and 3) conducting efficient simulations to model the local de novo assembly process in individual genome re-sequencing. A key aspect of some of these problems is establishing fast algorithms to compute a corresponding Fisher-information-based measure for performance estimation.
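The abstract's idea of scoring sampling technologies by Fisher information and per-sample cost can be illustrated with a toy calculation. The sketch below is not the thesis's method. It assumes a hypothetical two-isoform mixture in which a fraction `theta` of reads come from isoform A, and each read is either assignable to A, assignable to B, or ambiguous. The parameters `u_a`, `u_b`, `cost`, the technology names, and all numbers are invented for illustration. For a multinomial read model, the per-read Fisher information is the sum over categories of (dp_k/dtheta)^2 / p_k, and the Cramér-Rao bound gives a lower limit on the variance of an unbiased estimate of `theta`.

```python
def fisher_info_per_read(theta, u_a, u_b):
    """Per-read Fisher information about theta in a toy 3-category model.

    theta: fraction of reads originating from isoform A (assumed parameter)
    u_a, u_b: probability that a read from A (resp. B) is uniquely assignable
    """
    # Category probabilities: unique to A, unique to B, ambiguous.
    p = [theta * u_a,
         (1 - theta) * u_b,
         theta * (1 - u_a) + (1 - theta) * (1 - u_b)]
    # Derivatives of those probabilities with respect to theta.
    dp = [u_a, -u_b, u_b - u_a]
    # Skip empty categories to avoid dividing by zero.
    return sum(d * d / q for d, q in zip(dp, p) if q > 0)


def info_per_cost(theta, tech):
    """Fisher information bought per unit of budget with one technology."""
    return fisher_info_per_read(theta, tech["u_a"], tech["u_b"]) / tech["cost"]


def cramer_rao_bound(theta, tech, budget):
    """Lower bound on the variance of an unbiased estimate of theta."""
    n_reads = budget / tech["cost"]
    return 1.0 / (n_reads * fisher_info_per_read(theta, tech["u_a"], tech["u_b"]))


# Hypothetical technologies: cheap reads that are rarely unique versus
# expensive reads that are usually unique. All values are made up.
techs = {
    "short": {"u_a": 0.2, "u_b": 0.2, "cost": 1.0},
    "long": {"u_a": 0.9, "u_b": 0.9, "cost": 10.0},
}

theta = 0.5
best = max(techs, key=lambda name: info_per_cost(theta, techs[name]))
for name, tech in techs.items():
    print(name, info_per_cost(theta, tech), cramer_rao_bound(theta, tech, 1000.0))
print("most information per unit cost:", best)
```

Because information from independent reads adds up, this toy model always favors spending the whole budget on the technology with the highest information per unit cost. A realistic design problem of the kind the abstract describes would involve many parameters, error rates, and sampling biases, where mixing technologies can be the better choice.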

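Bibliographic record

  • Author: Du, Jiang
  • Author affiliation: Yale University
  • Degree-granting institution: Yale University
  • Subject: Biology, Bioinformatics; Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 135 p.
  • Total pages: 135
  • Original format: PDF
  • Language of text: English (eng)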