From Pieces to Paths: Combining Disparate Information in Computational Analysis of RNA-Seq

Abstract

As high-throughput sequencing technology has advanced in recent decades, large-scale, high-resolution genomic data have been generated to address problems in many fields. One of the state-of-the-art sequencing techniques is RNA sequencing, which has been widely used to study the transcriptomes of biological systems through millions of reads. The ultimate goal of RNA sequencing bioinformatics algorithms is to make maximal use of the information stored in a large number of pieced-together reads to unveil the whole landscape of biological function at the transcriptome level.

Many bioinformatics methods and pipelines have been developed to better achieve this goal. However, a central problem in RNA sequencing is prediction uncertainty caused by short read length and the low sampling rate of underexpressed transcripts. Both conditions create ambiguities in read mapping, transcript assembly, transcript quantification, and even downstream analysis.

This dissertation focuses on approaches to reducing this uncertainty by incorporating additional information, of disparate kinds, into bioinformatics models and model assessments. We addressed three critical issues in RNA sequencing data analysis: (1) we evaluated the performance of current de novo assembly methods, and of the methods used to evaluate them, using transcript information from a third-generation sequencing platform, which provides a longer sequence length but a higher error rate than next-generation sequencing; (2) we built a Bayesian graphical model that improves transcript quantification and the identification of differentially expressed isoforms by using information shared across biological replicates; (3) we built a joint pathway and gene selection model that incorporates pathway structures from an expert database. We conclude that incorporating appropriate information from additional resources enables more reliable assessment and higher prediction performance in RNA sequencing data analysis.

Bibliographic record

  • Author: Yang, Yifan
  • Author affiliation: Purdue University
  • Degree-granting institution: Purdue University
  • Discipline: Bioinformatics
  • Degree: Ph.D.
  • Year: 2018
  • Pages: 147 p.
  • Original format: PDF
  • Language of text: English
