Transcriptome Analysis and Applications Based on Next-generation RNA Sequencing Data.

Abstract

The recent development of next-generation RNA sequencing, termed 'RNA-Seq', offers an opportunity to explore RNA transcripts across the whole transcriptome. As a revolutionary method, RNA-Seq can not only measure transcript abundances precisely but also discover novel transcribed content and uncover unknown regulatory mechanisms. Meanwhile, combining different levels of next-generation sequencing, such as genome sequencing and methylome sequencing, provides a powerful tool for novel discovery in a biological context.

My PhD study focuses on the analysis of next-generation sequencing data, especially RNA-Seq data. It comprises three parts: pipeline development, data analysis, and mechanistic study.

As next-generation sequencing (NGS) technology advances, the analysis of massive NGS data has become a great challenge. Many existing general aligners (in contrast to splicing-aware alignment tools) can map millions of sequencing reads onto a reference genome. However, they are designed neither for reads that span splice junctions (spliced reads) nor for reads that match multiple locations along the reference genome (multireads). Hence, we developed ABMapper, an ab initio mapping method that uses a two-seed strategy. Benchmark results show that ABMapper achieves higher accuracy and recall than two tools of the same kind, TopHat and SpliceMap.

Selecting the most probable location for spliced reads and multireads is a further problem. These reads are typically assigned at random to one of the possible locations, or discarded completely, when expression levels are calculated, which biases downstream analyses such as differential expression analysis and alternative splicing analysis. To determine the location of spliced reads and multireads rationally, we proposed a maximum likelihood estimation method based on a geometric-tail (GT) distribution of intron length. This probabilistic model handles splice junctions that fall between the two reads of a paired-end (PE) pair as well as those contained within one or both reads. Based on this model, multiple alignments of the reads within a PE pair can be properly resolved.

The accumulation of NGS data provides rich resources for deeper discovery of biological significance. We integrated RNA-Seq data and methylation sequencing data to build a predictive model for the regulation of gene expression based on DNA methylation patterns. We found that DNA methylation predicts gene expression fairly accurately, with accuracy reaching up to 78%. We also found that the gene body is the most important region for DNA methylation in these models, even more informative than the promoter. Finally, a feature overlap network built from an optimal subset of the combined methylation and CpG patterns indicates that DNA methylation patterns regulate gene expression collaboratively.

Beyond developing new algorithms to facilitate RNA-Seq data analysis, we also performed a transcriptome analysis in zebrafish. The analysis of differentially expressed genes and pathways involved after calycosin treatment, combined with other experimental evidence such as fluorescence microscopy and quantitative real-time polymerase chain reaction (qPCR), demonstrates the proangiogenic effects of calycosin in vivo.

In summary, this thesis details my work on NGS data analysis, the discovery of biological significance using data-mining algorithms, and transcriptome analysis.
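The abstract does not give the exact form of the geometric-tail distribution or how candidate alignments are scored, so the following is only a rough sketch of the general idea of ranking candidate spliced alignments by the likelihood of their implied intron lengths. It assumes a uniform "head" below a hypothetical threshold and a geometric tail above it; the function names, parameter values, and candidate format are all hypothetical and are not taken from the thesis.

```python
import math


def gt_log_pmf(length, min_len=50, threshold=1000, head_mass=0.8, p=1e-4):
    """Log-probability of an intron length under an illustrative
    geometric-tail model (all parameter values are assumptions).

    Lengths in [min_len, threshold) share `head_mass` uniformly;
    lengths >= threshold follow a geometric tail with parameter `p`
    carrying the remaining probability mass.
    """
    if length < min_len:
        return float("-inf")  # treated as too short to be an intron
    if length < threshold:
        return math.log(head_mass / (threshold - min_len))
    return (
        math.log(1.0 - head_mass)
        + math.log(p)
        + (length - threshold) * math.log1p(-p)
    )


def pick_alignment(candidates):
    """Return the candidate whose implied intron lengths are most likely.

    `candidates` is a list of (label, [intron_length, ...]) tuples, where
    every candidate is assumed to imply at least one intron.
    """
    return max(candidates, key=lambda c: sum(gt_log_pmf(n) for n in c[1]))


if __name__ == "__main__":
    # Two hypothetical placements of the same spliced read.
    candidates = [("near", [500]), ("far", [200_000])]
    print(pick_alignment(candidates)[0])
```

In this toy setup the placement implying a 500-base intron is preferred over the one implying a 200,000-base intron, because the tail probability decays geometrically with length. A real implementation would estimate the head and tail parameters from annotated introns rather than fixing them by hand.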

Bibliographic record

  • Author

    Lou, Shaoke

  • Author affiliation

    The Chinese University of Hong Kong (Hong Kong)

  • Degree-granting institution The Chinese University of Hong Kong (Hong Kong)
  • Subject Biology, Molecular; Computer Science
  • Degree Ph.D.
  • Year 2012
  • Pages 158 p.
  • Total pages 158
  • Original format PDF
  • Text language English