Journal of Computational Biology

An Enumerative Combinatorics Model for Fragmentation Patterns in RNA Sequencing Provides Insights into Nonuniformity of the Expected Fragment Starting-Point and Coverage Profile



Abstract

RNA sequencing (RNA-seq) has emerged as the method of choice for measuring the expression of RNAs in a given cell population. In most RNA-seq technologies, sequencing the full length of RNA molecules requires fragmentation into smaller pieces. Unfortunately, the issue of nonuniform sequencing coverage across a genomic feature has been a concern in RNA-seq and is attributed to biases for certain fragments in RNA-seq library preparation and sequencing. To investigate the expected coverage obtained from fragmentation, we develop a simple fragmentation model that is independent of bias from the experimental method and is not specific to the transcript sequence. Essentially, we enumerate all configurations for maximal placement of a given fragment length, F, on transcript length, T, to represent every possible fragmentation pattern, from which we compute the expected coverage profile across a transcript. We extend this model to incorporate general empirical attributes such as read length, fragment length distribution, and number of molecules of the transcript. We further introduce the fragment starting-point, fragment coverage, and read coverage profiles. We find that the expected profiles are not uniform and that factors such as fragment length to transcript length ratio, read length to fragment length ratio, fragment length distribution, and number of molecules influence the variability of coverage across a transcript. Finally, we explore a potential application of the model where, with simulations, we show that it is possible to correctly estimate the transcript copy number for any transcript in the RNA-seq experiment.
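The enumeration idea in the abstract can be illustrated with a small sketch. The abstract does not define "maximal placement" precisely, so the code below rests on two assumptions that are not taken from the paper: a configuration is a set of non-overlapping fragments of length F such that every uncovered gap (including both ends) is shorter than F, and every configuration is weighted equally when averaging. The function names `maximal_placements` and `expected_profiles` are hypothetical.

```python
from fractions import Fraction


def maximal_placements(T, F):
    """Yield tuples of 0-based fragment start positions on a transcript of length T.

    Assumption (not from the paper): a placement is "maximal" when fragments of
    length F do not overlap and every uncovered gap is shorter than F, so no
    further fragment could be added.
    """
    if F < 1 or T < 0:
        raise ValueError("need F >= 1 and T >= 0")

    def extend(pos, starts):
        # pos is the first position not yet covered or skipped.
        # Try every admissible gap (0 .. F-1) before the next fragment.
        for gap in range(F):
            start = pos + gap
            if start + F <= T:
                yield from extend(start + F, starts + (start,))
        # The configuration is complete only if the remaining tail
        # is too short to hold another fragment.
        if T - pos < F:
            yield starts

    yield from extend(0, ())


def expected_profiles(T, F):
    """Return (starting-point profile, coverage profile) as lists of Fractions.

    Each entry is the fraction of enumerated configurations in which a fragment
    starts at (or covers) that position, assuming equal weight per configuration.
    """
    configs = list(maximal_placements(T, F))
    n = len(configs)
    start_counts = [0] * T
    cover_counts = [0] * T
    for starts in configs:
        for s in starts:
            start_counts[s] += 1
            for p in range(s, s + F):
                cover_counts[p] += 1
    start_profile = [Fraction(c, n) for c in start_counts]
    cover_profile = [Fraction(c, n) for c in cover_counts]
    return start_profile, cover_profile


if __name__ == "__main__":
    starts, cover = expected_profiles(5, 2)
    print("starting-point profile:", [str(x) for x in starts])
    print("coverage profile:      ", [str(x) for x in cover])
```

Under these assumptions, T = 5 and F = 2 give three configurations, with fragment starts at (0, 2), (0, 3), and (1, 3). The resulting coverage profile is 2/3, 1, 2/3, 1, 2/3, so even this toy case is not uniform, which is consistent with the qualitative claim in the abstract. The brute-force enumeration grows quickly with T and is meant only for small illustrative inputs.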
