Journal of Computational Biology

Improved Search of Large Transcriptomic Sequencing Databases Using Split Sequence Bloom Trees


Abstract

Enormous databases of short-read RNA-seq experiments, such as the NIH Sequence Read Archive, are now available. These databases could answer many questions about condition-specific expression or population variation, and this resource will only grow over time. However, these collections remain difficult to use because there is no way to search them for a particular expressed sequence. Although some progress has been made on this problem, it is still not feasible to search collections of hundreds of terabytes of short-read sequencing experiments. We introduce an indexing scheme called split sequence Bloom trees (SSBTs) to support sequence-based querying of terabyte-scale collections of thousands of short-read sequencing experiments. SSBT improves on the sequence Bloom tree (SBT) data structure for the same task. We apply SSBTs to the problem of finding the conditions under which query transcripts are expressed. Our experiments are conducted on a set of 2652 publicly available RNA-seq experiments from breast, blood, and brain tissues. We demonstrate that this SSBT index can be queried for a 1000 nt sequence in under 4 minutes using a single thread and can be stored in just 39 GB, a fivefold improvement in search and storage costs compared with SBT.
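To make the idea of a "split" tree node more concrete, here is a minimal Python sketch. It is a simplified illustration under stated assumptions, not the authors' implementation. Each experiment is a Bloom filter over its k-mers, stored here as a set of bit positions. Each internal node keeps two filters: `sim`, the bits set in every experiment below it, and `rem`, the bits set in some but not all of them. A query k-mer whose bits all fall in `sim` is present in every experiment in that subtree. A k-mer with a bit outside `sim | rem` is absent from all of them. Only the undetermined k-mers are passed down to the children. All names and parameters (`M`, `NUM_HASHES`, `K`, `Node`, `leaf`, `merge`, `query`, the 0.8 threshold) are hypothetical. The space optimizations and compression used by the real index are omitted.

```python
import hashlib
import math
from dataclasses import dataclass, field
from typing import Optional

M = 1 << 16      # hypothetical Bloom filter size in bits
NUM_HASHES = 1   # hypothetical number of hash functions per k-mer
K = 5            # hypothetical k-mer length (toy value; real indexes use larger k)

def kmers(seq: str, k: int = K) -> set[str]:
    """Distinct k-mers of a sequence (no reverse-complement handling here)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def positions(kmer: str) -> frozenset[int]:
    """Bloom filter bit positions for one k-mer."""
    return frozenset(
        int.from_bytes(hashlib.blake2b(f"{i}:{kmer}".encode(), digest_size=8).digest(), "little") % M
        for i in range(NUM_HASHES)
    )

@dataclass
class Node:
    sim: frozenset[int]                      # bits set in every experiment below
    rem: frozenset[int]                      # bits set in some, but not all, experiments below
    children: list["Node"] = field(default_factory=list)
    name: Optional[str] = None               # experiment id (leaves only)

def leaf(name: str, reads: list[str]) -> Node:
    """A leaf is one experiment; its filter is fully determined, so rem is empty."""
    bits = frozenset().union(*(positions(km) for r in reads for km in kmers(r)))
    return Node(sim=bits, rem=frozenset(), name=name)

def merge(*children: Node) -> Node:
    """Split a parent filter into 'in all children' and 'in some children' parts."""
    all_bits = frozenset.intersection(*(c.sim for c in children))
    some_bits = frozenset.union(*(c.sim | c.rem for c in children))
    return Node(sim=all_bits, rem=some_bits - all_bits, children=list(children))

def leaf_names(node: Node) -> list[str]:
    if not node.children:
        return [node.name]
    return [n for c in node.children for n in leaf_names(c)]

def _search(node: Node, undetermined: list[frozenset[int]], present: int,
            needed: int, hits: list[str]) -> None:
    union = node.sim | node.rem
    still = []
    for pos in undetermined:
        if pos <= node.sim:        # k-mer present in every experiment below
            present += 1
        elif pos <= union:         # may be present in some experiments below
            still.append(pos)
        # otherwise absent from every experiment below: drop it
    if present >= needed:          # every experiment below already passes
        hits.extend(leaf_names(node))
    elif present + len(still) >= needed:   # some experiment might still pass
        for child in node.children:
            _search(child, still, present, needed, hits)

def query(root: Node, seq: str, theta: float = 0.8) -> list[str]:
    """Experiments containing at least a theta fraction of the query's k-mers."""
    qs = [positions(km) for km in kmers(seq)]
    hits: list[str] = []
    _search(root, qs, 0, math.ceil(theta * len(qs)), hits)
    return hits
```

For example, with `Q = "ACGTACGGTCA"`, an index built as `merge(merge(leaf("breast", [Q + "GG"]), leaf("blood", ["TTTTTTTTTTTT"])), leaf("brain", ["CC" + Q]))` should report the breast and brain experiments for `Q`. Bloom filters never produce false negatives, so any experiment that truly contains the k-mers is always reported. False positives can occur, so a hit can still be spurious.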