G3: Genes, Genomes, Genetics

ARSDA: A New Approach for Storing, Transmitting and Analyzing Transcriptomic Data



Abstract

Two major stumbling blocks exist in high-throughput sequencing (HTS) data analysis. The first is the sheer file size, typically gigabytes when uncompressed, which causes problems in storage, transmission, and analysis. However, these files do not need to be so large and can be reduced without loss of information. Each HTS file, whether in compressed .SRA or plain-text .fastq format, contains numerous identical reads stored as separate entries. For example, among the 44,603,541 forward reads in the SRR4011234.sra file (from a Bacillus subtilis transcriptomic study) deposited in NCBI’s SRA database, one read has 497,027 identical copies. Instead of storing them as separate entries, one can and should store them as a single entry in the SeqID_NumCopy format (which I dub the FASTA+ format). The second is the proper allocation of reads that map equally well to paralogous genes. I illustrate in detail a new method for such allocation. I have developed the ARSDA software, which implements these new approaches. A number of HTS files for model species are being processed and deposited to demonstrate that this approach not only saves a huge amount of storage space and transmission bandwidth but also dramatically reduces the time needed for downstream data analysis. Instead of matching the 497,027 identical reads separately against the B. subtilis genome, one needs to match the read only once. ARSDA includes functions that take advantage of HTS data in the new sequence format for downstream data analysis such as gene expression characterization. I contrasted gene expression results between ARSDA and Cufflinks so that readers can better appreciate the strengths of ARSDA. ARSDA is freely available for Windows, Linux, and Macintosh computers.
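The idea of collapsing identical reads into single SeqID_NumCopy entries can be illustrated with a minimal Python sketch. This is not ARSDA's implementation. The function names, the `UniqueSeq` ID prefix, and the exact header layout are assumptions made for illustration; the abstract only states that the copy number is appended to the sequence ID. The sketch also assumes well-formed four-line FASTQ records and ignores the quality lines.

```python
from collections import Counter
from typing import Iterable, Iterator


def read_fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def collapse_to_fasta_plus(lines: Iterable[str], prefix: str = "UniqueSeq") -> str:
    """Collapse identical reads into one entry each, with the copy count
    appended to the ID as SeqID_NumCopy (hypothetical ID scheme)."""
    counts = Counter(read_fastq_sequences(lines))
    # Most abundant reads first; ties keep their order of first appearance.
    return "".join(
        f">{prefix}{n}_{copies}\n{seq}\n"
        for n, (seq, copies) in enumerate(counts.most_common(), start=1)
    )


def copy_number(header: str) -> int:
    """Recover the copy count from a SeqID_NumCopy header line."""
    return int(header.rsplit("_", 1)[1])


if __name__ == "__main__":
    fastq = [
        "@r1", "ACGT", "+", "IIII",
        "@r2", "ACGT", "+", "IIII",
        "@r3", "TTGA", "+", "IIII",
        "@r4", "ACGT", "+", "IIII",
    ]
    print(collapse_to_fasta_plus(fastq), end="")
```

In downstream analysis, each unique sequence would be matched against the genome once, and its hit would be weighted by the value returned from `copy_number`, rather than matching every copy separately.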
