Journal: BMC Bioinformatics

WordSeeker: concurrent bioinformatics software for discovering genome-wide patterns and word-based genomic signatures



Abstract

Background

An important focus of genomic science is the discovery and characterization of all functional elements within genomes. In silico methods are used in genome studies to discover putative regulatory genomic elements (called words or motifs). Although a number of methods have been developed for motif discovery, most of them lack the scalability needed to analyze large genomic data sets.

Methods

This manuscript presents WordSeeker, an enumerative motif discovery toolkit that utilizes multi-core and distributed computational platforms to enable scalable analysis of genomic data. A controller task coordinates activities of worker nodes, each of which (1) enumerates a subset of the DNA word space and (2) scores words with a distributed Markov chain model.

Results

A comprehensive suite of performance tests was conducted to demonstrate the performance, speedup and efficiency of WordSeeker. The scalability of the toolkit enabled the analysis of the entire genome of Arabidopsis thaliana; the results of the analysis were integrated into The Arabidopsis Gene Regulatory Information Server (AGRIS). A public version of WordSeeker was deployed on the Glenn cluster at the Ohio Supercomputer Center.

Conclusion

WordSeeker effectively utilizes concurrent computing platforms to enable the identification of putative functional elements in genomic data sets. This capability facilitates the analysis of the large quantity of sequenced genomic data.
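The controller/worker scheme described in the Methods can be illustrated with a small Python sketch. It is not WordSeeker's code, and every function name below is hypothetical. It assumes the word space is partitioned by leading nucleotide, that each worker scores its words as the ratio of observed count to the count expected under a fixed-order Markov chain estimated from the sequence itself, and that a thread pool on one machine stands in for worker nodes. The abstract's model is distributed; here the counts are simply shared in memory.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import product

ALPHABET = "ACGT"


def count_kmers(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def expected_count(word, counts, order):
    """Expected occurrences of word under an order-`order` Markov chain.

    Uses the standard estimator: the product of the counts of all
    (order+1)-mers in the word, divided by the product of the counts
    of the interior order-mers.
    """
    numerator = 1.0
    for i in range(len(word) - order):
        numerator *= counts[order + 1][word[i:i + order + 1]]
    denominator = 1.0
    for i in range(1, len(word) - order):
        denominator *= counts[order][word[i:i + order]]
    return numerator / denominator if denominator else 0.0


def score_subset(prefix, k, counts, order):
    """Worker: enumerate all length-k words starting with prefix, score seen ones."""
    scored = {}
    for tail in product(ALPHABET, repeat=k - len(prefix)):
        word = prefix + "".join(tail)
        observed = counts[k][word]
        if observed == 0:
            continue  # word never occurs; nothing to score
        expected = expected_count(word, counts, order)
        scored[word] = (observed, expected, observed / expected)
    return scored


def find_words(seq, k, order=1, workers=4):
    """Controller: build the count tables, hand one prefix to each worker, merge."""
    if not 1 <= order < k:
        raise ValueError("order must satisfy 1 <= order < k")
    counts = {n: count_kmers(seq, n) for n in (order, order + 1, k)}
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda p: score_subset(p, k, counts, order), ALPHABET)
        for part in parts:
            results.update(part)
    return results


if __name__ == "__main__":
    for word, (obs, exp, ratio) in sorted(find_words("ACGTACGTACGT", k=3).items()):
        print(word, obs, exp, ratio)
```

Splitting by prefix gives each worker a disjoint slice of the word space, so the controller can merge partial results without resolving conflicts. On a real cluster the thread pool would be replaced by processes or message passing, and the count tables would themselves need to be partitioned.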
