Annual research conference of the South African Institute of Computer Scientists and Information Technologists 2010

PH2: An Hadoop-based framework for mining structural properties from the PDB Database


Abstract

PH2 is a Hadoop- and SQL-based tool for quickly extracting information from the Protein Data Bank (PDB). The PDB is stored as a set of replicated Hadoop sequence files on the Hadoop Distributed File System (HDFS). PH2 lets a user pose queries about 3D structures (and other properties) in SQL and runs these queries in a highly parallel manner using the Hadoop framework. The PDB is an important source of information about structural and other properties of proteins, and at the time of writing it contained about 65,000 protein structures. Determining which proteins have particular shapes is an important bioinformatics application. PH2 parses each PDB file, creates a SQL database for it, and then performs the appropriate queries. Experiments on a small local cluster and a large shared cluster show that the application is highly scalable. On the large cluster, a complex real query takes less than 4 minutes to search the whole of the PDB.
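The per-file step described in the abstract (parse one PDB file, build a SQL database for it, run the query) can be sketched in plain Python. This is only an illustration under stated assumptions, not PH2's implementation: the abstract does not name the SQL engine or schema, so SQLite, the `atom` table, and the helpers `atom_line`, `load_structure`, and `query_structure` are all hypothetical. In PH2 a step like this would run inside Hadoop tasks over records read from sequence files; here a single function call stands in for one such task.

```python
import sqlite3


def atom_line(serial, name, res, chain, resseq, x, y, z):
    """Format one fixed-column PDB ATOM record (hypothetical test-data helper)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def load_structure(pdb_text):
    """Parse ATOM records into a fresh in-memory SQLite database.

    The schema is an assumption made for this sketch.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE atom (serial INTEGER, name TEXT, res_name TEXT, "
        "chain TEXT, res_seq INTEGER, x REAL, y REAL, z REAL)"
    )
    rows = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # ignore HETATM, REMARK, and other record types
        # Column positions follow the fixed-width PDB ATOM record layout.
        rows.append((
            int(line[6:11]),        # atom serial number
            line[12:16].strip(),    # atom name
            line[17:20].strip(),    # residue name
            line[21],               # chain identifier
            int(line[22:26]),       # residue sequence number
            float(line[30:38]),     # x coordinate (angstroms)
            float(line[38:46]),     # y coordinate
            float(line[46:54]),     # z coordinate
        ))
    conn.executemany("INSERT INTO atom VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


def query_structure(pdb_id, pdb_text, sql):
    """Map-style step: one structure in, a list of (pdb_id, row) pairs out."""
    conn = load_structure(pdb_text)
    try:
        return [(pdb_id, row) for row in conn.execute(sql)]
    finally:
        conn.close()


# Example structural query: pairs of alpha-carbon atoms closer than 4 angstroms.
CA_CONTACTS_SQL = """
    SELECT a.res_seq, b.res_seq
    FROM atom AS a JOIN atom AS b ON a.serial < b.serial
    WHERE a.name = 'CA' AND b.name = 'CA'
      AND (a.x - b.x) * (a.x - b.x)
        + (a.y - b.y) * (a.y - b.y)
        + (a.z - b.z) * (a.z - b.z) < 16.0
    ORDER BY a.res_seq, b.res_seq
"""

# Made-up coordinates, not taken from any real PDB entry.
SAMPLE = "\n".join([
    atom_line(1, "N", "ALA", "A", 1, -1.200, 0.500, 0.000),
    atom_line(2, "CA", "ALA", "A", 1, 0.000, 0.000, 0.000),
    atom_line(3, "CA", "GLY", "A", 2, 3.800, 0.000, 0.000),
    atom_line(4, "CA", "SER", "A", 3, 7.600, 0.000, 0.000),
])

hits = query_structure("demo", SAMPLE, CA_CONTACTS_SQL)
for pdb_id, (res_i, res_j) in hits:
    print(pdb_id, res_i, res_j)
```

Because each structure gets its own small database, the calls are independent of one another, which is what allows a framework such as Hadoop to spread them across many nodes and simply concatenate the emitted pairs.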
