PH2: An Hadoop-based framework for mining structural properties from the PDB Database

Abstract

PH2 is a Hadoop- and SQL-based tool for quickly extracting information from the Protein Data Bank (PDB). The PDB is stored as a set of Hadoop sequence files, replicated across the Hadoop Distributed File System. PH2 lets a user express queries about 3D structures (and other properties) in SQL and runs these queries in a highly parallel manner using the Hadoop framework. The PDB is an important source of information about structural and other properties of proteins, and it currently contains about 65000 protein structures. Determining which proteins have particular shapes is an important bioinformatics application. PH2 parses each PDB file, creates a SQL database for it, and then performs the appropriate queries. Experiments performed on a small local cluster and a large shared cluster show that the application is highly scalable. On the large cluster, a complex real query takes less than 4 minutes to search the whole of the PDB.
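The per-file step described in the abstract (parse a PDB file, load it into a SQL database, run the query) can be illustrated with a minimal Python sketch. This is not PH2's actual code: the table schema, the function names `parse_atoms`, `build_db`, and `map_structure`, and the use of an in-memory SQLite database are all assumptions made for illustration. Only the fixed-column layout of ATOM records comes from the PDB file format. In a Hadoop setting, something like `map_structure` would play the role of the map function applied to each structure in parallel.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple


def parse_atoms(pdb_lines: Iterable[str]) -> Iterator[tuple]:
    """Yield one tuple per ATOM record, using the PDB fixed-column layout."""
    for line in pdb_lines:
        # Skip non-ATOM records and lines too short to hold coordinates.
        if line[0:6] != "ATOM  " or len(line) < 54:
            continue
        yield (
            int(line[6:11]),        # atom serial number (columns 7-11)
            line[12:16].strip(),    # atom name (columns 13-16)
            line[17:20].strip(),    # residue name (columns 18-20)
            line[21],               # chain identifier (column 22)
            int(line[22:26]),       # residue sequence number (columns 23-26)
            float(line[30:38]),     # x coordinate (columns 31-38)
            float(line[38:46]),     # y coordinate (columns 39-46)
            float(line[46:54]),     # z coordinate (columns 47-54)
        )


def build_db(pdb_lines: Iterable[str]) -> sqlite3.Connection:
    """Create an in-memory database holding the atoms of one structure.

    The schema is hypothetical; the abstract does not describe PH2's tables.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE atom (serial INTEGER, name TEXT, res_name TEXT, "
        "chain TEXT, res_seq INTEGER, x REAL, y REAL, z REAL)"
    )
    conn.executemany(
        "INSERT INTO atom VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        parse_atoms(pdb_lines),
    )
    return conn


def map_structure(
    pdb_id: str, pdb_lines: Iterable[str], query: str
) -> List[Tuple[str, tuple]]:
    """Run one SQL query against one structure, tagging each row with its ID."""
    conn = build_db(pdb_lines)
    try:
        return [(pdb_id, row) for row in conn.execute(query)]
    finally:
        conn.close()


# Example structural query (an assumption, not taken from the paper):
# pairs of alpha-carbon atoms closer than 4 angstroms to each other.
CLOSE_CA_PAIRS = """
SELECT a.res_seq, b.res_seq
FROM atom AS a JOIN atom AS b ON a.serial < b.serial
WHERE a.name = 'CA' AND b.name = 'CA'
  AND (a.x - b.x) * (a.x - b.x)
    + (a.y - b.y) * (a.y - b.y)
    + (a.z - b.z) * (a.z - b.z) < 16.0
"""
```

Calling `map_structure("1ABC", lines, CLOSE_CA_PAIRS)` on the lines of one PDB file (the ID here is a placeholder) returns the matching residue pairs tagged with that ID. Because each structure gets its own small database, structures can be processed independently of one another, which is the property that makes the workload easy to parallelize.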

Bibliographic details

Source: Annual research conference of the South African Institute of Computer Scientists and Information Technologists 2010 | 2010 | pp. 104-112 | 9 pages
Conference location: Bela Bela (ZA)
Author: Scott Hazelhurst
Affiliation: School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, Private Bag 3, 2050 Wits, South Africa
Original format: PDF
Language: English
Chinese Library Classification: computing technology, computer technology
Keywords: Hadoop; PDB; parallel computing; structural information
