Using Neo4j for Mining Protein Graphs: A Case Study

Abstract

Graph databases are becoming increasingly popular in domains where data can be modeled as a set of connected objects. Compared with classical relational databases, they make it relatively simple to query such data using graph-based queries. In this paper, we show how one of the most popular graph databases, Neo4j, can be applied to the bioinformatics problem of protein-protein interface (PPI) identification. Given a protein structure, the goal of the PPI identification task is to identify the amino acids responsible for binding the structure to other proteins. Each protein structure consists of a set of amino acid molecules that can be conceived of as a graph, and a multitude of methods for analyzing such protein graphs have been established. We introduce a knowledge-based approach that can enhance the quality of these methods by utilizing existing protein structure knowledge stored in the Protein Data Bank (PDB). We show how to transform information about protein complexes from the PDB into Neo4j, where the complexes can be stored as a set of independent protein graphs. The resulting graph database contains about 14 million labeled nodes and 38 million edges. In the PPI identification phase, this database is queried using exact subgraph matching, and the results are aggregated to improve an existing PPI identification method. We show the pros and cons of using Neo4j for such an endeavor, with respect to the size of the database and the complexity of the queries, in comparison to using a relational database (Microsoft SQL Server). We conclude that using Neo4j is a viable option for specific, rather small, subgraph query types. However, we encountered performance limitations, especially for query graphs with a larger number of edges.
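
To make the idea of an exact subgraph query more concrete, the following is a minimal Python sketch of how a small labeled query graph might be turned into a Cypher `MATCH` pattern. The abstract does not describe the actual data model, so everything named here is an assumption for illustration only: the node label `AminoAcid`, the relationship type `CONTACT`, the properties `type` and `interface`, and the helper `build_cypher` are all hypothetical and are not taken from the paper.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def build_cypher(nodes: Dict[str, str], edges: List[Tuple[str, str]]) -> str:
    """Build a Cypher MATCH query for a small labeled query graph.

    nodes maps a query variable name to an amino acid type (e.g. "ALA").
    edges lists undirected contacts between query variables.

    Assumed (hypothetical) schema: nodes labeled AminoAcid with the
    properties `type` and `interface`, connected by CONTACT relationships.
    """
    # Basic validation; values are inlined into the query text only to keep
    # the sketch readable. Real code should pass them as query parameters.
    for var, aa_type in nodes.items():
        if not var.isidentifier() or not aa_type.isalpha():
            raise ValueError(f"invalid node specification: {var!r} -> {aa_type!r}")
    for a, b in edges:
        if a not in nodes or b not in nodes:
            raise ValueError(f"edge ({a}, {b}) refers to an unknown node")

    # One pattern per labeled node, then one pattern per undirected edge.
    patterns = [f"({var}:AminoAcid {{type: '{aa_type}'}})"
                for var, aa_type in nodes.items()]
    patterns += [f"({a})-[:CONTACT]-({b})" for a, b in edges]

    # Require distinct database nodes for distinct query variables, since a
    # single MATCH only guarantees that relationships are not reused.
    distinct = [f"{a} <> {b}" for a, b in combinations(nodes, 2)]

    query = "MATCH " + ", ".join(patterns)
    if distinct:
        query += " WHERE " + " AND ".join(distinct)
    # Return the (assumed) interface flag of every matched amino acid so that
    # the caller can aggregate the flags over all matches.
    query += " RETURN " + ", ".join(
        f"{var}.interface AS {var}_interface" for var in nodes
    )
    return query


if __name__ == "__main__":
    # A three-node path: ALA in contact with GLY, GLY in contact with LEU.
    print(build_cypher({"a": "ALA", "b": "GLY", "c": "LEU"},
                       [("a", "b"), ("b", "c")]))
```

Each edge in the query graph adds one more relationship pattern to the `MATCH` clause, which is one plausible way to picture why query cost could grow with the number of edges, as the abstract reports.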

Bibliographic details

Source
International workshop on database and expert systems applications, 2015, pp. 230-234 (5 pages)
Authors
David Hoksza; Jan Jelinek
Original format
PDF
Keywords
Neo4j; graph databases; protein-protein interactions