GIDL: a rule based expert system for GenBank Intelligent Data Loading into the Molecular Biodiversity database

Paolo Pannarale; Domenico Catalano; Giorgio De Caro; Giorgio Grillo; Pietro Leo; Graziano Pappadà; Francesco Rubino; Gaetano Scioscia; Flavio Licciulli

Abstract

Background

In the biodiversity research community there is a growing need to build a bridge between molecular and traditional biodiversity studies. We believe that information technology can play a pre-eminent role in integrating the information generated by these studies with the large amount of molecular data available in public bioinformatics databases. This work primarily aims to build a bioinformatics infrastructure for integrating public and private biodiversity data through the development of GIDL, an Intelligent Data Loader coupled with the Molecular Biodiversity Database. The system presented here organizes the sequence and annotation data contained in the GenBank primary database according to an ontology and stores them locally.

Methods

The GIDL architecture consists of a relational database and intelligent data loader software. The relational database schema is designed to manage biodiversity information (the Molecular Biodiversity Database) and is organized into four areas: MolecularData, Experiment, Collection and Taxonomy. The MolecularData area is inspired by the Chado relational schema, an established standard among Generic Model Organism Databases. The distinctive feature of Chado, and also its strength, is the adoption of an ontological schema that makes use of the Sequence Ontology.

The Intelligent Data Loader (IDL) component of GIDL is Extract, Transform and Load software that parses data, discovers hidden information in GenBank entries and populates the Molecular Biodiversity Database. The IDL is composed of three main modules: the Parser, which parses GenBank flat files; the Reasoner, which automatically builds CLIPS facts mapping the biological knowledge expressed by the Sequence Ontology; and the DBFiller, which translates the CLIPS facts into ordered SQL statements used to populate the database.

In GIDL, Semantic Web technologies have been adopted because of their advantages in data representation, integration and processing.

Results and conclusions

Entries from the Virus (814,122), Plant (1,365,360) and Invertebrate (959,065) divisions of GenBank release 180 have been loaded into the Molecular Biodiversity Database by GIDL. By combining the Sequence Ontology and the Chado schema, our system offers more expressive queries than the most commonly used sequence retrieval systems, such as Entrez or SRS.
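
The Parser, Reasoner and DBFiller stages described in the Methods can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from GIDL itself: the sample record is invented, the two tables are a simplified stand-in rather than the Chado or Molecular Biodiversity Database schema, and a plain Python function replaces the CLIPS rule engine. The single inference rule (a CDS lying inside a gene with the same /gene qualifier is part_of that gene) is likewise hypothetical.

```python
import re
import sqlite3

# Invented, heavily simplified GenBank-style record (not a real entry).
RECORD = """\
LOCUS       DEMO0001     120 bp    DNA     linear   PLN
FEATURES             Location/Qualifiers
     gene            1..90
                     /gene="demoA"
     CDS             10..80
                     /gene="demoA"
//
"""

FEATURE_LINE = re.compile(r"^ {5}(\S+)\s+(\d+)\.\.(\d+)\s*$")
QUALIFIER_LINE = re.compile(r'^\s+/(\w+)="([^"]*)"\s*$')

# Hypothetical stand-in schema; the real database is far richer.
SCHEMA = """
CREATE TABLE feature (
    id TEXT PRIMARY KEY, so_term TEXT, start_pos INTEGER, end_pos INTEGER);
CREATE TABLE feature_relationship (
    subject_id TEXT REFERENCES feature(id),
    object_id TEXT REFERENCES feature(id),
    type TEXT);
"""


def parse(text):
    """Parser stage: return (locus name, list of feature dicts)."""
    locus, features = None, []
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            locus = line.split()[1]
            continue
        feature = FEATURE_LINE.match(line)
        if feature:
            features.append({"key": feature.group(1),
                             "start": int(feature.group(2)),
                             "end": int(feature.group(3)),
                             "qualifiers": {}})
            continue
        qualifier = QUALIFIER_LINE.match(line)
        if qualifier and features:
            features[-1]["qualifiers"][qualifier.group(1)] = qualifier.group(2)
    return locus, features


def reason(locus, features):
    """Reasoner stand-in: turn features into facts and infer part_of links."""
    facts = []
    for i, f in enumerate(features):
        # The feature key is used directly as the ontology term name here.
        facts.append(("feature", f"{locus}:{i}", f["key"], f["start"], f["end"]))
    for i, child in enumerate(features):
        if child["key"] != "CDS":
            continue
        for j, parent in enumerate(features):
            same_gene = (parent["qualifiers"].get("gene")
                         == child["qualifiers"].get("gene"))
            contained = (parent["start"] <= child["start"]
                         and child["end"] <= parent["end"])
            if parent["key"] == "gene" and same_gene and contained:
                facts.append(("part_of", f"{locus}:{i}", f"{locus}:{j}"))
    return facts


def to_sql(facts):
    """DBFiller stand-in: order statements so referenced rows come first."""
    order = {"feature": 0, "part_of": 1}
    statements = []
    for fact in sorted(facts, key=lambda f: order[f[0]]):
        if fact[0] == "feature":
            statements.append(
                ("INSERT INTO feature VALUES (?, ?, ?, ?)", fact[1:]))
        else:
            statements.append(
                ("INSERT INTO feature_relationship VALUES (?, ?, ?)",
                 (fact[1], fact[2], "part_of")))
    return statements


def load(text, conn):
    """Run the three stages and populate the database."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    locus, features = parse(text)
    for sql, params in to_sql(reason(locus, features)):
        conn.execute(sql, params)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(RECORD, conn)
    # Ontology-style query: which CDS features are part of which gene?
    rows = conn.execute(
        "SELECT c.id, p.id FROM feature_relationship r "
        "JOIN feature c ON c.id = r.subject_id "
        "JOIN feature p ON p.id = r.object_id "
        "WHERE r.type = 'part_of' AND c.so_term = 'CDS' AND p.so_term = 'gene'"
    ).fetchall()
    print(rows)
```

Storing the inferred relationship as a row of its own is what lets the final query ask about typed links between features, which a flat keyword search over the original record cannot express.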

Bibliographic details

Source: BMC Bioinformatics, 2012, Issue 4
Authors: Paolo Pannarale; Domenico Catalano; Giorgio De Caro; Giorgio Grillo; Pietro Leo; Graziano Pappadà; Francesco Rubino; Gaetano Scioscia; Flavio Licciulli
Original format: PDF
Chinese Library Classification: Biological sciences

