Journal: BMC Bioinformatics

LAITOR4HPC: A text mining pipeline based on HPC for building interaction networks


Abstract

The number of published full-text articles has increased dramatically. Text mining tools are an essential approach to building biological networks, updating databases and providing annotation for new pathways. PESCADOR is an online web server based on the LAITOR and NLProt text mining tools, which retrieves protein-protein co-occurrences in a tabular format together with a network schema. Here we present an HPC-oriented version of PESCADOR's native text mining tool, renamed LAITOR4HPC, which aims to process an unlimited number of abstracts in a short time in order to enrich available networks, build new ones and possibly highlight whether fields of research have been exhaustively studied.

By taking advantage of parallel computing on HPC infrastructure, the full collection of MEDLINE abstracts available until June 2017 was analyzed in 6 days, compared with an estimated 2 years for the original online implementation to process the same data. Additionally, three case studies were presented to illustrate possible uses of LAITOR4HPC. The first case study targeted soybean and was used to retrieve an overview of published co-occurrences in a single organism, retrieving 15,788 proteins in 7,894 co-occurrences. In the second case study, a target gene family was searched across many organisms by analyzing 15 species under biotic stress. Most co-occurrences involved Arabidopsis thaliana and Zea mays. The third case study concerned the construction and enrichment of an available pathway. With A. thaliana chosen for further analysis, the defensin pathway was enriched with additional signaling and regulation molecules, showing how they respond to each other in the modulation of this complex plant defense response.

LAITOR4HPC can be used for efficient, text-mining-based construction of biological networks derived from big data sources such as MEDLINE abstracts. Time consumption and data input limitations will depend on the resources available at the HPC facility. LAITOR4HPC offers enough flexibility for different approaches and data volumes, whether targeted to an organism, a subject, or a specific pathway. Additionally, it can deliver comprehensive results in which interactions are classified into four types according to their reliability.
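
The general idea of counting co-occurrences over chunks of abstracts and merging the partial counts can be sketched as follows. This is only an illustrative sketch, not LAITOR4HPC's actual code: the function names, the tiny protein dictionary (`ProtA` to `ProtD`), the naive sentence splitter and the invented toy sentences are all assumptions, and a thread pool stands in for the job distribution that an HPC scheduler would handle.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
import re

# Hypothetical dictionary of protein names (placeholder identifiers).
# The real pipeline relies on its own name tagging, not on this set.
PROTEIN_NAMES = {"ProtA", "ProtB", "ProtC", "ProtD"}


def sentence_pairs(abstract, names=PROTEIN_NAMES):
    """Count unordered protein pairs that share a sentence in one abstract."""
    pairs = Counter()
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", abstract):
        tokens = {t.strip(".,;:()") for t in sentence.split()}
        found = sorted(tokens & names)
        pairs.update(combinations(found, 2))
    return pairs


def count_chunk(chunk):
    """Map step: sum the pair counts over one chunk of abstracts."""
    total = Counter()
    for abstract in chunk:
        total.update(sentence_pairs(abstract))
    return total


def cooccurrences(abstracts, workers=4, chunk_size=2):
    """Split abstracts into chunks, count each chunk, merge the results."""
    chunks = [abstracts[i:i + chunk_size]
              for i in range(0, len(abstracts), chunk_size)]
    merged = Counter()
    # Threads are a stand-in here; an HPC run would spread chunks over nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_chunk, chunks):
            merged.update(partial)
    return merged


# Invented toy sentences, not data or findings from the paper.
ABSTRACTS = [
    "ProtA activates ProtB. ProtC represses ProtD.",
    "ProtD and ProtA act together on ProtB.",
    "ProtC was detected in roots.",
]
RESULT = cooccurrences(ABSTRACTS)
```

Because each chunk is counted independently and the merge is a simple sum, the work scales with the number of workers available, which is consistent with the abstract's remark that run time depends on the resources of the HPC facility.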
