TMAC: An automated text mining tool for construction of an annotated corpus to support protein-protein interaction information extraction

Abstract

Extracting protein-protein interactions (PPIs) from the biomedical literature is an important topic in protein science. Annotated corpora are essential to the development and evaluation of PPI extraction systems, so a text mining tool that can annotate any corpus with protein names and interaction events is needed to identify interactions among proteins. In this paper we present a Java package called the TMAC system. TMAC tags protein names and interaction events in the biomedical literature using a combination of carefully designed rules and a dictionary of protein names. It normalizes the protein mentions and interaction events it finds by supplying the appropriate database reference. TMAC is divided into two modules: a named entity identification and normalization module, and an interaction event tagger that identifies the words indicating that an interaction occurs. TMAC achieved an average of 85.2% precision and 76.7% recall for protein identification, and an average of 88.2% precision and 71.8% recall for protein-protein interaction event identification. TMAC is a flexible system: it can be used as a standalone application or incorporated into the workflow of a more general text mining system.
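
The two-module design described in the abstract can be illustrated with a minimal Java sketch. This is not TMAC's actual code: the class name `TmacSketch`, the two-entry dictionary, the placeholder database references, and the list of trigger words are all assumptions made for illustration. It only shows the general idea of combining a name dictionary (with normalization to a database reference) with a simple rule for interaction words.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TmacSketch {

    // Hypothetical dictionary: protein surface form -> placeholder database reference.
    // The identifiers are made up; a real system would load them from a protein database.
    static final Map<String, String> PROTEIN_DICT = new LinkedHashMap<>();
    static {
        PROTEIN_DICT.put("p53", "DB:EXAMPLE-0001");
        PROTEIN_DICT.put("MDM2", "DB:EXAMPLE-0002");
    }

    // Hypothetical rule: a small set of verbs treated as interaction trigger words.
    static final Pattern INTERACTION = Pattern.compile(
            "\\b(binds?|interacts?|phosphorylates?|inhibits?)\\b",
            Pattern.CASE_INSENSITIVE);

    // Module 1 (sketch): find dictionary names as whole words and attach
    // the database reference, formatted as "name@offset->reference".
    static List<String> tagProteins(String sentence) {
        List<String> tags = new ArrayList<>();
        for (Map.Entry<String, String> entry : PROTEIN_DICT.entrySet()) {
            Pattern name = Pattern.compile("\\b" + Pattern.quote(entry.getKey()) + "\\b");
            Matcher m = name.matcher(sentence);
            while (m.find()) {
                tags.add(m.group() + "@" + m.start() + "->" + entry.getValue());
            }
        }
        return tags;
    }

    // Module 2 (sketch): tag interaction trigger words as "word@offset".
    static List<String> tagInteractionWords(String sentence) {
        List<String> tags = new ArrayList<>();
        Matcher m = INTERACTION.matcher(sentence);
        while (m.find()) {
            tags.add(m.group() + "@" + m.start());
        }
        return tags;
    }

    public static void main(String[] args) {
        String sentence = "p53 binds MDM2 in vitro.";
        System.out.println(tagProteins(sentence));
        System.out.println(tagInteractionWords(sentence));
    }
}
```

Run on the sample sentence, the sketch reports two protein mentions with their placeholder references and one trigger word (`binds`). Keeping the two taggers as separate methods mirrors the modular split the abstract describes, so either step could be called on its own or from a larger pipeline.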

Bibliographic details

Source
International Conference on Computer Technology and Development | 2010 | 5 pages
Authors
Rania Ahmed Abdel Azzem; Seoud Abul
Original format: PDF
Chinese Library Classification: TP3-53
Keywords
named entity recognition; protein normalization; text-mining

Similar literature

1. GNI Corpus Version 1.0: Annotated Full-Text Corpus of Genomics & Informatics to Support Biomedical Information Extraction [J]. So-Yeon Oh, Ji-Hyeon Kim, Seo-Jin Kim. Genomics & Informatics. 2018, No. 3.
2. PPLook: an automated data mining tool for protein-protein interaction [J]. Shao-Wu Zhang, Yao-Jun Li, Li Xia. BMC Bioinformatics. 2010, No. 1.
3. Text mining and visualisation of Protein-Protein Interactions [J]. Tsai FS. International Journal of Computational Biology and Drug Design. 2011, No. 3.
4. TMAC: An automated text mining tool for construction of an annotated corpus to support protein-protein interaction information extraction [C]. Rania Ahmed Abdel Azzem, Seoud Abul. 2nd International Conference on Computer Technology and Development. 2010.
5. Annotating a corpus of biomedical research texts: Two models of rhetorical analysis [D]. White, Barbara Ellen. 2010.
6. GNI Corpus Version 1.0: Annotated Full-Text Corpus of Genomics & Informatics to Support Biomedical Information Extraction [O]. So-Yeon Oh, Ji-Hyeon Kim, Seo-Jin Kim. 2018.
7. Text-Mining Protein-Protein Interaction Corpus Using Concept Clustering to Identify Intermittency [O]. Leif E. Peterson, Matthew A. Coleman. 2009.
