International Conference on Computer Technology and Development
TMAC: An automated text mining tool for construction of an annotated corpus to support protein-protein interaction information extraction

Abstract

Extracting protein-protein interactions (PPIs) from the biomedical literature is an important task in protein science. Annotated corpora are essential for developing and evaluating PPI extraction systems. A text mining tool that can annotate any corpus with protein names and interaction events is therefore valuable for identifying interactions among proteins. In this paper we present TMAC, a Java package that tags protein names and interaction events in biomedical literature using a combination of carefully designed rules and a dictionary of protein names. TMAC normalizes the protein mentions and interaction events it finds by linking them to the appropriate database references. TMAC consists of two modules. The first is a named entity identification and normalization module. The second is an interaction event tagger that identifies words signaling that an interaction has occurred. For protein identification, TMAC achieved an average precision of 85.2% and an average recall of 76.7%. For protein-protein interaction event identification, it achieved an average precision of 88.2% and an average recall of 71.8%. TMAC is flexible: it can be used as a standalone application or incorporated into the workflow of a more general text mining system.
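
The two-part approach described in the abstract (dictionary-plus-rule protein tagging with normalization to a database reference, plus tagging of interaction trigger words) can be illustrated with a minimal Python sketch. This is not TMAC's implementation: TMAC is a Java package whose actual rules and dictionary are not given here. The dictionary entries, the placeholder database IDs (`DB:0001` and so on), the name-shape rule, and the trigger stems are all hypothetical. The `precision_recall` helper shows how span-level precision and recall figures such as those reported above are computed.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical toy dictionary: lower-cased protein name -> database reference.
# The "DB:..." identifiers are placeholders, not real accessions.
PROTEIN_DICT = {
    "p53": "DB:0001",
    "mdm2": "DB:0002",
    "brca1": "DB:0003",
}

# Hypothetical shape rule for names missing from the dictionary,
# e.g. "RAD51": two or more capitals, then digits, optional trailing letter.
NAME_RULE = re.compile(r"[A-Z]{2,}[0-9]+[A-Za-z]?")

# Hypothetical trigger stems that signal an interaction event.
TRIGGER_RULE = re.compile(r"(interact|bind|phosphorylat|associat)\w*", re.IGNORECASE)

# Simple tokenizer: runs of letters, digits, and hyphens.
TOKEN = re.compile(r"[A-Za-z0-9-]+")


@dataclass
class Mention:
    text: str
    start: int
    end: int
    label: str             # "PROTEIN" or "TRIGGER"
    db_ref: Optional[str]  # normalized database reference, if known


def tag_sentence(sentence: str) -> List[Mention]:
    """Tag protein names (dictionary first, then the shape rule) and
    interaction trigger words in a single pass over the tokens."""
    mentions = []
    for m in TOKEN.finditer(sentence):
        word = m.group()
        if word.lower() in PROTEIN_DICT:
            # Dictionary hit: tag it and normalize to its database reference.
            mentions.append(Mention(word, m.start(), m.end(), "PROTEIN",
                                    PROTEIN_DICT[word.lower()]))
        elif NAME_RULE.fullmatch(word):
            # Rule hit: tagged as a protein, but no reference to normalize to.
            mentions.append(Mention(word, m.start(), m.end(), "PROTEIN", None))
        elif TRIGGER_RULE.fullmatch(word):
            mentions.append(Mention(word, m.start(), m.end(), "TRIGGER", None))
    return mentions


def precision_recall(predicted: Set[Tuple[int, int]],
                     gold: Set[Tuple[int, int]]) -> Tuple[float, float]:
    """Span-level precision and recall against a gold annotation."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    for mention in tag_sentence("p53 binds MDM2 and RAD51 interacts with BRCA1."):
        print(mention.label, mention.text, mention.db_ref)
```

Because the dictionary is checked before the shape rule, any name found in the dictionary is always normalized, while rule-only hits such as "RAD51" are tagged without a reference. A practical tagger would also need to handle multi-word names and synonyms, which this token-level sketch ignores.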