Database: The Journal of Biological Databases and Curation

Towards semi-automated curation: using text mining to recreate the HIV-1 human protein interaction database

Abstract

Manual curation has long been used to extract key information from the primary literature for input into biological databases. The human immunodeficiency virus type 1 (HIV-1)–human protein interaction database (HHPID), for example, contains 2589 manually extracted interactions, linked to 14 312 mentions in 3090 articles. Advances in text-mining (TM) techniques have made it possible to retrieve such data rapidly from large volumes of text with a high degree of accuracy. Here, we present a recreation of the HHPID using the current state of the art in TM. To retrieve interactions, we performed gene/protein named entity recognition (NER) and applied two molecular event extraction tools to all abstracts and titles cited in the HHPID. Our best NER precision, recall and F-score were 87.5%, 90.0% and 88.6%, respectively, while event extraction achieved 76.4%, 84.2% and 80.1%, respectively. We demonstrate that over 50% of the HHPID interactions can be recreated from abstracts and titles. Furthermore, from 49 available open-access full-text articles, we extracted a total of 237 unique HIV-1–human interactions, compared with 187 interactions recorded in the HHPID from the same articles. On average, we extracted 23 times more mentions of interactions and events from a full-text article than from an abstract and title, with a 6-fold increase in the number of unique interactions. We further demonstrate that interactions extracted more frequently by TM are more likely to be true positives. Overall, the results show that TM recovered a large proportion of interactions, many of which are present in the HHPID, making TM a useful assistant in the manual curation process. Finally, we also retrieved other types of interactions in the context of HIV-1 that are not currently present in the HHPID, thus expanding the scope of this data set. All data are available at http://gnode1.mib.man.ac.uk/HIV1-text-mining.
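The precision, recall and F-score figures above follow the usual definitions: precision P = TP / (number extracted), recall R = TP / (number in the gold standard), and the balanced F-score F = 2PR / (P + R). The sketch below is a minimal illustration under those assumptions, not the authors' evaluation code: the function names `f_score` and `evaluate` are hypothetical, interactions are assumed to be compared as (viral protein, human protein) pairs by exact match, and the toy pairs are placeholders rather than HHPID records. Applied to the reported event-extraction precision and recall, it reproduces the reported 80.1% F-score.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(predicted: set, gold: set) -> tuple:
    """Precision, recall and F-score of predicted items against a gold standard.

    Assumes items (e.g. interaction pairs) match only if they are identical.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)


# Reported event-extraction scores, in percent: P = 76.4, R = 84.2
print(round(f_score(76.4, 84.2), 1))  # 80.1

# Hypothetical toy (viral, human) pairs, placeholders only, not HHPID data
gold = {("viral_1", "human_A"), ("viral_1", "human_B"), ("viral_2", "human_C")}
predicted = {("viral_1", "human_A"), ("viral_2", "human_C"), ("viral_2", "human_D")}
p, r, f = evaluate(predicted, gold)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.667 0.667
```

Using the harmonic mean rather than the arithmetic mean means a tool cannot score well on F by maximizing only one of precision or recall.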