Journal: BMC Bioinformatics

Detecting experimental techniques and selecting relevant documents for protein-protein interactions from biomedical literature


Abstract

Background

The selection of relevant articles for curation, and the linking of those articles to the experimental techniques that confirm their findings, became one of the primary subjects of the recent BioCreative III contest. The contest's Protein-Protein Interaction (PPI) task consisted of two sub-tasks: the Article Classification Task (ACT) and the Interaction Method Task (IMT). ACT aimed to automatically select relevant documents for PPI curation, whereas the goal of IMT was to recognise, in full-text articles, the experimental methods used to identify the interactions.

Results

We proposed and compared several classification-based methods for both tasks, employing rich contextual features as well as features extracted from external knowledge sources. For IMT, a new method that classifies pair-wise relations between every text phrase and every candidate interaction method obtained promising results, with an F1 score of 64.49% on the task's development dataset. We also explored ways to combine this new approach with more conventional multi-label document classification methods. For ACT, our classifiers exploited automatically detected named entities and other linguistic information. The evaluation results on the BioCreative III PPI test datasets showed that our systems were very competitive: one of our IMT methods yielded the best performance among all participants, as measured by F1 score, Matthews correlation coefficient and AUC iP/R; for ACT, our best classifier was ranked second as measured by AUC iP/R, and was also competitive according to other metrics.

Conclusions

Our novel approach, which converts the multi-class, multi-label classification problem into a binary classification problem, showed much promise in IMT. Nevertheless, on the test dataset the best performance was achieved by taking the union of the output of this method and that of a multi-class, multi-label document classifier, which indicates that the two types of systems complement each other in terms of recall.

For ACT, our system exploited a rich set of features and also obtained encouraging results. We examined the features with respect to their contributions to the classification results, and concluded that the contextual words surrounding named entities, as well as the MeSH headings associated with the documents, were among the main contributors to the performance.
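
The conversion described in the conclusions can be illustrated with a minimal sketch. It is not the authors' system: the candidate method names, the keyword-based `toy_pair_classifier`, and all function names below are hypothetical stand-ins chosen for illustration. The sketch only shows the shape of the idea: expand a document into (phrase, candidate method) pairs, classify each pair as a binary decision, collect the positive methods, and optionally take the union with the labels of a separate document-level classifier.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical candidate labels; a real system would use a curated method vocabulary.
CANDIDATE_METHODS: List[str] = ["two hybrid", "pull down", "coimmunoprecipitation"]


def to_binary_instances(phrases: List[str], methods: List[str]) -> List[Tuple[str, str]]:
    """Expand one document into (phrase, method) pairs, one binary instance each."""
    return [(phrase, method) for phrase in phrases for method in methods]


def toy_pair_classifier(phrase: str, method: str) -> bool:
    """Stand-in for a trained binary classifier (assumption, not the paper's model):
    positive if every token of the method name occurs in the phrase."""
    tokens = set(phrase.lower().split())
    return all(token in tokens for token in method.split())


def predict_pairwise(
    phrases: List[str],
    methods: List[str],
    classify: Callable[[str, str], bool],
) -> Set[str]:
    """A method is assigned to the document if any of its pairs is classified positive."""
    return {m for p, m in to_binary_instances(phrases, methods) if classify(p, m)}


def union_predictions(pairwise: Set[str], document_level: Set[str]) -> Set[str]:
    """Combine two systems by set union, which can only keep or raise recall."""
    return pairwise | document_level


def f1_score(predicted: Set[str], gold: Set[str]) -> float:
    """F1 over the label sets of a single document."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    phrases = [
        "we performed a yeast two hybrid screen",
        "proteins were detected by pull down assays",
    ]
    gold = {"two hybrid", "coimmunoprecipitation"}
    pairwise = predict_pairwise(phrases, CANDIDATE_METHODS, toy_pair_classifier)
    # Hypothetical output of a separate multi-label document classifier.
    document_level = {"coimmunoprecipitation"}
    combined = union_predictions(pairwise, document_level)
    print(sorted(pairwise), f1_score(pairwise, gold))
    print(sorted(combined), f1_score(combined, gold))
```

In this toy run the union recovers a gold label that the pair-wise step misses, which mirrors the recall complementarity reported above; the numbers themselves are artefacts of the made-up example and say nothing about the published results.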
