Conference: Evolutionary computation, machine learning and data mining in bioinformatics

An Automatic Identification and Resolution System for Protein-Related Abbreviations in Scientific Papers


Abstract

We propose a methodology to identify and resolve protein-related abbreviations found in the full texts of scientific papers, as part of a semi-automatic process implemented in our PRAISED framework. Biological acronyms are identified with a syntactical approach that exploits lexical clues and relies mostly on domain-independent metrics, which yields high recall and very low execution time. The subsequent resolution step uses both syntactical and semantic criteria to match each abbreviation with its potential explanation, which is searched for within a window of contiguous words whose size is proportional to the abbreviation's length. We have tested our system against the Medstract Gold Standard corpus and a relevant set of manually annotated PubMed papers, obtaining significant results and high performance levels, while keeping the system highly customizable, lightweight and scalable.
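
The two steps described in the abstract can be illustrated with a minimal sketch. This is not the PRAISED implementation: the parenthesis pattern used as a lexical clue, the window factor of 2, the subsequence-matching rule, and all function names are assumptions chosen for illustration, and the semantic criteria mentioned in the abstract are not modelled.

```python
import re

# Assumed lexical clue: a short token in parentheses that contains a capital letter.
ABBR_IN_PARENS = re.compile(r"\(([A-Za-z][A-Za-z0-9-]{1,9})\)")


def find_abbreviations(text):
    """Return (abbreviation, offset of its opening parenthesis) pairs."""
    return [
        (m.group(1), m.start())
        for m in ABBR_IN_PARENS.finditer(text)
        if any(c.isupper() for c in m.group(1))
    ]


def is_subsequence(letters, s):
    """True if the letters appear in s in order (not necessarily adjacent)."""
    it = iter(s)
    return all(c in it for c in letters)


def resolve(abbr, preceding_words, factor=2):
    """Look for an explanation among the last factor * len(abbr) words.

    The window size is proportional to the abbreviation's length; the
    factor of 2 is an arbitrary illustrative choice.
    """
    letters = [c.lower() for c in abbr if c.isalnum()]
    window = preceding_words[-factor * len(letters):]
    # Try the shortest candidate first, then grow it leftwards.
    for k in range(1, len(window) + 1):
        candidate = window[-k:]
        starts_right = candidate[0][0].lower() == letters[0]
        if starts_right and is_subsequence(letters, " ".join(candidate).lower()):
            return " ".join(candidate)
    return None  # no syntactic match found in the window


def resolve_all(text):
    """Map each detected abbreviation to its candidate explanation (or None)."""
    return {
        abbr: resolve(abbr, text[:pos].split())
        for abbr, pos in find_abbreviations(text)
    }


if __name__ == "__main__":
    sentence = "The mitogen-activated protein kinase (MAPK) pathway regulates growth."
    print(resolve_all(sentence))
```

Growing the candidate leftwards from the abbreviation keeps the search cost bounded by the window size, which is one plausible reason a purely syntactic pass can run quickly. A real system would add further filters, such as the semantic checks the abstract mentions.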
