Journal: BMC Bioinformatics

A graph-search framework for associating gene identifiers with documents

Abstract

Background: One step in the model organism database curation process is to find, for each article, the identifier of every gene discussed in the article. We consider a relaxation of this problem suitable for semi-automated systems, in which each article is associated with a ranked list of possible gene identifiers, and experimentally compare methods for solving this geneId-ranking problem. In addition to baseline approaches based on combining named entity recognition (NER) systems with a "soft dictionary" of gene synonyms, we evaluate a graph-based method which combines the outputs of multiple NER systems, as well as other sources of information, and a learning method for reranking the output of the graph-based method.

Results: We show that NER systems with similar F-measure performance can have significantly different performance when used with a soft dictionary for geneId-ranking. The graph-based approach can outperform any of its component NER systems, even without learning, and learning can further improve the performance of the graph-based ranking approach.

Conclusion: The utility of an NER system for geneId-finding may not be accurately predicted by its entity-level F1 performance, the most common performance measure. GeneId-ranking systems are best implemented by combining several NER systems. With appropriate combination methods, usefully accurate geneId-ranking systems can be constructed based on easily-available resources, without resorting to problem-specific, engineered components.
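
To make the geneId-ranking setup concrete, the following is a minimal sketch of the baseline idea only: mentions extracted by several NER systems are matched against a "soft dictionary" of gene synonyms, and the match scores are summed per gene identifier to produce a ranked list. It is not the graph-based method or the reranker from the paper. The token-overlap (Jaccard) similarity, the threshold, the function names, and the toy gene identifiers are all assumptions made for illustration.

```python
from collections import defaultdict


def tokens(text):
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def soft_match(mention, synonym):
    """Jaccard overlap of token sets (an assumed stand-in for a soft dictionary score)."""
    a, b = tokens(mention), tokens(synonym)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_gene_ids(mentions_by_system, synonyms_by_gene, threshold=0.5):
    """Sum best-synonym match scores over all NER systems, then rank gene IDs."""
    scores = defaultdict(float)
    for mentions in mentions_by_system.values():
        for mention in mentions:
            for gene_id, synonyms in synonyms_by_gene.items():
                best = max((soft_match(mention, s) for s in synonyms), default=0.0)
                if best >= threshold:  # ignore weak, likely spurious matches
                    scores[gene_id] += best
    # Highest score first; ties broken by gene ID for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data: two made-up genes and two made-up NER outputs.
synonyms_by_gene = {
    "GENE:1": ["alpha kinase 1", "AK1"],
    "GENE:2": ["beta receptor", "BR"],
}
mentions_by_system = {
    "ner_a": ["alpha kinase", "beta receptor"],
    "ner_b": ["alpha kinase 1"],
}

ranking = rank_gene_ids(mentions_by_system, synonyms_by_gene)
for gene_id, score in ranking:
    print(gene_id, round(score, 3))
```

In this toy run, a gene supported by mentions from both hypothetical NER systems accumulates a higher score than one supported by a single system, which is the intuition behind combining several systems rather than relying on one.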
