Conference: International Conference on Bioinformatics and Biomedical Engineering

A Comprehensive Comparison of Two MEDLINE Annotators for Disease and Gene Linkage: Sometimes Less is More

Abstract

Text mining is popular in biomedical applications because it allows retrieving highly relevant information. For us, it is particularly practical for linking diseases to the genes involved in them. However, text mining involves multiple challenges, such as (1) recognizing named entities (e.g., diseases and genes) inside the text, (2) constructing specific vocabularies that efficiently represent the available text, and (3) applying the correct statistical criteria to link biomedical entities with each other. We have previously developed Beegle, a tool that prioritizes genes for any search query of interest. The method starts with a search phase, in which relevant genes are identified via the literature. Once known genes are identified, a second phase prioritizes novel candidate genes through a data fusion strategy. Many aspects of our method could potentially be improved. Here we evaluate two MEDLINE annotators that recognize biomedical entities inside a given abstract using different dictionaries and annotation strategies. We compare the contribution of each annotator in associating genes with diseases under different vocabulary settings. Somewhat surprisingly, with fewer recognized entities and a more compact vocabulary, we obtain better associations between genes and diseases. We also propose a novel but simple association criterion to link genes with diseases, which relies on recognizing only gene entities inside the biomedical text. These refinements significantly improve the performance of our method.