Database: The Journal of Biological Databases and Curation

Preliminary evaluation of the CellFinder literature curation pipeline for gene expression in kidney cells and anatomical parts

Abstract

Biomedical literature curation is the process of automatically and/or manually deriving knowledge from scientific publications and recording it in specialized databases for structured delivery to users. It is a slow, error-prone, complex and costly, yet highly important, task. Previous experience has shown that text mining can assist in many of its phases, especially in the triage of relevant documents and the extraction of named entities and biological events. Here, we present the curation pipeline of the CellFinder database, a repository of cell research that includes data derived from literature curation and microarrays to identify cell types, cell lines, organs and so forth, and especially patterns in gene expression. The curation pipeline is based on freely available tools in all text mining steps, as well as on manual validation of the extracted data. Preliminary results are presented for a data set of 2376 full texts, from which >4500 gene expression events in cells or anatomical parts have been extracted. Validation of half of these data yielded a precision of ∼50%, which indicates that we are on the right track with our pipeline for the proposed task. However, evaluation of the methods shows that there is still room for improvement in named-entity recognition and that a larger and more robust corpus is needed to achieve better performance in event extraction.

Database URL:
