Journal: BMC Bioinformatics

Identifying gene and protein mentions in text using conditional random fields

Abstract

Background: We present a model for tagging gene and protein mentions in text using the probabilistic sequence tagging framework of conditional random fields (CRFs). Conditional random fields directly model the probability P(t | o) of a tag sequence given an observation sequence, and have previously been employed successfully for other tagging tasks. The mechanics of CRFs and their relationship to maximum entropy are discussed in detail.

Results: We employ a diverse feature set that combines standard orthographic features with expert features in the form of gene and biological term lexicons, achieving a precision of 86.4% and a recall of 78.7%. An analysis of the contribution of the various features of the model is provided.
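
To make the quantity P(t | o) concrete, the following is a minimal sketch of a linear-chain CRF scored over a short token sequence. It is not the authors' implementation: the BIO tag set, the feature names (`has_digit`, `init_cap`, and so on), and the hand-picked toy weights are all assumptions made for illustration, and a real model would learn its weights from annotated data and use lexicon features as well.

```python
import math

# Hypothetical BIO tag set for gene/protein mentions (an assumption, not from the paper).
TAGS = ["O", "B-GENE", "I-GENE"]
START = "<START>"

# Toy weights picked by hand for illustration; a trained CRF would estimate these.
WEIGHTS = {
    "has_digit|B-GENE": 2.0,
    "init_cap|B-GENE": 1.0,
    "trans:B-GENE->I-GENE": 0.5,
    "trans:O->I-GENE": -3.0,
    "word:the|O": 2.0,
    "word:gene|O": 1.0,
}


def active_features(prev_tag, tag, tokens, i):
    """Names of the binary features that fire at position i (illustrative set)."""
    token = tokens[i]
    feats = [f"trans:{prev_tag}->{tag}", f"word:{token.lower()}|{tag}"]
    if any(ch.isdigit() for ch in token):   # simple orthographic cue
        feats.append(f"has_digit|{tag}")
    if token[:1].isupper():                 # simple orthographic cue
        feats.append(f"init_cap|{tag}")
    return feats


def local_score(prev_tag, tag, tokens, i, weights):
    """Sum of the weights of the features active at position i."""
    return sum(weights.get(f, 0.0) for f in active_features(prev_tag, tag, tokens, i))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def sequence_score(tags, tokens, weights):
    """Unnormalized log-score of one complete tag sequence."""
    total, prev = 0.0, START
    for i, tag in enumerate(tags):
        total += local_score(prev, tag, tokens, i, weights)
        prev = tag
    return total


def log_partition(tokens, weights):
    """log Z(o) via the forward algorithm; assumes tokens is non-empty."""
    alpha = {t: local_score(START, t, tokens, 0, weights) for t in TAGS}
    for i in range(1, len(tokens)):
        alpha = {
            t: log_sum_exp([alpha[p] + local_score(p, t, tokens, i, weights) for p in TAGS])
            for t in TAGS
        }
    return log_sum_exp(list(alpha.values()))


def sequence_probability(tags, tokens, weights):
    """P(t | o) = exp(score(t, o)) / Z(o)."""
    return math.exp(sequence_score(tags, tokens, weights) - log_partition(tokens, weights))
```

Calling `sequence_probability(["O", "B-GENE", "O"], ["the", "BRCA1", "gene"], WEIGHTS)` returns the conditional probability of that tagging under the toy weights. Because the normalizer Z(o) sums over every possible tag sequence, the probabilities of all taggings of a given sentence add up to one, which is the sense in which a CRF models P(t | o) directly.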
