Journal: Bioinformatics

A dictionary to identify small molecules and drugs in free text

Abstract

Motivation: The scientific community has spent considerable effort on correctly identifying gene and protein names in text, but much less on correctly identifying chemical names. Dictionary-based term identification can recognize the diverse ways chemical information is represented in the literature and map the chemicals to their database identifiers.

Results: We developed a dictionary for identifying small molecules and drugs in text, combining information from UMLS, MeSH, ChEBI, DrugBank, KEGG, HMDB and ChemIDplus. We applied rule-based term filtering, a manual check of highly frequent terms, and disambiguation rules. We tested the combined dictionary and the dictionaries derived from the individual resources on an annotated corpus and conclude the following: (i) each of the processing steps increases precision with only a minor loss of recall; (ii) the overall performance of the combined dictionary is acceptable (precision 0.67, recall 0.40; recall 0.80 for trivial names); (iii) the combined dictionary performed better than the dictionary in the chemical recognizer OSCAR3; (iv) a dictionary based on ChemIDplus alone performs comparably to the combined dictionary.
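The abstract does not spell out the matching procedure itself. As a rough illustration of dictionary-based term identification, the minimal Python sketch below builds a synonym index that maps each term to its concept identifiers, applies a simple filtering rule and a stop list, and then tags text by greedy longest match. Everything here is a hypothetical stand-in, not the authors' implementation: the toy dictionary, the placeholder identifiers (`CHEM:0001` and so on, which are not real database accessions), the length/numeric filter rule, the stop list, and the function names `keep_term`, `build_index` and `tag`.

```python
import re

# Hypothetical toy dictionary: concept identifier -> set of synonyms.
# Identifiers are placeholders, NOT real UMLS/ChEBI/DrugBank accessions.
CONCEPTS = {
    "CHEM:0001": {"acetylsalicylic acid", "aspirin", "ASA"},
    "CHEM:0002": {"paracetamol", "acetaminophen"},
    "CHEM:0003": {"water", "H2O"},
    "CHEM:0004": {"lead"},
}

# Hypothetical stop list standing in for a manual check of highly frequent
# terms: "lead" is far more often an English word than the metal.
STOPLIST = {"lead"}


def keep_term(term):
    """Hypothetical filter rule: drop very short or purely numeric terms."""
    return len(term) >= 3 and not term.isdigit()


def build_index(concepts, stoplist):
    """Map each lowercased synonym, as a tuple of words, to its concept IDs."""
    index = {}
    for cid, synonyms in concepts.items():
        for syn in synonyms:
            norm = syn.lower()
            if keep_term(norm) and norm not in stoplist:
                index.setdefault(tuple(norm.split()), set()).add(cid)
    return index


def tag(text, index):
    """Greedy longest-match tagging; returns (matched text, sorted IDs) pairs."""
    tokens = re.findall(r"\w+", text)
    lowered = [t.lower() for t in tokens]
    max_len = max(len(key) for key in index)
    hits = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + n])
            if key in index:
                hits.append((" ".join(tokens[i:i + n]), sorted(index[key])))
                i += n
                break
        else:
            i += 1  # no dictionary term starts at this token
    return hits


index = build_index(CONCEPTS, STOPLIST)
print(tag("Patients received aspirin or acetylsalicylic acid with water; "
          "lead time was short.", index))
# [('aspirin', ['CHEM:0001']), ('acetylsalicylic acid', ['CHEM:0001']),
#  ('water', ['CHEM:0003'])]
```

Both "aspirin" and "acetylsalicylic acid" resolve to the same identifier, which is the point of mapping synonyms to database IDs, while the stop-listed "lead" is not tagged. Longest match is preferred so that a multi-word name is not split into shorter, possibly wrong, hits.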