Published in: Journal of biomedical informatics.

Automatically identifying gene/protein terms in MEDLINE abstracts.

Abstract

MOTIVATION: Natural language processing (NLP) techniques are used to extract information automatically from computer-readable literature. In biology, identifying the terms that correspond to biological substances (e.g., genes and proteins) is a necessary step that precedes the application of other NLP systems that extract biological information (e.g., protein-protein interactions, gene regulation events, and biochemical pathways). We have developed GPmarkup (for "gene/protein-full name mark up"), a software system that automatically identifies gene/protein terms (i.e., symbols or full names) in MEDLINE abstracts. As part of the markup process, we also automatically generated a knowledge source of paired gene/protein symbols and full names (e.g., LARD for lymphocyte associated receptor of death) from MEDLINE. We found that many of the pairs in our knowledge source do not appear in the current GenBank database. Therefore, our methods may also be used for automatic lexicon generation.

RESULTS: GPmarkup has 73% recall and 93% precision in identifying and marking up gene/protein terms in MEDLINE abstracts.

AVAILABILITY: A random sample of gene/protein symbols and full names and a sample set of marked-up abstracts can be viewed at http://www.cpmc.columbia.edu/homepages/yuh9001/GPmarkup/.

CONTACT: hy52@columbia.edu. Voice: 212-939-7028; fax: 212-666-0140.
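The abstract does not say how GPmarkup pairs symbols with full names, so the following is only an illustrative sketch of one simple heuristic, not the authors' method. It assumes the full name appears immediately before a parenthesized symbol and that the initials of its content words spell the symbol, as in the LARD example above. The function name `find_pairs`, the stopword list, and the regular expression are all hypothetical choices made for this illustration.

```python
import re

# Hypothetical stopword list: function words assumed to contribute no letter to a symbol.
STOPWORDS = {"of", "the", "and", "for", "in", "to", "a", "an"}

# Matches a parenthesized candidate symbol: 2-10 uppercase letters or digits,
# starting with a letter, e.g. "(LARD)".
PAREN_SYMBOL = re.compile(r"\(([A-Z][A-Z0-9]{1,9})\)")


def find_pairs(text):
    """Return (symbol, full_name) pairs where the initials of the content
    words preceding "(SYMBOL)" spell the symbol exactly."""
    pairs = []
    for match in PAREN_SYMBOL.finditer(text):
        symbol = match.group(1)
        words = re.findall(r"[A-Za-z0-9]+", text[:match.start()])
        name_words = []
        initials = ""
        # Walk backwards from the parenthesis, collecting words until the
        # content-word initials are as long as the symbol.
        for word in reversed(words):
            name_words.insert(0, word)
            if word.lower() not in STOPWORDS:
                initials = word[0].upper() + initials
            if len(initials) == len(symbol):
                break
        if initials == symbol:
            pairs.append((symbol, " ".join(name_words)))
    return pairs


sentence = "Cloning of lymphocyte associated receptor of death (LARD) was reported."
print(find_pairs(sentence))
```

A heuristic this strict misses symbols that are not plain acronyms of their full names, so it should be read only as an illustration of what a symbol/full-name pair is.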