Journal: Bioinformatics
Learning string similarity measures for gene/protein name dictionary look-up using logistic regression

Abstract

Motivation: One of the bottlenecks of biomedical data integration is variation of terms. Exact string matching often fails to associate a name with its biological concept, i.e. its ID or accession number in the database, because of seemingly small differences between names. Soft string matching potentially enables us to find the relevant ID by considering the similarity between the names. However, the accuracy of soft matching depends heavily on the similarity measure employed.

Results: We used logistic regression for learning a string similarity measure from a dictionary. Experiments using several large-scale gene/protein name dictionaries showed that the logistic regression-based similarity measure outperforms existing similarity measures in dictionary look-up tasks.
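The idea in the abstract, learning a similarity measure from the dictionary itself, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dictionary, the concept IDs, the two pair features (character-bigram Jaccard overlap and equality after stripping punctuation and case), and the plain gradient-descent trainer are all assumptions chosen for illustration. Names that share an ID serve as positive training pairs and names with different IDs as negative pairs; the fitted model's probability is then used as the similarity score for look-up.

```python
import math
from itertools import combinations

# Hypothetical toy dictionary (concept ID -> known names); not data from the paper.
DICTIONARY = {
    "ID1": ["interleukin 2", "interleukin-2", "IL-2", "IL2"],
    "ID2": ["tumor protein p53", "tumour protein p53", "p53", "TP53"],
    "ID3": ["NF-kappa B", "NF kappaB", "NFkappaB"],
}

def bigrams(name):
    """Set of lowercase character bigrams of a name."""
    s = name.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def normalize(name):
    """Lowercase the name and drop everything except letters and digits."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def features(a, b):
    """Assumed feature vector for a name pair: [bias, bigram Jaccard, same-after-normalization]."""
    ga, gb = bigrams(a), bigrams(b)
    union = ga | gb
    jaccard = len(ga & gb) / len(union) if union else 0.0
    same = 1.0 if normalize(a) == normalize(b) else 0.0
    return [1.0, jaccard, same]

def training_pairs(dictionary):
    """Label every pair of names: 1.0 if they share a concept ID, else 0.0."""
    names = [(n, cid) for cid, ns in dictionary.items() for n in ns]
    return [(features(a, b), 1.0 if ca == cb else 0.0)
            for (a, ca), (b, cb) in combinations(names, 2)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(pairs, lr=1.0, epochs=3000):
    """Fit logistic regression weights by batch gradient descent on the log loss."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in pairs:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi
        w = [wi - lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w

WEIGHTS = train(training_pairs(DICTIONARY))

def similarity(a, b, w=WEIGHTS):
    """Learned similarity: model probability that two names denote the same concept."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(a, b))))

def look_up(query, dictionary=DICTIONARY):
    """Return the concept ID of the dictionary name most similar to the query."""
    scored = [(similarity(query, n), cid)
              for cid, ns in dictionary.items() for n in ns]
    return max(scored)[1]

print(look_up("IL 2"))
print(round(similarity("IL-2", "IL2"), 3), round(similarity("IL-2", "p53"), 3))
```

In this sketch the query "IL 2" does not appear verbatim in the dictionary, so exact matching would fail, while the learned score can still rank the spelling variants of the same concept highest. A realistic system would need far richer features and a much larger dictionary than this toy setup.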
