Conference: Workshop on Biomedical Natural Language Processing

Automated Disease Normalization with Low Rank Approximations

Abstract

While machine learning methods for named entity recognition (mention-level detection) have become common, machine learning methods have rarely been applied to normalization (concept-level identification). Recent research introduced a machine learning method for normalization based on pairwise learning to rank. This method, DNorm, uses a linear model to score the similarity between mentions and concept names, and has several desirable properties, including learning term variation directly from training data. In this manuscript we employ a dimensionality reduction technique based on low-rank matrix approximation, similar to latent semantic indexing. We compare the performance of the low rank method to previous work, using disease name normalization in the NCBI Disease Corpus as the test case, and demonstrate increased performance as the matrix rank increases. We further demonstrate a significant reduction in the number of parameters to be learned and discuss the implications of this result in the context of algorithm scalability.
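The parameter reduction described in the abstract can be illustrated with a minimal sketch. It assumes that the linear model scores a mention vector `m` against a concept-name vector `n` with a bilinear form `m^T W n`, and that the low-rank variant factors `W` as `U^T V`; the abstract does not spell out either form, so both are assumptions made here for illustration. All function names, the vector dimension `d`, the rank `r`, and the hinge-style pairwise loss are likewise hypothetical and not taken from the paper.

```python
import numpy as np

def full_rank_score(W, m, n):
    """Bilinear similarity m^T W n with a dense d x d weight matrix (assumed form)."""
    return float(m @ W @ n)

def low_rank_score(U, V, m, n):
    """Same score with W factored as U^T V, where U and V are r x d (assumed form)."""
    # Project both vectors into the r-dimensional space, then take a dot product.
    return float((U @ m) @ (V @ n))

def pairwise_hinge_loss(score_pos, score_neg, margin=1.0):
    """Generic pairwise ranking loss: the correct name should outscore a wrong one by `margin`."""
    return max(0.0, margin - score_pos + score_neg)

def parameter_counts(d, r):
    """Number of learned weights: dense matrix versus the two low-rank factors."""
    return d * d, 2 * r * d

# Illustrative sizes only; a real vocabulary would be far larger.
d, r = 200, 10
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(r, d))
V = rng.normal(scale=0.1, size=(r, d))
m = rng.random(d)  # stand-in for a mention vector
n = rng.random(d)  # stand-in for a concept-name vector

# The factored score equals the dense score computed from W = U^T V.
W = U.T @ V
assert np.isclose(full_rank_score(W, m, n), low_rank_score(U, V, m, n))

full, low = parameter_counts(d, r)
print(full, low)  # 40000 4000
```

Under these assumptions the dense model learns `d * d` weights while the factored model learns `2 * r * d`, so the saving grows with the vocabulary size whenever `r` is much smaller than `d`.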
