Conference: International Conference on Mining Intelligence and Knowledge Exploration

Applying Latent Semantic Analysis to Optimize Second-order Co-occurrence Vectors for Semantic Relatedness Measurement



Abstract

Measures of semantic relatedness are widely used in intelligent tasks in NLP and bioinformatics. This paper attempts to improve the Second-order Co-occurrence Vector semantic relatedness measure so that it estimates the relatedness between two given concepts more effectively. Typically, this measure first constructs concept definitions (glosses) from a thesaurus and then takes the cosine of the angle between the concepts' gloss vectors as their degree of relatedness. However, the computed gloss vectors are noisy and rather large, which hinders the expected performance of the measure. By employing latent semantic analysis (LSA), we eliminate some insignificant features to generate more economical gloss vectors. Applying both approaches to the biomedical domain, using MEDLINE as the corpus, UMLS as the thesaurus, and a reference standard of biomedical concept pairs manually rated for relatedness, we show that the LSA implementation has a positive impact on both performance and efficiency.
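
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary, co-occurrence counts, glosses, and the helper names `gloss_vector`, `cosine`, and `lsa_projection` are all hypothetical, and applying a truncated SVD to the word co-occurrence matrix is only one plausible reading of how LSA is used to shrink the gloss vectors.

```python
import numpy as np

# Toy first-order co-occurrence matrix (hypothetical counts, not MEDLINE data).
vocab = {"heart": 0, "cardiac": 1, "muscle": 2, "lung": 3, "breath": 4}
cooc = np.array([
    [0, 8, 5, 1, 0],
    [8, 0, 4, 0, 1],
    [5, 4, 0, 1, 0],
    [1, 0, 1, 0, 7],
    [0, 1, 0, 7, 0],
], dtype=float)

def gloss_vector(gloss_words, cooc, vocab):
    """Second-order vector: sum of the co-occurrence rows of the gloss words."""
    rows = [cooc[vocab[w]] for w in gloss_words if w in vocab]
    return np.sum(rows, axis=0) if rows else np.zeros(cooc.shape[1])

def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is zero)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def lsa_projection(cooc, k):
    """Projection onto the top-k right singular vectors (assumed LSA step)."""
    _, _, vt = np.linalg.svd(cooc, full_matrices=False)
    return vt[:k].T  # shape: (n_features, k)

# Hypothetical glosses for two concepts (not real UMLS definitions).
gloss_a = ["heart", "muscle"]
gloss_b = ["cardiac", "muscle", "heart"]

full_a = gloss_vector(gloss_a, cooc, vocab)
full_b = gloss_vector(gloss_b, cooc, vocab)
rel_full = cosine(full_a, full_b)

# Reduce the gloss vectors to k latent dimensions before comparing them.
k = 2
proj = lsa_projection(cooc, k)
reduced_a = full_a @ proj
reduced_b = full_b @ proj
rel_reduced = cosine(reduced_a, reduced_b)

print(f"relatedness (full vectors):    {rel_full:.3f}")
print(f"relatedness (k={k} LSA vectors): {rel_reduced:.3f}")
```

In this sketch the reduced vectors have `k` components instead of one per vocabulary word, which is the kind of size reduction the abstract refers to. Whether accuracy improves depends on the corpus and on the choice of `k`, which the abstract does not specify.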
