Conference: International Conference Mining Intelligence and Knowledge Exploration

Applying Latent Semantic Analysis to Optimize Second-order Co-occurrence Vectors for Semantic Relatedness Measurement

Abstract

Measures of semantic relatedness are widely used in intelligent tasks in NLP and bioinformatics. This paper attempts to improve the Second-order Co-occurrence Vector semantic relatedness measure so that it estimates the relatedness between two given concepts more effectively. Typically, this measure constructs concept definitions (glosses) from a thesaurus and then takes the cosine of the angle between the concepts' gloss vectors as their degree of relatedness. However, the computed gloss vectors are noisy and rather large, which hinders the expected performance of the measure. By employing latent semantic analysis (LSA), we eliminate insignificant features to generate more compact gloss vectors. Applying both approaches to the biomedical domain, using MEDLINE as the corpus, UMLS as the thesaurus, and a reference standard of biomedical concept pairs manually rated for relatedness, we show that the LSA implementation has a positive impact on both performance and efficiency.
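
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the vocabulary, co-occurrence counts, glosses, the choice of k, and the helper names (`gloss_vector`, `cosine`, `lsa_reduce`) are all made up for illustration, and it assumes that LSA is applied as a truncated SVD of the first-order co-occurrence matrix, which the abstract does not specify.

```python
import numpy as np

# Toy vocabulary and first-order co-occurrence counts (invented numbers,
# standing in for counts that would be collected from a corpus).
vocab = ["heart", "cardiac", "attack", "kidney", "renal", "failure"]
cooc = np.array([
    [0, 8, 5, 1, 0, 2],
    [8, 0, 4, 0, 1, 2],
    [5, 4, 0, 0, 0, 1],
    [1, 0, 0, 0, 9, 4],
    [0, 1, 0, 9, 0, 5],
    [2, 2, 1, 4, 5, 0],
], dtype=float)
index = {word: i for i, word in enumerate(vocab)}


def gloss_vector(gloss_words, word_vectors):
    """Second-order vector: sum of the vectors of the words in a gloss."""
    rows = [word_vectors[index[w]] for w in gloss_words if w in index]
    return np.sum(rows, axis=0)


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def lsa_reduce(matrix, k):
    """Keep the k strongest latent dimensions via truncated SVD."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


# Hypothetical glosses for three concepts (illustrative only).
glosses = {
    "concept_a": ["heart", "attack"],
    "concept_b": ["cardiac", "attack"],
    "concept_c": ["kidney", "failure"],
}

# Relatedness with the full co-occurrence vectors.
full = {c: gloss_vector(g, cooc) for c, g in glosses.items()}
full_ab = cosine(full["concept_a"], full["concept_b"])
full_ac = cosine(full["concept_a"], full["concept_c"])

# Relatedness after reducing word vectors to k latent dimensions.
reduced_words = lsa_reduce(cooc, k=2)
reduced = {c: gloss_vector(g, reduced_words) for c, g in glosses.items()}
reduced_ab = cosine(reduced["concept_a"], reduced["concept_b"])
reduced_ac = cosine(reduced["concept_a"], reduced["concept_c"])

print("full space:    a~b", full_ab, " a~c", full_ac)
print("reduced space: a~b", reduced_ab, " a~c", reduced_ac)
```

In this toy setting each gloss vector shrinks from six dimensions to two, which mirrors the kind of size reduction the abstract refers to. Whether accuracy improves on real data depends on the corpus and on the number of dimensions kept.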
