Venue: International Conference on Similarity Search and Applications
Vector-Based Similarity Measurements for Historical Figures

Abstract

Historical interpretation benefits from identifying analogies among famous people: Who are the Lincolns, Einsteins, Hitlers, and Mozarts? We investigate several approaches to converting approximately 600,000 historical figures into vector representations that quantify similarity based on their Wikipedia pages. We adopt a reference standard based on the number of human-annotated Wikipedia categories that two figures share, and use it to evaluate our similarity detection algorithms. In particular, we investigate four unsupervised approaches to representing the semantic associations of individuals: (1) TF-IDF, (2) weighted average of distributed word embeddings, (3) LDA topic analysis, and (4) DeepWalk embedding from page links. All proved effective, but DeepWalk embedding yielded an overall accuracy of 91.33% in our evaluation of uncovering historical analogies. Combining LDA and DeepWalk yielded even higher performance.
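To make the first approach and the reference standard concrete, here is a minimal Python sketch. It is not the authors' implementation. The page texts, category sets, and function names are hypothetical toy data chosen for illustration, and the TF-IDF weighting (raw term count times log(N/df)) is one common variant that the abstract does not specify.

```python
import math
from collections import Counter

# Hypothetical stand-ins for Wikipedia page text (not data from the paper).
PAGES = {
    "Mozart": "composer symphony opera vienna piano",
    "Beethoven": "composer symphony sonata vienna piano",
    "Einstein": "physicist relativity theory physics nobel",
}

# Hypothetical stand-ins for human-annotated Wikipedia categories.
CATEGORIES = {
    "Mozart": {"Composers", "Austrian people", "Pianists"},
    "Beethoven": {"Composers", "German people", "Pianists"},
    "Einstein": {"Physicists", "German people", "Nobel laureates"},
}


def tfidf_vectors(pages):
    """Build sparse TF-IDF vectors: raw term count times log(N / document frequency)."""
    tokenized = {name: text.split() for name, text in pages.items()}
    n_docs = len(tokenized)
    doc_freq = Counter(t for tokens in tokenized.values() for t in set(tokens))
    return {
        name: {t: c * math.log(n_docs / doc_freq[t]) for t, c in Counter(tokens).items()}
        for name, tokens in tokenized.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, vectors):
    """Return the other figure whose vector is closest to the query's vector."""
    others = [name for name in vectors if name != query]
    return max(others, key=lambda name: cosine(vectors[query], vectors[name]))


def shared_categories(a, b, categories):
    """Reference signal: number of categories the two figures have in common."""
    return len(categories[a] & categories[b])


vectors = tfidf_vectors(PAGES)
print(most_similar("Mozart", vectors))
print(shared_categories("Mozart", "Beethoven", CATEGORIES))
```

On this toy data the vector ranking and the category-overlap signal agree that Beethoven is the closest analogue to Mozart. The other three representations (averaged word embeddings, LDA topic vectors, DeepWalk embeddings) could in principle be passed to the same `cosine` comparison, though the abstract does not say which similarity function the authors used.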
