Foreign-language conference: International Conference on Advanced Intelligent Systems and Informatics

Fast, Accurate, Multilingual Semantic Relatedness Measurement Using Wikipedia Links


Abstract

In this chapter we present a fast, accurate, and elegant metric to assess semantic relatedness among entities included in a hypertextual corpus, building a novel language-independent Vector Space Model. The technique is based on the Jaccard similarity coefficient, approximated with the MinHash technique to generate a constant-size vector fingerprint for each entity in the considered corpus. This strategy allows pairwise semantic relatedness to be evaluated in constant time, no matter how many entities are included in the data and how dense the internal link structure is. Because semantic relatedness is a subtle and somewhat subjective matter, we evaluated our approach by running user tests on a crowdsourcing platform. To achieve a better evaluation we considered two collaboratively built corpora: the English Wikipedia and the Italian Wikipedia, which differ significantly in size, topology, and user base. The evaluation suggests that the proposed technique is able to generate satisfactory results, outperforming commercial baseline systems regardless of the employed data and the cultural differences of the considered test users.
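
The idea of approximating the Jaccard coefficient with fixed-size MinHash fingerprints can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which link sets are compared or how the hash functions are built, so the choice of a set of linked page titles per entity, the seeded BLAKE2b hash family, the fingerprint length `NUM_HASHES`, and all function names are assumptions made here for illustration.

```python
import hashlib

# Hypothetical fingerprint length; the abstract only says "constant-size".
NUM_HASHES = 128


def _seeded_hash(seed: int, item: str) -> int:
    """64-bit hash of `item` under the hash function identified by `seed`."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(links, num_hashes=NUM_HASHES):
    """Fixed-size fingerprint of a non-empty set of link targets.

    Assumption: each entity is represented by the set of page titles it is
    linked with; the abstract does not specify the exact link set used.
    """
    links = set(links)
    if not links:
        raise ValueError("an entity needs at least one link")
    # For each hash function keep only the smallest hash value over the set.
    return [min(_seeded_hash(seed, link) for link in links)
            for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of fingerprint positions that agree.

    The cost depends only on the fingerprint length, not on how many links
    the original entities had.
    """
    if len(sig_a) != len(sig_b):
        raise ValueError("fingerprints must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard coefficient |A ∩ B| / |A ∪ B|, for comparison."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)
```

Under these assumptions, fingerprints would be computed once per entity and stored. Comparing any two entities then touches only `NUM_HASHES` integers, which is what makes the pairwise cost independent of corpus size and link density. A longer fingerprint reduces the variance of the estimate at the price of more storage per entity.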
