Conference proceedings: Natural language processing and information systems

An Experimental Comparison of Explicit Semantic Analysis Implementations for Cross-Language Retrieval


Abstract

Explicit Semantic Analysis (ESA) has recently been proposed as an approach to computing semantic relatedness between words (and, indirectly, between texts). It thus has a natural application in information retrieval, where it shows the potential to alleviate the vocabulary mismatch problem inherent in standard Bag-of-Words models. The ESA model has also recently been extended to cross-lingual retrieval settings, which can be considered an extreme case of the vocabulary mismatch problem. The ESA approach actually represents a class of approaches and allows for various instantiations. As our first contribution, we generalize ESA in order to show clearly the degrees of freedom it provides. Second, we propose several variants of ESA along different dimensions and test their impact on performance in a cross-lingual mate retrieval task on two datasets (JRC-ACQUIS and Multext). These results matter because no systematic investigation has been carried out so far and the differences between basic design choices are significant. We also show that the settings adopted in the original ESA implementation are reasonably good, which to our knowledge has not been demonstrated before, but that they can still be improved significantly by tuning the right parameters: on the cross-lingual mate retrieval task, the relative improvement over the original ESA model ranges from 62% (Multext) to 237% (JRC-ACQUIS).
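To make the mate retrieval setting concrete, the following is a minimal sketch of one possible ESA instantiation, not the implementation evaluated in the paper. It assumes a TF-IDF association between terms and concepts, cosine similarity between concept vectors, and a tiny hand-made concept collection in which position i in each language denotes the same concept (standing in for linked Wikipedia articles). All data and the function names `build_index`, `esa_vector`, `cosine`, and `retrieve_mate` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy concept collections. Index i refers to the same concept
# in both languages, which is what makes the two vector spaces comparable.
CONCEPTS_EN = [
    "court law judge legal ruling",
    "fish sea fishing boat quota",
    "bank money credit loan interest",
]
CONCEPTS_DE = [
    "gericht gesetz richter recht urteil",
    "fisch meer fischerei boot quote",
    "bank geld kredit darlehen zins",
]


def tokenize(text):
    return text.lower().split()


def build_index(concepts):
    """Return per-concept term counts and an IDF table for one language."""
    tfs = [Counter(tokenize(c)) for c in concepts]
    df = Counter(term for tf in tfs for term in tf)
    n = len(concepts)
    # +1 keeps terms that occur in every concept from being zeroed out.
    idf = {term: math.log(n / d) + 1.0 for term, d in df.items()}
    return tfs, idf


def esa_vector(text, index):
    """Map a text to one weight per concept (its concept vector)."""
    tfs, idf = index
    query = Counter(tokenize(text))
    return [
        sum(q * tf[t] * idf.get(t, 0.0) for t, q in query.items())
        for tf in tfs
    ]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve_mate(query, candidates, query_index, cand_index):
    """Return the position of the candidate closest to the query in concept space."""
    qv = esa_vector(query, query_index)
    scores = [cosine(qv, esa_vector(c, cand_index)) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


INDEX_EN = build_index(CONCEPTS_EN)
INDEX_DE = build_index(CONCEPTS_DE)
```

Calling `retrieve_mate("the judge issued a ruling", ["der fisch im meer", "das urteil vom richter", "kredit von der bank"], INDEX_EN, INDEX_DE)` selects the second German candidate, because both texts activate only the first concept. The association function, the similarity measure, and the concept collection are the kind of design choices an ESA instantiation can vary; the particular choices here are illustrative only.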
