A Tale of Four Metrics

Abstract

There are many contexts where similarity in multivariate space needs to be defined by the correlation of the variables rather than by their absolute values. Examples include classic IR measures such as TF-IDF and BM25, client similarity measures based on collaborative filtering, feature analysis of chemical molecules, and biodiversity contexts. In such cases, Cosine similarity is almost always used. More recently, Jensen-Shannon divergence has appeared in a proper metric form, and a related metric, Structural Entropic Distance (SED), has been investigated. This paper also assesses a fourth metric, based on a little-known divergence function called Triangular Divergence. For these metrics, we study their properties in the context of similarity and metric search, and we compare and contrast their semantics and performance. We conclude that, although Cosine Distance is an almost automatic choice in this context, Triangular Distance is most likely the best compromise between semantics and performance.
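To make the comparison concrete, here is a minimal Python sketch of three of the four metrics over non-negative feature vectors. It is not the authors' implementation, and the function names are hypothetical. It rests on several assumptions that the abstract does not specify:

- Vectors are L1-normalised before the divergence-based distances are computed.
- Cosine Distance is taken as the angular distance, arccos of the cosine similarity. This is one common way to make it a proper metric.
- Jensen-Shannon distance is the square root of the Jensen-Shannon divergence, computed with base-2 logarithms.
- Triangular Distance is sqrt(Δ/2), where Δ(p, q) = Σ (p_i − q_i)² / (p_i + q_i) is the triangular divergence. The factor 1/2 is chosen here only so that values fall in [0, 1].

SED is omitted because the abstract does not define it.

```python
import math
from typing import List, Sequence


def _check_same_length(x: Sequence[float], y: Sequence[float]) -> None:
    """Reject vectors of different dimensionality (zip would silently truncate)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")


def _l1_normalise(v: Sequence[float]) -> List[float]:
    """Scale a non-negative vector so that its components sum to 1."""
    total = sum(v)
    if total <= 0 or any(a < 0 for a in v):
        raise ValueError("expected a non-negative vector with a positive sum")
    return [a / total for a in v]


def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Angular distance arccos(cos_sim(x, y)); assumed metric form of Cosine Distance."""
    _check_same_length(x, y)
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    sim = dot / (norm_x * norm_y)
    # Clamp: rounding can push the similarity just outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, sim)))


def jensen_shannon_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """sqrt(JSD(p, q)) with base-2 logs, so the result lies in [0, 1]."""
    _check_same_length(x, y)
    p, q = _l1_normalise(x), _l1_normalise(y)
    jsd = 0.0
    for a, b in zip(p, q):
        m = (a + b) / 2  # midpoint distribution component
        # Zero-probability terms contribute 0 (convention 0 * log 0 = 0).
        if a > 0:
            jsd += 0.5 * a * math.log2(a / m)
        if b > 0:
            jsd += 0.5 * b * math.log2(b / m)
    # Guard against a tiny negative value caused by rounding.
    return math.sqrt(max(jsd, 0.0))


def triangular_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """sqrt(Delta(p, q) / 2), where Delta is the triangular divergence."""
    _check_same_length(x, y)
    p, q = _l1_normalise(x), _l1_normalise(y)
    # Skip dimensions where both components are zero (term would be 0/0).
    delta = sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)
    return math.sqrt(delta / 2)
```

For example, the orthogonal vectors [1, 0] and [0, 1] give π/2 under the angular Cosine Distance and 1.0 under both the Jensen-Shannon and Triangular distances. Rescaling a vector, as in [1, 2, 3] versus [2, 4, 6], leaves all three distances at (or within rounding of) zero. This is the correlation-based, rather than magnitude-based, behaviour the abstract describes. Note also that the Triangular Distance needs no logarithms, only one division per non-zero dimension.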