Journal: Machine Learning

Lowest probability mass neighbour algorithms: relaxing the metric constraint in distance-based neighbourhood algorithms



Abstract

The use of distance metrics such as the Euclidean or Manhattan distance for nearest neighbour algorithms allows for interpretation as a geometric model, and it has been widely assumed that the metric axioms are a necessary condition for many data mining tasks. We show that this assumption can in fact be an impediment to producing effective models. We propose to use mass-based dissimilarity, which employs estimates of the probability mass to measure dissimilarity, to replace the distance metric. This substitution effectively converts nearest neighbour (NN) algorithms into lowest probability mass neighbour (LMN) algorithms. Both types of algorithms employ exactly the same algorithmic procedures, except for the substitution of the dissimilarity measure. We show that LMN algorithms overcome key shortcomings of NN algorithms in classification and clustering tasks. Unlike existing generalised data independent metrics (e.g., quasi-metric, meta-metric, semi-metric, peri-metric) and data dependent metrics, the proposed mass-based dissimilarity is unique because its self-dissimilarity is data dependent and non-constant.
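The core substitution described above can be illustrated with a minimal one-dimensional sketch. In one dimension, the probability mass of the smallest region covering two points can be estimated as the fraction of data points falling in the interval between them; the function name and the toy data below are illustrative assumptions, not the paper's implementation (which uses tree-based partitioning over multiple dimensions).

```python
import numpy as np

def mass_dissimilarity_1d(x, y, data):
    """Illustrative 1-D mass-based dissimilarity: the fraction of data
    points falling in the smallest interval covering both x and y.
    Note that self-dissimilarity m(x, x) counts the points equal to x,
    so it is data dependent and non-constant, unlike a metric where
    d(x, x) = 0 always."""
    lo, hi = min(x, y), max(x, y)
    return np.mean((data >= lo) & (data <= hi))

# Two pairs with the SAME Euclidean distance (0.4), in regions of
# different density: mass-based dissimilarity tells them apart.
data = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.4])
dense_pair = mass_dissimilarity_1d(0.0, 0.4, data)   # 5 of 7 points in [0.0, 0.4]
sparse_pair = mass_dissimilarity_1d(5.0, 5.4, data)  # 2 of 7 points in [5.0, 5.4]
```

Here the dense pair is judged *more* dissimilar than the sparse pair despite identical Euclidean distance, which is the behaviour that lets LMN algorithms adapt neighbourhoods to local density where NN algorithms cannot.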


