IEEE Transactions on Systems, Man, and Cybernetics

Learning a Distance Metric by Balancing KL-Divergence for Imbalanced Datasets


Abstract

In many real-world domains, datasets with imbalanced class distributions occur frequently, and they can degrade a wide range of machine learning tasks. Among these tasks, learning classifiers from imbalanced datasets is an important topic. To perform this task well, it is crucial to learn a distance metric that accurately measures similarities between samples from imbalanced datasets. Unfortunately, existing distance metric learning methods, such as large margin nearest neighbor (LMNN) and information-theoretic metric learning (ITML), focus only on distances between samples and fail to take imbalanced class distributions into consideration. Traditional distance metrics naturally favor the majority classes, which satisfy the objective function more easily, while the important minority classes are neglected during metric construction; this severely affects the decision systems of most classifiers. How to learn a distance metric that can handle imbalanced datasets is therefore of vital importance, but challenging. To solve this problem, this paper proposes a novel distance metric learning method, Distance Metric by Balancing KL-divergence (DMBK). DMBK uses KL-divergence to define normalized divergences that describe the distinctions between classes, and then combines their geometric mean into a single objective that separates samples from all classes simultaneously. This procedure separates the classes in a balanced way and avoids the inaccurate similarities incurred by imbalanced class distributions. Extensive experiments on imbalanced datasets verify the excellent performance of the proposed method.
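The abstract describes the DMBK objective only at a high level, so a small sketch may help make the balancing idea concrete. The Python snippet below is a hedged illustration, not the paper's implementation: it assumes each class is approximated by a Gaussian in the space induced by a linear projection L (so the learned metric is A = LᵀL), and the function names `gaussian_kl` and `balanced_divergence_objective` are hypothetical. By the AM-GM inequality, a geometric mean of divergences normalized to sum to one is largest when all of them are equal, so maximizing this objective cannot inflate one easy majority-vs-majority gap while ignoring a minority class.

```python
import numpy as np
from itertools import combinations

def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL( N(mu0, cov0) || N(mu1, cov1) )."""
    d = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d
                  + logdet1 - logdet0)

def balanced_divergence_objective(L, X, y, eps=1e-6):
    """Log geometric mean of normalized between-class KL divergences.

    Projects X with L (inducing the Mahalanobis metric A = L.T @ L),
    fits a Gaussian to each class, computes a symmetrized KL divergence
    for every class pair, and normalizes the divergences to sum to one.
    The log of their geometric mean is largest when the normalized
    divergences are equal, so maximizing it over L separates all class
    pairs in a balanced way instead of favoring the majority classes.
    """
    Z = X @ L.T
    k = Z.shape[1]
    stats = {}
    for c in np.unique(y):
        Zc = Z[y == c]
        # Regularize each class covariance so it stays invertible,
        # which matters for very small minority classes.
        stats[c] = (Zc.mean(axis=0), np.cov(Zc, rowvar=False) + eps * np.eye(k))
    divs = np.array([
        gaussian_kl(*stats[a], *stats[b]) + gaussian_kl(*stats[b], *stats[a])
        for a, b in combinations(stats, 2)
    ])
    normalized = divs / divs.sum()
    return np.log(normalized + eps).mean()
```

A complete method would maximize `balanced_divergence_objective` over L, for example by (projected) gradient ascent, and then use the resulting Mahalanobis metric inside a k-nearest-neighbor classifier.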
