首页> 中文期刊> 《计算机应用 》 >基于KL散度和近邻点间距离的球面嵌入算法

基于KL散度和近邻点间距离的球面嵌入算法

             

摘要

Aiming at the problem that the existing spherical embedding algorithm cannot effectively embed the data into the low-dimensional space in the case that the distances between points far apart are inaccurate or absent,a new spherical embedding method was proposed,which can take the distances between the nearest neighbor points as input,and embeds high dimensional data of any scale onto the unit sphere,and then estimates the radius of the sphere which fit the distribution of the original data.Starting from a randomly generated spherical distribution,the Kullback-Leibler (KL) divergence was used to measure the difference of the normalized distance between each pair of neighboring points in the original space and the spherical space.Based on the difference,the objective function was constructed.Then,the stochastic gradient descent method with momentum was used to optimize the distribution of the points on the sphere until the result is stable.To test the algorithm,two types of spherical distribution data sets were simulated:which are spherical uniform distribution and Kent distribution on the unit sphere.The experimental results show that,for the uniformly distributed data,the data can be accurately embedded in the spherical space even if the number of neighbors is very small,the Root Mean Square Error (RMSE) of the embedded data distribution and the original data distribution is less than 0.00001,and the spherical radius of the estimated error is less than 0.000001;for spherical normal distribution data,the data can be embedded into the spherical space accurately when the number of neighbors is large.Therefore,in the case that the distance between points far apart are absent,the proposed method can still be quite accurate for low-dimensional data embedding,which is very helpful for the visualization of data.%针对现有球面嵌入算法在非近邻点间的距离度量不准确或缺失的情况下,不能有效地进行低维嵌入的问题,提出了一种新的球面嵌入算法,它能够只利用近邻点间的距离,将任何尺度的高维数据嵌入到单位球面上,同时求出适合原始数据分布的球面半径.该算法从一个随机产生的球面分布开始,利用KL散度衡量每对近邻点间的归一化距离在原始空间和球面空间中的差异,并基于此差异构建出目标函数,然后再用带有动量的随机梯度下降法,不断优化球面上点的分布,直到结果稳定.为了测试算法,模拟产生了两类球面分布数据:分别是球面均匀分布和球面正态分布的数据.实验结果表明,对于球面均匀分布的数据,即使在近邻点个数很少的情况下,仍然能够将数据准确地嵌入球面空间,嵌入后的数据分布与原始数据分布的均方根误差(RMSE)低于0.00001,且球面半径的估算误差低于0.000001;而对于球面正态分布的数据,在近邻点个数较多的情况下,该算法也可以将数据较准确地嵌入球面空间.因此,在非近邻点间距离缺失的情况下,所提方法仍然可以较准确地对数据进行低维嵌入,这非常有利于数据的可视化研究.

著录项

相似文献

  • 中文文献
  • 外文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号