Most subspace clustering methods perform poorly on nonlinear data, struggle when data in different subspaces are highly similar, and cannot verify clustering errors in time. To address these problems, an overlapping subspace clustering algorithm based on local weighted least squares regression (LWLSR) is proposed. The k-nearest neighbor (KNN) idea is used to highlight the local information of the data in place of the nonlinear data structure, and Gaussian weighting selects the most similar neighboring data points to obtain the optimal representation coefficients. An overlapping probability model is then employed to determine the overlapping parts of the data within the subspaces and to recheck the clustering results, improving clustering accuracy. Experiments on both artificial and real-world datasets show that the proposed algorithm achieves fairly satisfactory clustering results.
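The abstract does not give the objective function, so the following is only a minimal sketch of how KNN selection and Gaussian weighting might be combined in a least squares regression to obtain representation coefficients. It is not the authors' formulation. The function name `local_weighted_coefficients`, the parameters `k`, `sigma` and `lam`, and the choice of penalizing each coefficient by the inverse of its Gaussian weight are all assumptions made for illustration. The overlapping probability model is not sketched, because the abstract gives no details about it.

```python
import numpy as np

def local_weighted_coefficients(X, k=5, sigma=1.0, lam=0.1):
    """Hypothetical sketch: represent each sample by its k nearest neighbors.

    X is an (n_samples, n_features) array. Returns an (n, n) matrix C whose
    row i holds the coefficients that reconstruct X[i] from its neighbors.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor

    C = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]  # indices of the k nearest neighbors
        # Gaussian weights: closer neighbors get weights nearer to 1.
        w = np.maximum(np.exp(-d2[i, nbrs] / (2.0 * sigma ** 2)), 1e-12)
        A = X[nbrs].T  # (n_features, k) local dictionary
        # Assumed objective: ||x_i - A c||^2 + lam * sum_j c_j^2 / w_j,
        # so distant (low-weight) neighbors are penalized more strongly.
        G = A.T @ A + lam * np.diag(1.0 / w)
        C[i, nbrs] = np.linalg.solve(G, A.T @ X[i])
    return C
```

In many subspace clustering pipelines, a symmetric affinity such as `np.abs(C) + np.abs(C).T` is then passed to spectral clustering. Whether the proposed algorithm does the same is not stated in the abstract.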