Conference: Joint IFSA World Congress and NAFIPS Annual Meeting

Fuzzy granular principal curves algorithm for large data sets


Abstract

Principal curves, a nonlinear generalization of principal components, are a common tool in multivariate analysis for tasks such as dimensionality reduction and feature extraction. However, existing principal curves algorithms are often inefficient on large data sets because of their high computational complexity. In this paper, a new method based on the idea of "information granulation and fuzzy sets" is proposed to improve efficiency and noise robustness. First, large amounts of numerical data are granulated into C interval (granular) data using fuzzy C-means clustering and two criteria of granulation, which significantly reduces the amount of data to be processed in the later step. Then granular principal curves are constructed from the upper and lower bounds of the interval data. Finally, we introduce a quantitative index based on the parameter a to evaluate the fuzziness of the granular principal curves output, where a is a positive parameter that provides some flexibility when optimizing the information granule. A series of numerical studies on synthetic data provides useful insight into the effectiveness of the proposed algorithm.
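The granulation step described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not state the two granulation criteria, so the code below assumes a much simpler rule (each point is assigned to its highest-membership cluster, and the granule is the per-feature minimum and maximum of that cluster). The function names `fuzzy_c_means` and `granulate`, the fuzzifier `m=2.0`, and the half-circle test data are all hypothetical choices made for illustration.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means; returns (centers, membership matrix U)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalized to sum to 1.
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every point to every center, floored to avoid 1/0.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def granulate(X, c):
    """Reduce N points to c interval granules (assumed min/max rule)."""
    centers, U = fuzzy_c_means(X, c)
    labels = U.argmax(axis=1)
    lower = np.empty((c, X.shape[1]))
    upper = np.empty((c, X.shape[1]))
    for k in range(c):
        members = X[labels == k]
        if len(members) == 0:
            # Empty cluster: fall back to a degenerate interval at the center.
            members = centers[k][None, :]
        lower[k] = members.min(axis=0)
        upper[k] = members.max(axis=0)
    return lower, upper


# Hypothetical synthetic data: a noisy half circle with 2000 points.
rng = np.random.default_rng(1)
t = rng.uniform(0.0, np.pi, 2000)
X = np.c_[np.cos(t), np.sin(t)] + rng.normal(scale=0.05, size=(2000, 2))

# 2000 points are reduced to 10 interval granules.
lower, upper = granulate(X, c=10)
```

Under this assumed rule, a principal curve routine would then be fitted to the rows of `lower` and of `upper` separately, so that the later step works with C intervals rather than the full data set.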
