
Dynamic data mining on multi-dimensional data.


Abstract

With the advance of modern technology, multi-dimensional data is being generated at an explosive rate in many disciplines, which greatly increases the challenge of comprehending and interpreting the resulting mass of data. Existing data analysis techniques have difficulty handling multi-dimensional data, largely because of the inherent sparsity of the points.

A first step toward addressing this challenge is the use of clustering techniques, which are essential in the data mining process for revealing natural structures and identifying interesting patterns in the underlying data. Cluster analysis is used to identify homogeneous and well-separated groups of objects in databases. The need to cluster large quantities of multi-dimensional data is widely recognized. It is a classical problem in the database, artificial intelligence, and theoretical literature, and plays an important role in many fields of business and science.

Many approaches have also been designed for outlier detection. In many situations, clusters and outliers are concepts whose meanings are inseparable from each other, especially for data sets with noise. Thus, it is necessary to treat clusters and outliers as concepts of equal importance in data analysis.

It is well acknowledged that a large proportion of real-world data has irrelevant features, which may reduce the accuracy of some algorithms. High-dimensional data sets continue to pose a challenge to clustering algorithms at a very fundamental level. One well-known technique for improving data analysis performance is dimension reduction, which is often used in clustering, classification, and many other machine learning and data mining applications.

Many approaches have been proposed to index high-dimensional data sets for efficient querying. Although most of them can efficiently support nearest neighbor search for low-dimensional data sets, they degrade rapidly as dimensionality increases. In addition, the dynamic insertion of new data can leave the original structures unable to handle the data sets efficiently, since it may greatly increase the amount of data accessed for a query.

In this dissertation, we study the problems mentioned above. We propose a novel data pre-processing technique called shrinking, inspired by Newton's law of universal gravitation, which optimizes the inner structure of the data. We then propose a shrinking-based clustering algorithm for multi-dimensional data and extend the algorithm to the dimension reduction field, resulting in a shrinking-based dimension reduction algorithm. (Abstract shortened by UMI.)
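The abstract does not spell out how shrinking moves the data, so the following is only an illustrative sketch of the general idea, not the dissertation's algorithm. It assumes that each point is pulled a fixed fraction of the way toward the centroid of its neighbors within a fixed radius, so that dense groups condense while isolated points stay where they are. The function names `shrink` and `spread` and the parameters `radius`, `step`, and `iterations` are hypothetical and chosen for illustration.

```python
import math
import random


def shrink(points, radius=1.0, step=0.5, iterations=10):
    """Pull each point toward the centroid of its neighbors within `radius`.

    Simplified, assumed stand-in for a gravitation-inspired shrinking step;
    the dissertation's actual procedure is not given in the abstract.
    """
    pts = [list(p) for p in points]
    for _ in range(iterations):
        new_pts = []
        for p in pts:
            # A point is always its own neighbor, so the list is never empty.
            neighbors = [q for q in pts if math.dist(p, q) <= radius]
            centroid = [sum(c) / len(neighbors) for c in zip(*neighbors)]
            # Move a fraction `step` of the way toward the local centroid.
            new_pts.append([a + step * (b - a) for a, b in zip(p, centroid)])
        pts = new_pts
    return pts


def spread(points):
    """Mean distance of the points to their own centroid."""
    centroid = [sum(c) / len(points) for c in zip(*points)]
    return sum(math.dist(p, centroid) for p in points) / len(points)


# Synthetic data: two dense groups and one isolated point (an outlier).
random.seed(0)
cluster_a = [(random.gauss(0, 0.3), random.gauss(0, 0.3)) for _ in range(30)]
cluster_b = [(random.gauss(5, 0.3), random.gauss(5, 0.3)) for _ in range(30)]
outlier = [(10.0, -10.0)]

shrunk = shrink(cluster_a + cluster_b + outlier)

print("spread of first group before:", spread(cluster_a))
print("spread of first group after: ", spread(shrunk[:30]))
print("outlier after shrinking:     ", shrunk[-1])
```

In this sketch the dense groups contract, which makes them easier to separate, while a point with no neighbors inside the radius does not move, which is one way clusters and outliers could be handled together in a single pre-processing pass.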

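Bibliographic record

  • Author: Shi, Yong
  • Author affiliation: State University of New York at Buffalo
  • Degree-granting institution: State University of New York at Buffalo
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 229
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Automation technology, computer technology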
