Conference: International Conference on Issues and Challenges in Intelligent Computing Techniques

Handling Structured Data Using Data Mining Clustering Techniques

Abstract

Every organization today can store extremely large amounts of data. As data capture keeps rising, these stores are turning into huge tombs of data that are difficult to analyse, and the constantly growing data sets challenge researchers who try to discover knowledge in them. Valuable information lies buried in these collections and can be extracted with data mining techniques, which are able to dig out the information embedded in large datasets. Demand from various application areas has driven the evolution of many data mining methods, but not all of them can handle high data volumes. Numerous computation- and data-intensive scientific data analyses have been established to keep pace. Now that today's data has become Big Data, it requires large-scale data mining analyses that meet its scalability and performance requirements. To serve such data, several efficient parallel and concurrent algorithms have been applied. These parallel algorithms use different parallelization techniques, formerly threads, MPI and others, which yield different performance and usability characteristics. The MPI model is efficient for computationally demanding problems but difficult to put into practical use.
Over the years, data mining has continued to spread its roots in business and in learning organizations. The new integrated clustering algorithm called CURE is more robust to outliers and recognizes clusters that have irregular shapes and varying sizes. CURE combines random sampling with partitioning, which ensures that the clusters it produces are of much better quality than those produced by earlier algorithms. This paper focuses on the CURE clustering technique, which is found to be suitable for working with large databases.
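To make the description of CURE more concrete, the following is a minimal Python sketch of the general idea, not the paper's implementation: cluster a random sample hierarchically, represent each cluster by a few well-scattered points shrunk toward its centroid, then assign every point to the cluster with the nearest representative. The function names (`representatives`, `cure`) and the parameters `c`, `alpha`, `sample_size` and `seed` are illustrative assumptions, and the partitioning step mentioned in the abstract is omitted for brevity.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def representatives(points, c=4, alpha=0.3):
    """Pick up to c well-scattered points and shrink them toward the centroid.

    c and alpha are assumed values chosen for illustration only.
    """
    mean = centroid(points)
    # Start with the point farthest from the centroid.
    scattered = [max(points, key=lambda p: math.dist(p, mean))]
    while len(scattered) < min(c, len(points)):
        # Add the point farthest from the representatives chosen so far.
        nxt = max(points, key=lambda p: min(math.dist(p, s) for s in scattered))
        scattered.append(nxt)
    # Shrinking toward the centroid dampens the influence of outliers.
    return [tuple(s_i + alpha * (m_i - s_i) for s_i, m_i in zip(s, mean))
            for s in scattered]


def cluster_distance(reps_a, reps_b):
    """Distance between clusters = closest pair of representatives."""
    return min(math.dist(a, b) for a in reps_a for b in reps_b)


def cure(points, k, sample_size=None, c=4, alpha=0.3, seed=0):
    """Return one cluster label per input point (simplified, no partitioning)."""
    rng = random.Random(seed)
    pts = [tuple(p) for p in points]
    # Random sampling keeps the quadratic merge loop affordable on large inputs.
    if sample_size is not None and sample_size < len(pts):
        sample = rng.sample(pts, sample_size)
    else:
        sample = pts
    clusters = [[p] for p in sample]
    reps = [representatives(cl, c, alpha) for cl in clusters]
    while len(clusters) > k:
        # Naive search for the closest pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(reps[i], reps[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        reps[i] = representatives(clusters[i], c, alpha)
        del clusters[j]
        del reps[j]
    # Label every original point by its nearest representative.
    return [min(range(len(reps)),
                key=lambda ci: min(math.dist(p, r) for r in reps[ci]))
            for p in pts]
```

Calling `cure(points, k=2)` on a list of 2-D tuples returns a list of integer labels, one per point. The naive closest-pair search is kept for readability; a practical implementation would likely use a heap and a spatial index instead.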
