Journal: International Journal of Business Process Integration and Management

Comparing SQL and NoSQL approaches for clustering over big data

Abstract

Data mining is the process of discovering patterns in large datasets. As the volume of available information grows exponentially, new machine learning, statistical and other analytics techniques must be developed so that such analysis runs fast enough to be useful. In this study, techniques such as cluster analysis are applied to generated data to perform customer segmentation, and system performance is evaluated by measuring processing time. The data is generated with the Star Schema Benchmark (SSB). Our main goal is to find a scalable solution for running data mining over a decision support benchmark. Four systems are tested: single-node MySQL, MySQL Cluster, Apache Mahout and R. MySQL Cluster and Mahout, each distributed across four nodes, are used to compare the performance of k-means run in parallel. Single-node MySQL and R allow this parallel execution to be compared against methods running on a single machine, on both relational and non-relational systems.
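
The measurement described in the abstract (run k-means for customer segmentation and record the processing time) can be illustrated with a minimal single-machine sketch. This is not the paper's implementation, which ran on MySQL, MySQL Cluster, Apache Mahout and R. Everything here is an assumption made for illustration: the synthetic two-feature customers (revenue and order count) stand in for SSB-derived data, and the names `kmeans`, `make_customers` and `timed_kmeans` are hypothetical.

```python
import random
import time


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


def make_customers(n, seed=0):
    """Hypothetical stand-in for SSB-derived features: (revenue, order count)."""
    rng = random.Random(seed)
    profiles = [(100.0, 5.0), (1000.0, 20.0), (5000.0, 60.0)]
    return [
        (rng.gauss(rev, rev * 0.1), rng.gauss(orders, orders * 0.1))
        for rev, orders in (profiles[i % 3] for i in range(n))
    ]


def timed_kmeans(points, k):
    """Run k-means and return its result together with the wall-clock time."""
    start = time.perf_counter()
    centroids, assignment = kmeans(points, k)
    elapsed = time.perf_counter() - start
    return centroids, assignment, elapsed


if __name__ == "__main__":
    customers = make_customers(3000)
    centroids, assignment, elapsed = timed_kmeans(customers, k=3)
    print(f"k-means on {len(customers)} customers took {elapsed:.3f} s")
```

Wall-clock time is taken with `time.perf_counter()` around the clustering call only, so data generation is excluded from the measurement. A distributed run of the kind the abstract compares would split the assignment step across nodes, which this sketch does not attempt.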
