Conference: The 9th International Conference on Grid and Cooperative Computing

Data Mining of Mass Storage Based on Cloud Computing

Abstract

Cloud computing is an elastic computing model in which users lease resources from a rentable infrastructure. It is gaining popularity because of its lower cost, high reliability, and high availability. To exploit this capability, this paper applies cloud computing to data mining and machine learning. The Netflix Prize, one of the most influential open competitions in machine learning, came with a mass-storage dataset and drove thousands of teams around the world to attack the problem. The final winner was the BellKor's Pragmatic Chaos team, which bested Netflix's own rating-prediction algorithm by 10%. Their solution is an ensemble of a large number of models, each of which specializes in a different aspect of the data. Among these models, k-nearest neighbors (KNN) and the Restricted Boltzmann Machine (RBM) are reported to be the two most important and successful. We therefore build two predictors, one based on each model, to test their performance on cloud computing platforms. The results show that KNN achieves a root mean square error (RMSE) of 0.9468 after Global Effect (GE) data preprocessing, which is better than Cinematch's RMSE of 0.951. The RMSE of the RBM algorithm is about 0.9670 on the raw dataset, which can be further improved by the KNN model.
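To make the reported scores easier to compare, the following is a minimal sketch of how an RMSE is computed and how two RMSE values can be compared. The function names `rmse` and `relative_improvement` and the toy ratings are illustrative assumptions; they are not taken from the paper, and only the values 0.951 and 0.9468 come from the abstract.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))


def relative_improvement(baseline_rmse, model_rmse):
    """Fractional RMSE reduction of a model relative to a baseline."""
    return (baseline_rmse - model_rmse) / baseline_rmse


# Toy ratings, invented for illustration only (not Netflix Prize data).
toy_predicted = [3.0, 4.0, 5.0]
toy_actual = [3.0, 5.0, 4.0]
toy_rmse = rmse(toy_predicted, toy_actual)

# RMSE values quoted in the abstract: Cinematch baseline vs. KNN after GE.
knn_gain = relative_improvement(0.951, 0.9468)

print(f"toy RMSE: {toy_rmse:.4f}")
print(f"KNN after GE vs. Cinematch: {knn_gain:.2%}")
```

Under this definition, the KNN predictor with GE preprocessing lowers the RMSE by roughly 0.44% relative to the Cinematch figure quoted in the abstract.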
