Conference: Nirma University International Conference on Engineering

Web users browsing behavior prediction by implementing support vector machines in MapReduce using cloud based Hadoop


Abstract

The motivation behind this work is that predicting web users' browsing behavior while they use the Internet reduces their page access time and avoids visits to unnecessary pages, which eases network traffic. This research introduces parallel Support Vector Machines (SVMs) for web page prediction. The web contains an enormous amount of data, and that volume grows exponentially, but SVM training time is very large: SVMs suffer from a widely recognized scalability problem in both memory requirements and computation time when the input dataset is large. To address this, we train the SVM model using the MapReduce programming model of the Hadoop framework, since MapReduce can rapidly process large amounts of data in parallel. MapReduce works in tandem with the Hadoop Distributed File System (HDFS). The proposed approach addresses the scalability problem of the existing SVM algorithm. Its performance is evaluated on Amazon EC2 using cloud-based Hadoop. Our experiments show its effectiveness in terms of training time, and it also improves preprocessing time. We find that as the number of nodes increases, the training time of the proposed algorithm decreases. We also verified that parallelizing Sequential Minimal Optimization (SMO) does not reduce accuracy compared with the standard approach.
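The abstract does not say exactly how SVM training with SMO is split across mappers and reducers. One common scheme, assumed here purely for illustration, is to train a local SVM on each input split in the map phase, emit only its support vectors, and retrain a single SVM on the pooled support vectors in the reduce phase. The sketch below simulates that flow in plain Python with a simplified SMO solver and a linear kernel. The function names (`smo_train`, `map_phase`, `reduce_phase`, `mapreduce_svm`, `toy_data`) and the toy data are hypothetical, and no Hadoop API is used.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def smo_train(X, y, C=1.0, tol=1e-3, max_passes=10, max_iter=1000, seed=0):
    """Simplified SMO with a linear kernel; returns dual weights alpha and bias b."""
    n = len(X)
    if n < 2:
        raise ValueError("need at least two training points")
    rng = random.Random(seed)
    K = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha, b = [0.0] * n, 0.0

    def error(i):
        # E_i = f(x_i) - y_i, where f(x) = sum_k alpha_k * y_k * <x_k, x> + b
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b - y[i]

    passes = iters = 0
    while passes < max_passes and iters < max_iter:
        iters += 1
        changed = 0
        for i in range(n):
            Ei = error(i)
            # Only optimise alpha_i when it violates the KKT conditions.
            if not ((y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0)):
                continue
            j = rng.randrange(n - 1)  # pick a random partner j != i
            if j >= i:
                j += 1
            Ej = error(j)
            ai_old, aj_old = alpha[i], alpha[j]
            if y[i] != y[j]:
                L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
            else:
                L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
            eta = 2 * K[i][j] - K[i][i] - K[j][j]
            if L == H or eta >= 0:
                continue
            aj = min(H, max(L, aj_old - y[j] * (Ei - Ej) / eta))
            if abs(aj - aj_old) < 1e-5:
                continue
            ai = ai_old + y[i] * y[j] * (aj_old - aj)
            b1 = b - Ei - y[i] * (ai - ai_old) * K[i][i] - y[j] * (aj - aj_old) * K[i][j]
            b2 = b - Ej - y[i] * (ai - ai_old) * K[i][j] - y[j] * (aj - aj_old) * K[j][j]
            alpha[i], alpha[j] = ai, aj
            b = b1 if 0 < ai < C else b2 if 0 < aj < C else (b1 + b2) / 2
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b


def map_phase(partition, C=1.0):
    """Hypothetical mapper: train a local SVM and emit only its support vectors."""
    X = [x for x, _ in partition]
    y = [label for _, label in partition]
    alpha, _ = smo_train(X, y, C)
    return [(X[i], y[i]) for i in range(len(X)) if alpha[i] > 1e-8]


def reduce_phase(support_vector_lists, C=1.0):
    """Hypothetical reducer: retrain one SVM on the union of local support vectors."""
    merged = [pair for svs in support_vector_lists for pair in svs]
    X = [x for x, _ in merged]
    y = [label for _, label in merged]
    alpha, b = smo_train(X, y, C)
    # With a linear kernel the model collapses to an explicit weight vector w.
    w = [sum(alpha[i] * y[i] * X[i][d] for i in range(len(X))) for d in range(len(X[0]))]
    return w, b


def mapreduce_svm(data, n_partitions=3, C=1.0):
    # A round-robin split stands in for HDFS input splits handed to separate mappers.
    partitions = [data[k::n_partitions] for k in range(n_partitions)]
    return reduce_phase([map_phase(p, C) for p in partitions], C)


def predict(model, x):
    w, b = model
    return 1 if dot(w, x) + b >= 0 else -1


def toy_data():
    # Hypothetical, linearly separable data: positives near (2.5, 2.5), negatives mirrored.
    data = []
    for p in [(2, 2), (3, 2), (2, 3), (3, 3), (2.5, 1.5), (1.5, 2.5)]:
        data.append(([float(p[0]), float(p[1])], 1))
        data.append(([-float(p[0]), -float(p[1])], -1))
    return data


if __name__ == "__main__":
    model = mapreduce_svm(toy_data(), n_partitions=3)
    print(model, predict(model, [5.0, 5.0]))
```

On a real cluster the mapper logic would run once per HDFS input split (for example via Hadoop Streaming), and the reducer would receive the emitted support vectors. A single merge round keeps the reduce step small, because support vectors are often a small fraction of the training set, but it only approximates the SVM trained on the full dataset; multi-round or cascade variants trade extra passes for a closer match.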
