
Big Data Analysis Using Apache Spark MLlib and Hadoop HDFS with Scala and Java



Abstract

The technology revolution has made big data a defining phenomenon of the decade, with a significant impact on trends in applied science. Exploring effective big data tools is therefore a pressing need. Hadoop is a capable technology for analyzing big data, but it is slow because the job results of each phase must be stored before the next phase starts, and because of replication delays. Apache Spark is another tool, developed to be a practical model for analyzing big data through its innovative in-memory processing framework and its high-level programming libraries for machine learning, efficient data handling, and so on. This paper presents comparisons of the time performance of Scala and Java in Apache Spark MLlib. Many tests were carried out with supervised and unsupervised machine learning methods on large datasets. The datasets were loaded from Hadoop HDFS as well as from the local disk, in order to identify the pros and cons of each approach and to find the loading configuration that gives the best execution. The results showed that Scala performs about 10% to 20% better than Java, depending on the algorithm type. The aim of the study is to analyze big data with more suitable programming languages and, as a consequence, to gain better performance.
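The abstract does not say how the timings were taken or how the percentage difference was computed. As a minimal sketch only, assuming mean wall-clock time over repeated runs and a difference expressed relative to the slower baseline, such a comparison could look like the following. The class name `TimingSketch`, both method names, the placeholder workload, and the timings of 100 s and 85 s are hypothetical and are not taken from the paper.

```java
public class TimingSketch {

    /** Runs the task `runs` times and returns the mean wall-clock time in seconds. */
    public static double meanSeconds(Runnable task, int runs) {
        long totalNanos = 0L;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1e9 / runs;
    }

    /** Percentage by which the candidate is faster than the baseline (positive means faster). */
    public static double relativeImprovementPercent(double baselineSeconds, double candidateSeconds) {
        return (baselineSeconds - candidateSeconds) / baselineSeconds * 100.0;
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): in a real experiment this would be
        // an MLlib training job reading from HDFS or from the local disk.
        Runnable workload = () -> {
            double acc = 0.0;
            for (int i = 0; i < 1_000_000; i++) {
                acc += Math.sqrt(i);
            }
            if (acc < 0) {
                throw new IllegalStateException("unreachable; keeps the loop from being optimized away");
            }
        };
        System.out.printf("mean time: %.6f s%n", meanSeconds(workload, 5));

        // Hypothetical timings, not results from the paper: baseline 100 s, candidate 85 s.
        System.out.printf("improvement: %.1f%%%n", relativeImprovementPercent(100.0, 85.0));
    }
}
```

Under this definition, a candidate that finishes in 85 s against a 100 s baseline is about 15% faster, which falls inside the 10% to 20% range the abstract reports. On the JVM, the first runs usually include class loading and JIT compilation, so warm-up runs would normally be discarded before averaging.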
