Conference: 2015 IEEE International Congress on Big Data

A Parallel Distributed Weka Framework for Big Data Mining Using Spark



Abstract

Effective Big Data Mining requires scalable and efficient solutions that are also accessible to users of all levels of expertise. Despite this, many current efforts to provide effective knowledge extraction via large-scale Big Data Mining tools focus more on performance than on use and tuning, which are complex problems even for experts. Weka is a popular and comprehensive Data Mining workbench with a well-known and intuitive interface; nonetheless, it supports only sequential single-node execution. Hence, the size of the datasets and processing tasks that Weka can handle within its existing environment is limited both by the amount of memory in a single node and by sequential execution. This work discusses DistributedWekaSpark, a distributed framework for Weka which maintains its existing user interface. The framework is implemented on top of Spark, a Hadoop-related distributed framework with fast in-memory processing capabilities and support for iterative computations. By combining Weka's usability and Spark's processing power, DistributedWekaSpark provides a usable prototype distributed Big Data Mining workbench that achieves near-linear scaling in executing various real-world scale workloads: 91.4% weak scaling efficiency on average and up to 4x faster on average than Hadoop.
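
The abstract quotes a weak scaling efficiency figure without defining the metric. As a minimal sketch, assuming the common definition (runtime on the baseline configuration divided by runtime when the node count and the dataset size grow by the same factor), the calculation looks like the following. The function name and all timings are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Return weak scaling efficiency as a fraction.

    t_base:   runtime on the baseline node count with the baseline dataset size.
    t_scaled: runtime when nodes and dataset size grow by the same factor.
    A value of 1.0 means perfectly linear weak scaling.
    """
    if t_base <= 0 or t_scaled <= 0:
        raise ValueError("runtimes must be positive")
    return t_base / t_scaled


# Hypothetical timings in seconds (illustrative only, NOT from the paper).
# Key = node count; the dataset size is assumed to grow by the same factor.
hypothetical_runs = {1: 100.0, 2: 104.0, 4: 110.0, 8: 125.0}

t_base = hypothetical_runs[1]
efficiencies = {
    nodes: weak_scaling_efficiency(t_base, runtime)
    for nodes, runtime in hypothetical_runs.items()
}

for nodes, eff in efficiencies.items():
    print(f"{nodes} node(s): {eff:.1%} efficiency")
```

Under this definition, an average efficiency near 90% would mean that runtime grows by only about a tenth as the cluster and the workload grow together, which is what "near-linear scaling" usually refers to.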
