Conference: IEEE International Congress on Big Data

PaWI: Parallel Weighted Itemset Mining by Means of MapReduce

Abstract

Frequent itemset mining is an exploratory data mining technique that has been fruitfully exploited to extract recurrent co-occurrences between data items. In many application contexts, items are enriched with weights denoting their relative importance in the analyzed data. Pushing item weights into the itemset mining process, i.e., mining weighted itemsets rather than traditional itemsets, is therefore an appealing research direction. Although many efficient in-memory weighted itemset mining algorithms are available in the literature, there is a lack of parallel and distributed solutions that can scale towards Big Weighted Data. This paper presents a scalable frequent weighted itemset mining algorithm based on the MapReduce paradigm. To demonstrate its actionability and scalability, the proposed algorithm was tested on a real big dataset comprising approximately 34 million reviews of Amazon items. Weights indicate the ratings given by users to the purchased items. The mined itemsets represent combinations of items that were frequently bought together with an overall rating above average.
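
The abstract does not spell out the algorithm, so the following is only a minimal single-process sketch of how weighted itemset support could be computed in a map/reduce style. It is not the PaWI implementation. The function names (`map_phase`, `reduce_phase`, `mine_weighted_itemsets`), the toy transactions, and in particular the choice of the minimum item weight as the weight of an itemset within a transaction are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, max_size=2):
    """Emit (itemset, weight) pairs for one weighted transaction.

    transaction: dict mapping item -> weight (e.g. a user's ratings).
    ASSUMPTION: the weight of an itemset in a transaction is the minimum
    weight of its items. The abstract does not state the paper's choice.
    """
    items = sorted(transaction)
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            yield itemset, min(transaction[i] for i in itemset)


def reduce_phase(pairs):
    """Sum the per-transaction weights of each itemset (weighted support)."""
    support = defaultdict(float)
    for itemset, weight in pairs:
        support[itemset] += weight
    return dict(support)


def mine_weighted_itemsets(transactions, min_support, max_size=2):
    """Keep the itemsets whose weighted support reaches min_support."""
    pairs = (pair for t in transactions for pair in map_phase(t, max_size))
    support = reduce_phase(pairs)
    return {k: v for k, v in support.items() if v >= min_support}


# Hypothetical toy data: each dict is one user's purchases with ratings.
transactions = [
    {"book": 5, "lamp": 4},
    {"book": 4, "lamp": 5, "pen": 1},
    {"book": 3, "pen": 2},
]
result = mine_weighted_itemsets(transactions, min_support=8)
```

On this toy input, `result` keeps `("book",)`, `("lamp",)`, and `("book", "lamp")`, with weighted supports 12, 9, and 8. In an actual MapReduce job the map and reduce functions would run on separate workers, with the framework grouping the emitted pairs by itemset key. Enumerating every candidate itemset, as done here, would not scale without pruning.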