Conference: 2015 IEEE International Congress on Big Data

PaWI: Parallel Weighted Itemset Mining by Means of MapReduce

Abstract

Frequent itemset mining is an exploratory data mining technique that has been fruitfully exploited to extract recurrent co-occurrences among data items. In many application contexts, items are enriched with weights denoting their relative importance in the analyzed data, so pushing item weights into the itemset mining process, i.e., mining weighted itemsets rather than traditional itemsets, is an appealing research direction. Although many efficient in-memory weighted itemset mining algorithms are available in the literature, there is a lack of parallel and distributed solutions able to scale towards Big Weighted Data. This paper presents a scalable frequent weighted itemset mining algorithm based on the MapReduce paradigm. To demonstrate its actionability and scalability, the proposed algorithm was tested on a real big dataset collecting approximately 34 million reviews of Amazon items. Weights indicate the ratings given by users to the purchased items. The mined itemsets represent combinations of items that were frequently bought together with an overall rating above average.
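
To make the idea of weighted itemset mining in a map/reduce style concrete, the following is a minimal single-process sketch, not the PaWI algorithm itself. The abstract does not define the weighting measure, so the sketch assumes that an itemset's weight in a transaction is the minimum weight of its items and that its weighted support is the sum of those weights over all transactions. The function names (`map_phase`, `reduce_phase`, `mine_weighted_itemsets`), the `max_size` cap, and the sample data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, max_size=2):
    """Emit (itemset, weight) pairs for one weighted transaction.

    `transaction` maps each item to its weight (e.g. a user rating).
    """
    items = sorted(transaction)
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            # Assumption: the itemset's weight in this transaction is the
            # minimum weight among its items (not taken from the abstract).
            yield itemset, min(transaction[item] for item in itemset)


def reduce_phase(pairs, min_support):
    """Sum the weights per itemset and keep those reaching min_support."""
    totals = defaultdict(float)
    for itemset, weight in pairs:
        totals[itemset] += weight
    return {k: v for k, v in totals.items() if v >= min_support}


def mine_weighted_itemsets(transactions, min_support, max_size=2):
    """Chain the map and reduce steps in one process (no real cluster)."""
    pairs = (pair for t in transactions for pair in map_phase(t, max_size))
    return reduce_phase(pairs, min_support)


# Hypothetical sample data: each transaction maps purchased items to ratings.
sample_transactions = [
    {"book": 5, "lamp": 4},
    {"book": 4, "lamp": 2, "pen": 1},
    {"book": 3, "pen": 5},
]
frequent = mine_weighted_itemsets(sample_transactions, min_support=6)
```

Enumerating every combination inside the mapper grows quickly with transaction length, which is why the sketch caps itemset size with `max_size`; a scalable implementation would need a smarter candidate-generation or partitioning strategy, which the abstract does not describe.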