Conference: 2010 IEEE International Conference on Information Theory and Information Security

Distributed log information processing with Map-Reduce: A case study from raw data to final models

Abstract

With the rapid growth of the Internet, e-commerce websites now routinely work with log datasets of up to a few terabytes. Removing messy data promptly and at low cost, and then extracting useful information, is a problem we have to face. The mining process involves several steps, from pre-processing the raw data to establishing the final models. In this paper we describe our method for solving this problem with Map-Reduce. Hadoop [7] is an open-source Map-Reduce implementation for reliable, scalable, distributed computing. We implement several applications on Hadoop: data extraction, a sum operation, a join operation, and a clustering algorithm. They can be applied to data pre-processing and to detecting users with the same interests. In particular, we focus on clustering. We combine a clustering algorithm that integrates SOM (Self-Organizing Map) and fuzzy logic [13] with Map-Reduce, and we call the result MRSF. With a Hadoop cluster, large MRSF jobs can be accommodated simply by adding more nodes to the cluster. Our experiments show that MRSF scales well and can efficiently process and analyze extremely large datasets.
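
The abstract names a sum operation among the Map-Reduce applications but gives no detail. As a rough illustration only, the sketch below simulates the map, shuffle, and reduce phases of such a sum in plain Python on a single machine. It is not the authors' implementation and does not use Hadoop. The log format (a user ID followed by an integer amount) and all function names are assumptions made for this example.

```python
from collections import defaultdict


def map_line(line):
    """Map step: emit (user_id, amount) for a well-formed line.

    Malformed lines produce no output, which stands in for the
    'remove messy data' pre-processing mentioned in the abstract.
    The two-field format is hypothetical.
    """
    parts = line.split()
    if len(parts) != 2:
        return []
    user_id, amount = parts
    try:
        return [(user_id, int(amount))]
    except ValueError:
        return []


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_sum(key, values):
    """Reduce step: sum the values collected for one key."""
    return key, sum(values)


def run_sum_job(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return dict(reduce_sum(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input: two malformed lines are dropped by the mapper.
sample_log = ["u1 3", "u2 5", "corrupted line here", "u1 4", "u2 x"]
totals = run_sum_job(sample_log)
```

In a real Hadoop job the framework performs the shuffle and distributes map and reduce tasks across nodes. The single-process version here only shows how the per-key sum decomposes into independent map and reduce functions.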