Venue: 19th International World Wide Web Conference, 2010

Distributed Nonnegative Matrix Factorization for Web-Scale Dyadic Data Analysis on MapReduce

Abstract

The Web abounds with dyadic data, and the volume grows every second. Previous work has repeatedly shown the usefulness of extracting the interaction structure inside dyadic data. A commonly used tool for extracting this underlying structure is matrix factorization, whose fame was further boosted by the Netflix challenge. When we tried to replicate that success on real-world Web dyadic data, we were seriously challenged by the scalability of the available tools. In this paper, we therefore report our efforts to scale up the nonnegative matrix factorization (NMF) technique. We show that, by carefully partitioning the data and arranging the computations to maximize data locality and parallelism, factorizing a tens-of-millions-by-hundreds-of-millions matrix with billions of nonzero cells can be accomplished within tens of hours. This result effectively assures practitioners of the scalability of NMF on Web-scale dyadic data.
