Conference: International Conference on Information Reuse and Integration for Data Science

Billion-Scale Matrix Compression and Multiplication with Implications in Data Mining



Abstract

Billion-scale Boolean matrices in the era of big data occupy storage measured in hundreds of petabytes to zettabytes. The fundamental data mining operation on these matrices involves multiplication, which suffers a significant slowdown because the required data cannot fit in most main memories. In this paper, we propose new algorithms that perform matrix-vector and matrix-matrix operations directly on compressed Boolean matrices, using techniques extended from our previous work on compression. Our extension is a row-by-row differential compression technique that reduces both the overall space requirement and the number of matrix operations. We provide extensive empirical results on billion-scale Boolean matrices that are adjacency matrices of web graphs. Our work has significant implications for key problems that rely on matrix multiplication, such as page ranking and itemset mining.
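
The abstract does not give the algorithm itself, so the following is only a minimal sketch of how a row-by-row differential encoding could reduce the work in a matrix-vector product, not the authors' method. It assumes each row is stored as the set of columns added and removed relative to the previous row; the names `compress_rows`, `matvec_compressed`, and `matvec_plain` are hypothetical.

```python
from typing import List, Set, Tuple

# Assumption: each row is encoded as (added_columns, removed_columns)
# relative to the previous row; the first row is relative to an empty row.
Delta = Tuple[Set[int], Set[int]]


def compress_rows(rows: List[Set[int]]) -> List[Delta]:
    """Encode each Boolean row as its difference from the previous row."""
    deltas: List[Delta] = []
    prev: Set[int] = set()
    for row in rows:
        deltas.append((row - prev, prev - row))
        prev = row
    return deltas


def matvec_compressed(deltas: List[Delta], x: List[float]) -> List[float]:
    """Compute y = A x directly on the differential encoding.

    y[i] is derived from y[i-1] by adding x[j] for newly set columns and
    subtracting x[j] for cleared columns, so the cost per row is
    proportional to the size of the difference, not the row's density.
    """
    y: List[float] = []
    running = 0.0
    for added, removed in deltas:
        running += sum(x[j] for j in added)
        running -= sum(x[j] for j in removed)
        y.append(running)
    return y


def matvec_plain(rows: List[Set[int]], x: List[float]) -> List[float]:
    """Reference product on the uncompressed rows, for comparison."""
    return [sum(x[j] for j in row) for row in rows]


if __name__ == "__main__":
    rows = [{0, 1, 2}, {0, 1, 3}, {1, 3}, set()]
    x = [1, 2, 3, 4]
    print(matvec_compressed(compress_rows(rows), x))
    print(matvec_plain(rows, x))
```

The saving depends on consecutive rows being similar, which is the property a differential scheme exploits. With floating-point vectors the running sum can accumulate rounding error across rows, so a practical version might reset it periodically from a fully stored row.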
