Distributed hierarchical co-clustering and collaborative filtering algorithm

Abstract

Petascale analytics is an active research area in both academia and industry. It envisages processing massive amounts of data at extremely high rates to generate new scientific insights and to benefit both users and providers in industries such as e-commerce, telecom, finance, and life sciences. We consider collaborative filtering (CF) and clustering algorithms, which are key analytics kernels for achieving these aims. Real-time CF and co-clustering on highly sparse, massive datasets with high prediction accuracy is a computationally challenging problem. In this paper, we present a novel hierarchical design for a soft real-time (less than 1 minute) distributed co-clustering-based collaborative filtering algorithm. Our distributed algorithm has been optimized for multi-core cluster architectures. Theoretical analysis of the time complexity of our algorithm proves the efficacy of our approach. Using the Netflix dataset (900M training ratings with replication) and the Yahoo KDD Cup 1 dataset (4.6B training ratings with replication), we demonstrate the performance and scalability of our algorithm on a 4096-node multi-core cluster architecture. Our distributed algorithm (implemented using OpenMP with MPI) achieves around 4x better performance (on Blue Gene/P) than the best prior work, along with high accuracy (RMSE of 26 ± 4 for the Yahoo KDD Cup data and 0.87 ± 0.02 for the Netflix data). To the best of our knowledge, these are the best known performance results for collaborative filtering, at high prediction accuracy, on multi-core cluster architectures.


Bibliographic details

Source: 2012 19th International Conference on High Performance Computing | 2012 | pp. 1-10 | 10 pages
Conference location: Pune (IN)
Authors: Narang Ankur; Srivastava Abhinav; Kumar Katta Naga Praveen
Author affiliation: IBM India Research Laboratory, New Delhi, India
Original format: PDF
Language of text: English
Chinese Library Classification: Computing technology, computer technology

Similar literature

1. A Greedy Algorithm for k-Member Co-clustering and its Applicability to Collaborative Filtering [J]. Arina Kawano, Katsuhiro Honda, Hirohide Kasugai. Procedia Computer Science. 2013, No. 1

2. Integrating content-based filtering with collaborative filtering using co-clustering with augmented matrices [J]. Meng-Lun Wu, Chia-Hui Chang, Rui-Zhe Liu. Expert Systems with Applications. 2014, No. 6

3. Partially Exclusive Item Partition in MMMs-Induced Fuzzy Co-Clustering and its Effects in Collaborative Filtering [J]. Katsuhiro Honda, Takaya Nakano, Chi-Hyon Oh. Journal of Advanced Computational Intelligence and Intelligent Informatics. 2015, No. 6a113

4. Distributed hierarchical co-clustering and collaborative filtering algorithm [C]. Narang Ankur, Srivastava Abhinav, Kumar Katta Naga Praveen. International Conference on High Performance Computing; Workshop on Performance Engineering and Applications; Workshop on Parallel Algorithms and Software for Analysis of Massive Graphs; International Workshop on Cloud Computing Applications; Workshop on Massive Data Analytics on Scalable Systems. 2012

5. A Comparative Study of Collaborative Filtering Recommendation Systems Using Algorithms to Impute Large Sparse Matrices [D]. Lindo, Steven Christopher. 2016

6. A Two-Phase Distributed Filtering Algorithm for Networked Uncertain Systems with Fading Measurements under Deception Attacks [O]. Raquel Caballero-Águila, Aurora Hermoso-Carazo, Josefa Linares-Pérez. 2020

7. Distributed Hierarchical Co-clustering and Collaborative Filtering Algorithm [O]. Ankur Narang, Abhinav Srivastava, Naga Praveen. 2014
