Conference: International Joint Conference on Neural Networks

Distributed Neural Networks for Missing Big Data Imputation

Abstract

In this paper, we investigate the use of Distributed Neural Networks for imputing missing values in a Big Data context. The presented imputation framework is implemented in Spark, so imputation can be added as one more step in the data pre-processing pipeline. The Distributed Neural Networks model uses Mini-batch Stochastic Gradient Descent, which scales well with cluster size and minimizes communication among the workers. The model is tested on a real-world Recommender Systems dataset, where missing data is generally a problem for new items because the system's ranking is usually biased towards popular items. The model is compared with univariate (Mean and Median Imputation) and multivariate (K-Nearest Neighbours and Linear Regression) imputation techniques, and its performance is validated in terms of prediction accuracy and speed. Furthermore, we evaluate the speedup over a sequential implementation of Neural Networks with Stochastic Gradient Descent.
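To make the ideas in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It shows the two univariate baselines (mean and median imputation) and a single-process simulation of data-parallel mini-batch SGD. Several things here are assumptions: the abstract does not say how worker updates are combined, so the sketch assumes synchronous gradient averaging; a one-feature linear model stands in for the neural network to keep the example short; plain Python lists stand in for Spark partitions; and all function names and the toy data are hypothetical.

```python
import random
import statistics


def mean_impute(column):
    """Univariate baseline: fill None with the mean of the observed values."""
    fill = statistics.mean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def median_impute(column):
    """Univariate baseline: fill None with the median of the observed values."""
    fill = statistics.median(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def worker_gradient(partition, w, b, batch_size, rng):
    """Simulated worker: squared-error gradient on a mini-batch of its own partition."""
    batch = rng.sample(partition, min(batch_size, len(partition)))
    grad_w = grad_b = 0.0
    for x, y in batch:
        err = (w * x + b) - y
        grad_w += 2.0 * err * x
        grad_b += 2.0 * err
    return grad_w / len(batch), grad_b / len(batch)


def distributed_sgd(partitions, steps=300, lr=0.3, batch_size=4, seed=0):
    """Assumed scheme: each step, average the workers' gradients, then update once."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        grads = [worker_gradient(p, w, b, batch_size, rng) for p in partitions]
        w -= lr * sum(g[0] for g in grads) / len(grads)
        b -= lr * sum(g[1] for g in grads) / len(grads)
    return w, b


# Hypothetical toy data: y = 3x + 1, with every fifth y treated as missing.
xs = [-1 + 2 * i / 39 for i in range(40)]
rows = [(x, 3 * x + 1) for x in xs]
observed = [row for i, row in enumerate(rows) if i % 5 != 0]
missing_x = [x for i, (x, _) in enumerate(rows) if i % 5 == 0]

# Split the observed rows across four simulated workers.
partitions = [observed[k::4] for k in range(4)]
w, b = distributed_sgd(partitions)

# Model-based imputation: predict each missing y from its observed x.
imputed = [w * x + b for x in missing_x]
```

In this sketch only two numbers per worker (the gradient components) cross the worker boundary at each step, which is one simple way to keep communication small. Whether the paper uses this scheme or a different one, such as parameter averaging, is not stated in the abstract.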