
A parameter-level parallel optimization algorithm for large-scale spatio-temporal data mining



Abstract

The goal of spatio-temporal data mining is to discover previously unknown but useful patterns from spatial and temporal data. However, the explosive growth of spatio-temporal data emphasizes the need for novel, computationally efficient methods for large-scale data mining applications. Since many spatio-temporal data mining problems can be cast as optimization problems, in this paper we propose an efficient parameter-level parallel optimization algorithm for large-scale spatio-temporal data mining. In detail, most previous optimization methods are based on gradient descent, which iteratively updates the model and provides model-level convergence control for all parameters. That is, they treat all parameters equally and keep updating all of them until every parameter has converged. However, we find that during the iterative process, the convergence rates of the model parameters differ from one another. This can cause redundant computation and reduce performance. To solve this problem, we propose parameter-level stochastic gradient descent (plpSGD), in which the convergence of each parameter is considered independently and only unconverged parameters are updated in each iteration. Moreover, the updating of model parameters is parallelized in plpSGD to further improve the performance of SGD. We have conducted extensive experiments to evaluate the performance of plpSGD. The experimental results show that, compared to previous SGD methods, plpSGD can significantly accelerate convergence and achieve excellent scalability with little sacrifice of solution accuracy.
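The abstract does not give plpSGD's exact update rule, so the following is only a minimal NumPy sketch of the core idea it describes: keep a per-parameter convergence flag and, on each stochastic step, update only the parameters that have not yet converged. The function name, the `tol` and `patience` criteria, and the least-squares objective are all illustrative assumptions, not the authors' implementation (and the sketch is sequential, whereas plpSGD additionally parallelizes the updates).

```python
import numpy as np

def plp_sgd_sketch(X, y, lr=0.05, tol=1e-4, patience=5, max_iter=5000, seed=0):
    """Sketch of parameter-level SGD for least squares (assumed objective).

    Each coordinate is frozen once its update stays below `tol` for
    `patience` consecutive iterations; subsequent iterations only touch
    the still-active (unconverged) parameters.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    active = np.ones(d, dtype=bool)   # per-parameter convergence flags
    small = np.zeros(d, dtype=int)    # consecutive small-update counts
    for _ in range(max_iter):
        if not active.any():
            break                     # every parameter has converged
        i = rng.integers(n)           # one random sample (stochastic step)
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5 * (x.w - y)^2
        step = lr * grad
        w[active] -= step[active]     # update only unconverged parameters
        # track coordinates whose updates have become negligible
        small = np.where(np.abs(step) < tol, small + 1, 0)
        active &= small < patience    # freeze after `patience` small steps
    return w
```

The `patience` counter is a common practical guard so that a single accidentally tiny stochastic gradient does not freeze a parameter prematurely; how plpSGD actually detects per-parameter convergence is not specified in this abstract.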

Bibliographic Information

  • Source
    Distributed and Parallel Databases | 2020, Issue 3 | pp. 739-765 | 27 pages
  • Author Affiliations

    Huazhong Univ Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Serv Comp Technol & Syst Lab, Sch Comp Sci & Technol, Wuhan 430074, Peoples R China;

    Univ Warwick, Dept Comp Sci, Coventry, W Midlands, England;

  • Format: PDF
  • Language: English
  • Keywords

    Spatio-temporal data mining; Stochastic gradient descent; Block; Convergence rate; Redundant update;

