HDSVM: A High Efficiency Distributed SVM Framework over Data Stream

Abstract

The application of Support Vector Machines (SVMs) to data streams is growing with the increasing real-time processing requirements of classification tasks such as anomaly detection and real-time image processing. However, the high volume and fast arrival rate of dynamic live data in data streams make it challenging to apply SVMs in data stream processing. Existing SVM implementations are mostly designed for batch processing and, because of their inherent complexity, can hardly satisfy the efficiency requirements of stream processing. To address these challenges, we propose a high-efficiency distributed SVM framework over data streams (HDSVM), which consists of two main algorithms: an incremental learning algorithm and a distributed algorithm. First, we propose a partial support vectors reserving incremental learning algorithm (PSVIL). By updating the SVM with a subset of support vectors selected by their distances to the classification hyperplane, rather than with the full set, the algorithm achieves lower time overhead while preserving accuracy. Second, we propose a distribution remaining partition and fast aggregation distributed algorithm (DRPFA) for SVM. Real-time data is partitioned by clustering according to its original distribution rather than at random, and historical support vectors are partitioned based on their distances to the classification hyperplane. Because of this partition strategy, the global hyperplane can be obtained by averaging the parameters of the local hyperplanes. Extensive experiments on Apache Storm show that the proposed HDSVM achieves lower time overhead and similar accuracy compared with the state of the art. The speed-up ratio increases by 2-8 times within a 1% accuracy deviation.
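The two ideas named in the abstract, selecting support vectors by distance to the hyperplane and averaging local hyperplane parameters, can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a linear hyperplane given as a weight vector `w` and bias `b`, and the function names, the `keep_ratio` parameter, and the choice to keep the vectors closest to the hyperplane are all hypothetical, since the abstract does not specify the selection rule or its threshold.

```python
import math


def distance_to_hyperplane(w, b, x):
    """Geometric distance |w.x + b| / ||w|| from point x to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def select_support_vectors(w, b, support_vectors, keep_ratio=0.5):
    """Keep only the fraction of support vectors nearest the hyperplane.

    Assumption: keeping the nearest vectors and the keep_ratio value are
    illustrative choices, not taken from the paper.
    """
    ranked = sorted(support_vectors,
                    key=lambda x: distance_to_hyperplane(w, b, x))
    keep = max(1, math.ceil(keep_ratio * len(ranked)))
    return ranked[:keep]


def average_hyperplanes(local_models):
    """Average (w, b) pairs from local models into one global (w, b)."""
    n = len(local_models)
    dim = len(local_models[0][0])
    w = [sum(model[0][i] for model in local_models) / n for i in range(dim)]
    b = sum(model[1] for model in local_models) / n
    return w, b


# Toy usage: the hyperplane x1 = 1 and four historical support vectors.
w, b = [1.0, 0.0], -1.0
svs = [[1.5, 0.0], [3.0, 2.0], [0.8, -1.0], [-2.0, 0.5]]
kept = select_support_vectors(w, b, svs, keep_ratio=0.5)

# Two local models, as if trained on two partitions of the stream.
global_w, global_b = average_hyperplanes([([1.0, 2.0], 0.5),
                                          ([3.0, 0.0], -0.5)])
```

In this toy run, `kept` holds the two vectors nearest the hyperplane, and the global model is the element-wise mean of the two local ones. Plain averaging is only reasonable when the local models are trained on partitions that resemble the overall data, which is the role the abstract assigns to the clustering-based partition.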

Bibliographic record

Source: 15th IEEE International Symposium on Parallel and Distributed Processing with Applications and 16th IEEE International Conference on Ubiquitous Computing and Communications, 2017, pp. 352-359 (8 pages)
Conference location: Guangzhou (CN)
Authors: Yan Hou; Yijie Wang; Xingkong Ma; Li Cheng
Original format: PDF
Language: English
Keywords: Support vector machines; Training; Distributed databases; Distributed algorithms; Real-time systems; Partitioning algorithms; Streaming media
