An Integrated Data Preprocessing Framework Based on Apache Spark for Fault Diagnosis of Power Grid Equipment

Shi Weiwei; Zhu Yongxin; Huang Tian; Sheng Gehao; Lian Yong; Wang Guoxing; Chen Yufeng


Abstract

Big data techniques have been applied to the power grid to predict and evaluate grid conditions. However, raw data quality rarely meets the requirements of precise data analytics, because raw data sets usually contain samples with missing values, to which common data mining models are sensitive. Moreover, the raw training data from a single monitoring system, e.g. dissolved gas analysis (DGA), rarely provide enough valid instances for training, because raw data sets usually contain noisy samples. Although classic methods such as neural networks can be used to fill in missing data and classify the fault type, their models often fail to fit the rules of power grid conditions. This paper presents an integrated data preprocessing framework (DPF) based on Apache Spark that improves prediction accuracy for data sets with missing data points and classification accuracy for noisy data, while meeting big data requirements. The framework combines missing data prediction, data fusion, data cleansing and fault type classification. First, a prediction model is trained using linear regression (LinR). We then propose an optimized linear method (OLR) to improve the prediction accuracy. Next, to better exploit the strong correlation among different data sources, new data features extracted using the Pearson correlation coefficient (PCC) are fused into the training data set. Principal component analysis (PCA) is then applied to reduce the side effects introduced by the new features while retaining the information that is significant for classification. Finally, a classification model based on logistic regression (LogR) and support vector machine (SVM) is trained to classify the fault type of electric equipment. We test the DPF on missing data prediction and fault type classification for power transformers in a power grid system. The experimental results show that predictors based on the proposed framework achieve lower mean square error, and classifiers achieve higher accuracy, than traditional ones. In addition, the training time required for large-scale data shows a decreasing trend. The DPF is therefore a good candidate for predicting missing data and classifying fault types in power grid systems.
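
As a rough illustration of two steps named in the abstract, the sketch below fills a missing reading with a linear regression fitted on the known samples and then measures the Pearson correlation between two series. It is a minimal plain-Python sketch, not the paper's Spark implementation and not its OLR method: the single-predictor least-squares fit, the function names (`pearson`, `fit_line`, `fill_missing`) and the toy readings are all assumptions made for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def fill_missing(target, predictor):
    """Replace None entries in target with predictions from predictor.

    The line is fitted only on rows where the target value is known.
    """
    known = [(p, t) for p, t in zip(predictor, target) if t is not None]
    slope, intercept = fit_line([p for p, _ in known], [t for _, t in known])
    return [slope * p + intercept if t is None else t
            for p, t in zip(predictor, target)]


# Hypothetical readings (illustration only, not data from the paper):
# the known target values follow 2 * predictor + 1, and one is missing.
predictor = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [3.0, 5.0, None, 9.0, 11.0]

filled = fill_missing(target, predictor)
print(filled)
print(pearson(predictor, filled))
```

In a fusion step of the kind the abstract describes, a correlation score like this could be used to decide which series from another data source are worth adding as features; the selection rule itself is not given in the abstract, so it is left out here.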


Bibliographic record

Source
Journal of VLSI signal processing systems for signal, image, and video technology, 2017, No. 3, pp. 221-236 (16 pages)
Authors
Shi Weiwei; Zhu Yongxin; Huang Tian; Sheng Gehao; Lian Yong; Wang Guoxing; Chen Yufeng
Author affiliations

Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai, Peoples R China;

Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai, Peoples R China;

Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai, Peoples R China;

Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai, Peoples R China;

Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai, Peoples R China;

Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai, Peoples R China;

Elect Power Res Inst Shandong Power Supply Co Sta, Qingdao, Shandong, Peoples R China;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords
Big data; Apache spark; Framework; Missing data prediction; Fault diagnose;


