Published in: Journal of VLSI signal processing systems for signal, image, and video technology

An Integrated Data Preprocessing Framework Based on Apache Spark for Fault Diagnosis of Power Grid Equipment

Abstract

Big data techniques have been applied to the power grid to predict and evaluate grid conditions. However, raw data quality rarely meets the requirements of precise data analytics, because raw data sets usually contain samples with missing values, to which common data mining models are sensitive. In addition, the raw training data from a single monitoring system, e.g. dissolved gas analysis (DGA), rarely provide enough valid instances for training, because raw data sets usually also contain noisy samples. Although classic methods such as neural networks can be used to fill in missing data and classify the fault type, their models often fail to fit the rules of power grid conditions. This paper presents an integrated data preprocessing framework (DPF) based on Apache Spark that improves prediction accuracy for data sets with missing data points and classification accuracy for noisy data, while meeting big data requirements. The framework combines missing data prediction, data fusion, data cleansing and fault type classification. First, a prediction model is trained using linear regression (LinR). We then propose an optimized linear method (OLR) to improve the prediction accuracy. Next, to better exploit the strong correlation among different data sources, new data features extracted using the Pearson correlation coefficient (PCC) are fused into the training data set. Principal component analysis (PCA) is then applied to reduce the side effects introduced by the new features while retaining the information that is significant for classification. Finally, classification models based on logistic regression (LogR) and support vector machine (SVM) are trained to classify the fault type of electric equipment. We test the DPF on missing data prediction and fault type classification for power transformers in a power grid system. The experimental results show that predictors based on the proposed framework achieve lower mean square error, and classifiers obtain higher accuracy, than traditional ones. In addition, the training time required for large-scale data shows a decreasing trend. The DPF is therefore a good candidate for predicting missing data and classifying fault types in power grid systems.
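
To make two of the steps named in the abstract concrete, the sketch below combines a Pearson correlation coefficient (PCC) check with a one-variable linear regression to fill a missing reading. It is a minimal single-machine illustration in plain Python, not the paper's OLR method or its Spark implementation. The helper names (`pearson`, `fit_line`, `impute`), the column names (`h2`, `ch4`, `oil_temp`) and the sample values are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def impute(target, candidates):
    """Fill None entries in `target` from the most correlated candidate column.

    `candidates` maps a column name to a fully observed list of values.
    Returns the chosen column name and the filled copy of `target`.
    """
    observed = [i for i, v in enumerate(target) if v is not None]
    t_obs = [target[i] for i in observed]

    def strength(name):
        column = candidates[name]
        return abs(pearson([column[i] for i in observed], t_obs))

    best_name = max(candidates, key=strength)
    best = candidates[best_name]
    slope, intercept = fit_line([best[i] for i in observed], t_obs)
    filled = [
        v if v is not None else slope * best[i] + intercept
        for i, v in enumerate(target)
    ]
    return best_name, filled


# Hypothetical readings: one missing hydrogen value, two fully observed columns.
h2 = [10.0, 20.0, None, 40.0, 50.0]
sources = {
    "ch4": [1.0, 2.0, 3.0, 4.0, 5.0],
    "oil_temp": [60.0, 58.0, 61.0, 59.0, 62.0],
}
best_name, filled = impute(h2, sources)
print(best_name, filled)
```

In this toy data the `ch4` column is perfectly linear in the observed `h2` values, so it is selected as the predictor and the gap is filled from the fitted line. A real pipeline would use several predictors, and the abstract's later steps (PCA and LogR/SVM classification) are not covered here.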
