International Conference on Computer and Communication Engineering

Modifying Cleaning Method in Big Data Analytics Process using Random Forest Classifier

Abstract

Accurate data is a key success factor for data analytics, especially for detection and prediction. Big Data analytics (BDA) is now used to analyze the sheer volume of data available in an organization. The quality of this data must be maintained in order to obtain correct alerts and valuable insights from rapidly changing data characterized by high volume, velocity, variety, veracity, and value. This paper aims to modify the existing Big Data analytics framework by improving an important pre-processing step, data cleaning. First, feature selection based on Random Forest is used to extract effective features. Then, two classification algorithms (a Random Forest classifier and a linear SVM classifier) are trained on the dataset to classify data quality and to develop an intelligent model. In evaluation, our experimental results show a consistent accuracy of around 90% for Random Forest and Linear Regression. Using this approach, we expect to provide a set of cleaned data for further processing. In addition, analysts can use this system during the cleaning stage of the analytics process to conclude that the data has been cleaned. Finally, the developed system is compared with available functions that are used to handle missing values.
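
The two-stage approach described in the abstract (Random Forest based feature selection, followed by training a Random Forest classifier and a linear SVM classifier) might be sketched roughly as follows. This is an illustration only, not the authors' implementation: the synthetic dataset, the "clean"/"dirty" label meaning, the function name `run_pipeline`, and every parameter value are assumptions, and scikit-learn is assumed as the library.

```python
# A minimal sketch, not the authors' implementation. The synthetic data,
# label meaning, and all parameter values are assumptions for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC


def run_pipeline(seed=0):
    """Select features with a Random Forest, then train two classifiers."""
    # Stand-in data: label 1 = "clean" record, 0 = "dirty" record (hypothetical).
    X, y = make_classification(
        n_samples=1000, n_features=20, n_informative=5,
        n_redundant=2, random_state=seed,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )

    # Stage 1: keep features whose Random Forest importance is at least
    # the mean importance (the default threshold for tree ensembles).
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=100, random_state=seed)
    )
    selector.fit(X_train, y_train)
    X_train_sel = selector.transform(X_train)
    X_test_sel = selector.transform(X_test)

    # Stage 2: train both classifiers on the selected features and
    # measure accuracy on the held-out split.
    models = {
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "linear_svm": LinearSVC(max_iter=10000, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train_sel, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test_sel))
    return X_train_sel.shape[1], scores


if __name__ == "__main__":
    n_selected, scores = run_pipeline()
    print("features kept:", n_selected)
    for name, acc in scores.items():
        print(name, "accuracy:", round(acc, 3))
```

The accuracies printed by this sketch come from synthetic data and say nothing about the figures reported in the paper; on real records, the label would have to come from whatever data-quality criterion the analyst defines.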
