Journal: Indian Journal of Science and Technology

An Enhanced Low Frequency Discretizer (ELFD) in Data Cleansing Stage

Abstract

Objective: Organizations now routinely use data mining techniques to support knowledge discovery from their data. Discretization algorithms are the main techniques for discovering knowledge in the data cleansing stage. This study develops an enhanced discretization algorithm to investigate the impact of data cleansing on knowledge discovery.

Methodology: The ELFD algorithm is based on the Low Frequency Discretizer (LFD), which has four phases: copying the dataset, calculating the correlation ratio, identifying cut points, and discretizing the dataset. ELFD uses only a subset of the categorical attributes in order to increase the correlation ratio between a numerical attribute and each categorical attribute. We evaluate the new algorithm against LFD on health datasets, using the classification accuracy of the discretized dataset as the main evaluation criterion.

Findings: The ELFD achieves higher classification accuracy than the LFD, improving accuracy by approximately 9%. When manual recording errors are taken into account, the processing time of the ELFD is similar to that of the LFD.

Conclusion: The ELFD adds one step to the LFD: it selects the top 75% of categorical attributes, ranked by correlation ratio, and then calculates the correlation ratio between the numerical attribute and these selected attributes. Restricting the calculation to this subset increases the correlation ratio values, so the ELFD improves knowledge discovery from the personal information contained in health records during the data cleansing stage.
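The attribute-selection step described in the Conclusion can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard definition of the correlation ratio, η² = (between-group sum of squares) / (total sum of squares), and assumes the 75% cutoff is rounded up so at least one attribute is kept. The function names `correlation_ratio` and `select_top_categorical`, the rounding rule, and the toy data are hypothetical; the LFD cut-point and discretization phases are not shown.

```python
from collections import defaultdict
from math import ceil


def correlation_ratio(categories, values):
    """Correlation ratio (eta) between a categorical and a numerical attribute.

    Assumes the textbook definition:
        eta^2 = sum_c n_c * (mean_c - mean)^2 / sum_i (y_i - mean)^2
    """
    if len(categories) != len(values) or not values:
        raise ValueError("need two non-empty sequences of equal length")
    overall_mean = sum(values) / len(values)

    # Group the numerical values by category label.
    groups = defaultdict(list)
    for label, y in zip(categories, values):
        groups[label].append(y)

    between = sum(
        len(ys) * (sum(ys) / len(ys) - overall_mean) ** 2 for ys in groups.values()
    )
    total = sum((y - overall_mean) ** 2 for y in values)
    if total == 0:  # constant numerical attribute: no variance to explain
        return 0.0
    return (between / total) ** 0.5


def select_top_categorical(numeric, categorical, fraction=0.75):
    """Rank categorical attributes by eta with `numeric` and keep the top `fraction`.

    `categorical` maps attribute name -> column of labels. Rounding the kept
    count up (ceil) is an assumption; the source only says "top 75%".
    """
    scores = {name: correlation_ratio(col, numeric) for name, col in categorical.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = max(1, ceil(fraction * len(ranked)))
    return [(name, scores[name]) for name in ranked[:keep]]


if __name__ == "__main__":
    # Illustrative toy values only, not the paper's health datasets.
    age = [25, 30, 35, 60, 65, 70]
    categorical = {
        "smoker": ["n", "n", "n", "y", "y", "y"],
        "gender": ["m", "f", "m", "f", "m", "f"],
        "region": ["a", "a", "b", "b", "c", "c"],
        "blood": ["o", "o", "a", "a", "o", "a"],
    }
    for name, score in select_top_categorical(age, categorical):
        print(name, round(score, 3))
```

With this toy data, `smoker`, `region` and `blood` are kept and `gender`, which has the weakest association with `age`, is dropped. Computing η on the retained subset is what the abstract credits for the higher correlation ratio values.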