Journal: International Journal of Engineering Trends and Technology

Uncertain Data Classification Using Decision Tree Classification Tool With Probability Density Function Modeling Technique



Abstract

Classical decision tree classifiers are constructed from certain (point) data only. In many real-life applications, however, data is inherently uncertain: attribute or value uncertainty arises during the data collection process. Attributes in training data sets are of two types: numerical (continuous) and categorical (discrete). Data uncertainty exists in both. For a numerical attribute it means a range of values; for a categorical attribute it means a set of values. In this paper we propose a method for handling data uncertainty in numerical attributes. One of the simplest ways to handle such uncertainty is to replace the set of originally collected values behind each attribute value with its mean or another representative value. Under data uncertainty, however, the value of an attribute is more naturally represented by a set of values, and decision tree classification accuracy is much improved when attribute values are represented by sets of values rather than by a single representative value. A probability density function with equal probabilities is one effective data uncertainty modelling technique for representing each value of an attribute as a set of values. The main assumption is that the values provided in the training data sets are averages, or representative values, of the values originally collected during data collection. For each representative value of each numerical attribute in the training data set, approximations of the originally collected values are generated using a probability density function with equal probabilities, and these newly generated sets of values are used to construct a new decision tree classifier.
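The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how wide the range around each representative value is, how many values are generated, or how the tree chooses its splits. The names `expand_value` and `best_split`, the `half_width` and `n_points` parameters, and the probability-weighted entropy criterion are all assumptions made for illustration. Each representative value is expanded into equally spaced points that share the same probability, and a single split threshold is then chosen over the expanded points.

```python
import math
from collections import defaultdict

def expand_value(value, half_width, n_points):
    """Model one representative value as n_points equally likely values
    spread evenly over [value - half_width, value + half_width].
    half_width and n_points are illustrative assumptions."""
    if n_points == 1:
        return [(value, 1.0)]
    step = 2 * half_width / (n_points - 1)
    return [(value - half_width + i * step, 1.0 / n_points)
            for i in range(n_points)]

def entropy(class_weights):
    """Shannon entropy of a {label: probability mass} mapping."""
    total = sum(class_weights.values())
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total)
                for w in class_weights.values() if w > 0)

def best_split(samples, labels):
    """Pick the threshold on one numerical attribute that minimises the
    probability-weighted entropy of the two sides (assumed criterion).
    samples[i] is the list of (x, probability) pairs for tuple i."""
    points = [(x, p, label)
              for pts, label in zip(samples, labels)
              for x, p in pts]
    total = sum(p for _, p, _ in points)
    candidates = sorted({x for x, _, _ in points})
    best_threshold, best_score = None, float("inf")
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left, right = defaultdict(float), defaultdict(float)
        for x, p, label in points:
            (left if x <= threshold else right)[label] += p
        score = (sum(left.values()) * entropy(left)
                 + sum(right.values()) * entropy(right)) / total
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Toy training data: one numerical attribute, two classes.
values = [1.0, 1.2, 3.0, 3.4]
labels = ["A", "A", "B", "B"]
samples = [expand_value(v, half_width=0.2, n_points=5) for v in values]
threshold, score = best_split(samples, labels)
print(threshold, score)
```

In this toy data the two classes remain separable after expansion, so the chosen threshold falls in the gap between them. A full classifier would apply the same split search recursively over every numerical attribute.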
