Conference: European Conference on Principles of Data Mining and Knowledge Discovery

Rough Dependencies as a Particular Case of Correlation: Application to the Calculation of Approximative Reducts



Abstract

Rough Sets Theory provides a sound basis for the extraction of qualitative knowledge (dependencies) from very large relational databases. Dependencies may be expressed by means of formulas (implications) of the form (x_1, …, x_n) =>_rho (y), where x_1, …, x_n are attributes that induce partitions into equivalence classes on the underlying population. The coefficient rho is the dependency degree: it gives the percentage of objects that can be correctly assigned to classes of y, taking into account the classification induced by (x_1, …, x_n). When dealing with decision tables, it is important to determine rho and to eliminate redundant attributes from (x_1, …, x_n), so as to obtain minimal reducts having the same classification power as the original set. The problem of reduct extraction is NP-hard, so approximative reducts are often determined instead. Such reducts have the same classification power as the original set of attributes but quite often contain redundant attributes. The main idea developed in this paper is that attributes considered as random variables and related by means of a dependency are also correlated (the converse, in general, is not true). Starting from this fact, we try to find, using well-established and widely used statistical methods, only the most significant variables, that is to say, the variables that contribute the most (in a quantitative sense) to determining y. The set of attributes obtained by means of these well-founded statistical methods (in general a subset of (x_1, …, x_n)) can be considered a good approximation of a reduct.
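
To make the dependency degree rho concrete, the following is a minimal Python sketch under stated assumptions. It uses the standard rough-set definition (the share of objects lying in equivalence classes that map to a single decision value); the abstract itself gives no algorithm. The function name `dependency_degree` and the toy decision table are hypothetical and not taken from the paper, and the sketch does not implement the paper's correlation-based selection.

```python
from collections import defaultdict


def dependency_degree(rows, condition_attrs, decision_attr):
    """Return rho: the fraction of objects whose equivalence class
    (under condition_attrs) determines decision_attr unambiguously."""
    if not rows:
        return 0.0
    # Partition objects into equivalence classes by their condition values.
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in condition_attrs)
        classes[key].append(row[decision_attr])
    # A class belongs to the positive region if all its objects share one decision.
    positive = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return positive / len(rows)


# Hypothetical toy decision table (illustration only, not data from the paper).
table = [
    {"x1": 0, "x2": 0, "x3": 1, "y": "no"},
    {"x1": 0, "x2": 1, "x3": 1, "y": "yes"},
    {"x1": 1, "x2": 0, "x3": 0, "y": "yes"},
    {"x1": 1, "x2": 1, "x3": 0, "y": "yes"},
    {"x1": 0, "x2": 0, "x3": 1, "y": "yes"},
]

rho_full = dependency_degree(table, ["x1", "x2", "x3"], "y")
rho_without_x3 = dependency_degree(table, ["x1", "x2"], "y")
rho_x1_only = dependency_degree(table, ["x1"], "y")
print(rho_full, rho_without_x3, rho_x1_only)
```

In this toy table, dropping x3 leaves rho unchanged, so x3 is redundant in the sense described above, whereas dropping x2 as well lowers rho. An exact search for minimal reducts would have to examine subsets of attributes in this way, which is what motivates approximative approaches.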
