Journal: IEEE Transactions on Knowledge and Data Engineering

A Distance Measure Approach to Exploring the Rough Set Boundary Region for Attribute Reduction


Abstract

Feature selection (FS), or attribute reduction, is a dimensionality-reduction technique that aims to select a subset of a data set's original features that retains the most useful information. The benefits of FS include improved data visualization and transparency, reduced training and utilization times, and, potentially, improved prediction performance. Many approaches based on rough set theory have, to date, used the dependency function, which is based on lower approximations, as the evaluation step in the FS process. However, examining only the information considered certain and ignoring the boundary region, or region of uncertainty, loses much useful information. This paper examines a rough set FS technique that combines the lower approximation dependency value with a distance metric that considers both the number of objects in the boundary region and the distance of those objects from the lower approximation. Using this measure in rough set feature selection can result in smaller subsets than those obtained using the dependency function alone, which demonstrates that there is valuable information to be extracted from the boundary region. Experimental results are presented for both crisp and real-valued data and are compared with two other FS techniques in terms of subset size, runtime, and classification accuracy.
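
To make the quantities named in the abstract concrete, the following is a minimal Python sketch for crisp (discrete) data only. It computes the positive region (the union of lower approximations), the boundary region, and the standard rough set dependency degree for a candidate attribute subset. The abstract does not give the paper's distance metric, so `boundary_distance` is a hypothetical stand-in (mean Hamming distance from each boundary object to its nearest positive-region object) and should not be read as the authors' measure. All function names and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group object indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def regions(rows, decisions, attrs):
    """Return (positive_region, boundary_region) as sets of object indices."""
    positive, boundary = set(), set()
    for block in partition(rows, attrs):
        labels = {decisions[i] for i in block}
        # A block lies in a lower approximation only if all of its objects
        # share one decision value; otherwise it is in the boundary region.
        (positive if len(labels) == 1 else boundary).update(block)
    return positive, boundary


def dependency(rows, decisions, attrs):
    """Rough set dependency degree: |positive region| / |universe|."""
    positive, _ = regions(rows, decisions, attrs)
    return len(positive) / len(rows)


def boundary_distance(rows, decisions, attrs):
    """HYPOTHETICAL stand-in, not the paper's metric: mean Hamming distance
    (over attrs) from each boundary object to its nearest positive-region object."""
    positive, boundary = regions(rows, decisions, attrs)
    if not boundary:
        return 0.0
    if not positive:
        return float("inf")  # nothing certain to measure against

    def hamming(i, j):
        return sum(rows[i][a] != rows[j][a] for a in attrs)

    return sum(min(hamming(b, p) for p in positive) for b in boundary) / len(boundary)


# Toy decision table (assumed data): two conditional attributes, one decision.
rows = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0)]
decisions = ["y", "n", "y", "n", "n", "y"]

for subset in ([0], [0, 1]):
    pos, bnd = regions(rows, decisions, subset)
    print(subset, sorted(pos), sorted(bnd),
          dependency(rows, decisions, subset),
          boundary_distance(rows, decisions, subset))
```

In this toy table, attribute 0 alone leaves every object in the boundary region, while the pair of attributes moves four of the six objects into the positive region. A dependency-only search sees just that ratio; a boundary-aware evaluation would also look at how far the remaining uncertain objects sit from the certain ones.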
