Conference: 2007 IEEE International Conference on Information Reuse and Integration
An Empirical Study of the Classification Performance of Learners on Imbalanced and Noisy Software Quality Data

Abstract

In the domain of software quality classification, data mining techniques are used to construct models (learners) for identifying software modules that are most likely to be fault-prone. The performance of these models, however, can be negatively affected by class imbalance and noise. Data sampling techniques have been proposed to alleviate the problem of class imbalance, but the impact of data quality on these techniques has not been adequately addressed. We examine the combined effects of noise and imbalance on classification performance when seven commonly used sampling techniques are applied to software quality measurement data. Our results show that some sampling techniques are more robust in the presence of noise than others. Further, the effect of noise on a sampling technique varies with the level of class imbalance.
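
The abstract does not name the seven sampling techniques or describe how noise was introduced, so the following is only an illustrative sketch of the general setup, not the authors' procedure. It assumes random oversampling as the sampling technique and uniform label flipping as the noise model. The data, the 90/10 class split, the 10% noise rate, and the function names `inject_class_noise` and `random_oversample` are all hypothetical.

```python
import random
from collections import Counter

def inject_class_noise(labels, rate, rng):
    """Flip a fraction `rate` of binary labels (0 <-> 1), chosen uniformly at random.
    Assumed noise model; the abstract does not describe the one actually used."""
    noisy = list(labels)
    n_flip = int(round(rate * len(noisy)))
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = 1 - noisy[i]
    return noisy

def random_oversample(rows, labels, rng):
    """Duplicate randomly chosen minority-class rows until both classes are equal in size.
    One possible sampling technique, chosen here for illustration only."""
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    extra = [rng.choice(minority_idx) for _ in range(deficit)]
    new_rows = list(rows) + [rows[i] for i in extra]
    new_labels = list(labels) + [labels[i] for i in extra]
    return new_rows, new_labels

rng = random.Random(0)

# Hypothetical data: 90 not-fault-prone modules (0) and 10 fault-prone modules (1),
# each described by two made-up software metrics.
labels = [0] * 90 + [1] * 10
rows = [[rng.random(), rng.random()] for _ in labels]

noisy_labels = inject_class_noise(labels, 0.10, rng)
balanced_rows, balanced_labels = random_oversample(rows, noisy_labels, rng)
balanced_counts = Counter(balanced_labels)
print("before sampling:", Counter(noisy_labels))
print("after sampling: ", balanced_counts)
```

In this sketch, any majority-class module that was mislabeled as fault-prone lands in the minority pool and may be duplicated by the oversampler. That is one plausible way noise and sampling could interact, offered here as an assumption rather than a finding of the paper.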
