...
Software Quality Journal

Application of mutual information-based sequential feature selection to ISBSG mixed data

Abstract

There is still little research in the Software Development Effort Estimation (SDEE) literature on feature selection (FS) techniques that handle both categorical and continuous features. This paper addresses the problem of selecting the most relevant features of the ISBSG (International Software Benchmarking Standards Group) dataset for use in SDEE. The aim is to show the usefulness of splitting the ranked list of features produced by a mutual information-based sequential FS approach into two lists, one for categorical and one for continuous features. These lists are then recombined according to the accuracy of a case-based reasoning (CBR) model. Four FS algorithms are compared on a complete ISBSG dataset of 621 projects and 12 features. On the one hand, two algorithms consider relevance only, while the other two follow the criterion of maximizing relevance while also minimizing redundancy between any independent feature and the features already selected. On the other hand, the algorithms that do not distinguish between continuous and categorical features use a single list, whereas those that do distinguish them use two lists that are later combined. The algorithms that use two lists perform better than those that use one list. It is therefore worthwhile to consider two separate lists of features, so that categorical features are selected more often. We also suggest using the Application Group, Project Elapsed Time, and First Data Base System features in preference to the more frequently used Development Type, Language Type, and Development Platform.
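The abstract contrasts two ranking criteria (relevance only, and maximum relevance with minimum redundancy) and a two-list variant that ranks categorical and continuous features separately before recombining them by model accuracy. The Python sketch below illustrates these ideas under stated assumptions; it is not the authors' implementation. Assumptions: mutual information uses a plug-in estimator on discrete values; continuous features (and effort) are discretized by equal-width binning through a hypothetical `discretize` helper; the redundancy term is the mean mutual information with the already selected features (the mRMR difference form, which may differ from the paper's exact criterion); and recombination is a greedy step that appends the next feature from whichever list yields the lower error under a caller-supplied `evaluate` callable, a stand-in for the CBR model. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple


def discretize(values: Sequence[float], bins: int = 5) -> List[int]:
    """Equal-width binning (assumed) so continuous features can use the discrete MI estimator."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Plug-in estimate of I(X;Y) in nats for two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (count(a) * count(b)).
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def _score(f: str, data: Dict[str, list], target: list, selected: List[str], redundancy: bool) -> float:
    relevance = mutual_information(data[f], target)
    if not redundancy or not selected:
        return relevance
    # Assumed mRMR difference form: relevance minus mean redundancy with selected features.
    mean_red = sum(mutual_information(data[f], data[s]) for s in selected) / len(selected)
    return relevance - mean_red


def rank_features(data: Dict[str, list], target: list, candidates: List[str], redundancy: bool) -> List[str]:
    """Sequential forward ranking of `candidates`; ties go to the earlier candidate."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining:
        best = max(remaining, key=lambda f: _score(f, data, target, selected, redundancy))
        selected.append(best)
        remaining.remove(best)
    return selected


def split_rankings(data, target, categorical: List[str], continuous: List[str],
                   redundancy: bool) -> Tuple[List[str], List[str]]:
    """Two-list variant: rank categorical and continuous features separately."""
    return (rank_features(data, target, categorical, redundancy),
            rank_features(data, target, continuous, redundancy))


def merge_by_accuracy(cat_list: List[str], cont_list: List[str],
                      evaluate: Callable[[List[str]], float]) -> List[str]:
    """Assumed greedy recombination: append whichever list's next feature gives lower error."""
    merged: List[str] = []
    i = j = 0
    while i < len(cat_list) or j < len(cont_list):
        options = []
        if i < len(cat_list):
            options.append((evaluate(merged + [cat_list[i]]), 0))
        if j < len(cont_list):
            options.append((evaluate(merged + [cont_list[j]]), 1))
        _, source = min(options)  # ties favour the categorical list
        if source == 0:
            merged.append(cat_list[i])
            i += 1
        else:
            merged.append(cont_list[j])
            j += 1
    return merged


if __name__ == "__main__":
    # Invented toy data: effort already binned into 4 classes.
    y = [0, 1, 2, 3, 0, 1, 2, 3]
    data = {
        "f1": [0, 0, 1, 1, 0, 0, 1, 1],
        "f1_copy": [0, 0, 1, 1, 0, 0, 1, 1],
        "f2": [0, 1, 0, 1, 0, 1, 0, 0],
    }
    print(rank_features(data, y, ["f1", "f1_copy", "f2"], redundancy=False))
    print(rank_features(data, y, ["f1", "f1_copy", "f2"], redundancy=True))
```

With this toy data, the relevance-only ranking keeps the duplicated feature `f1_copy` in second place, while the redundancy-aware ranking demotes it behind `f2`, which carries information that `f1` does not already provide. The merge step returns a full ordering; a caller would then truncate it at the subset size with the best CBR accuracy.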
