
Mining Extremely Small Data Sets With Application To Software Reuse


Abstract

A serious problem that machine learning and data mining techniques encounter in software engineering is the lack of sufficient data. For example, the largest data set currently available on software reuse contains only 24 examples. In this paper, a recently proposed machine learning algorithm is modified for mining extremely small data sets. The algorithm works in a twice-learning style: first, a random forest is trained on the original data set; then, virtual examples are generated from the random forest and used to train a single decision tree. Previous research reported numerous discrepancies between the empirical data and expert opinions; in contrast, our mining practice shows that the empirical data are actually consistent with expert opinions.
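
The twice-learning procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation. The abstract does not say how virtual examples are generated or which tree learner is used, so several parts are assumptions: the simplified CART-style learner, the uniform sampling of each feature between its observed minimum and maximum, the toy 24-example data set, and all function names and parameters (`build_tree`, `train_forest`, `twice_learning`, `n_trees`, `n_virtual`).

```python
import math
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, rng, max_features=None, max_depth=5, depth=0):
    """Grow a simplified CART-style tree (assumed learner, not the paper's).
    Internal nodes are (feature, threshold, left, right) tuples; leaves are
    class labels, so labels themselves must not be tuples."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or depth >= max_depth:
        return majority
    n_feat = len(X[0])
    feats = range(n_feat) if max_features is None else rng.sample(range(n_feat), max_features)
    best = None
    for f in feats:
        # Candidate thresholds: every observed value except the largest.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no usable split among the candidate features
        return majority
    _, f, t, left, right = best
    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          rng, max_features, max_depth, depth + 1)
    return (f, t, grow(left), grow(right))

def predict_tree(node, x):
    """Follow splits until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def train_forest(X, y, n_trees, rng):
    """First learning step: bagged trees with a random feature subset per split."""
    k = max(1, int(math.sqrt(len(X[0]))))
    forest = []
    for _ in range(n_trees):
        idx = rng.choices(range(len(X)), k=len(X))  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng, max_features=k))
    return forest

def predict_forest(forest, x):
    """Majority vote over the trees."""
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]

def generate_virtual(X, n_virtual, rng):
    """Assumed scheme: sample each feature uniformly within its observed range."""
    bounds = [(min(col), max(col)) for col in zip(*X)]
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_virtual)]

def twice_learning(X, y, n_trees=25, n_virtual=500, seed=0):
    """Forest -> virtual examples labelled by the forest -> single tree."""
    rng = random.Random(seed)
    forest = train_forest(X, y, n_trees, rng)
    virtual_X = generate_virtual(X, n_virtual, rng)
    virtual_y = [predict_forest(forest, x) for x in virtual_X]
    tree = build_tree(virtual_X, virtual_y, rng)  # second learning step
    return forest, tree, virtual_X

# Toy data (hypothetical): 24 examples, class decided by the first feature.
X = [[i / 24, (i * 7 % 24) / 24] for i in range(24)]
y = [int(row[0] > 0.5) for row in X]
forest, student, virtual_X = twice_learning(X, y)
print(predict_tree(student, [0.05, 0.5]), predict_tree(student, [0.95, 0.5]))
```

The point of the second step is that the final model is a single tree, which can be read and compared against expert opinion, while its training labels come from the more stable forest rather than from the 24 original examples alone.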
