Conference: IEEE/ACM International Conference on Software Engineering
Operation is the hardest teacher: estimating DNN accuracy looking for mispredictions

Abstract

Deep Neural Networks (DNNs) are typically tested for accuracy using a set of unlabelled real-world data (the operational dataset), from which a subset is selected, manually labelled, and used as a test suite. This subset must be small (because manual labelling is costly) yet faithfully represent the operational context, so that the resulting test suite contains roughly the same proportion of examples causing mispredictions (i.e., failing test cases) as the operational dataset. However, while testing to estimate accuracy, it is also desirable to learn as much as possible from the failing tests in the operational dataset, since they point to possible bugs in the DNN. A smart sampling strategy may make it possible to deliberately include many misprediction-causing examples in the test suite, providing more valuable inputs for DNN improvement while preserving the ability to obtain trustworthy, unbiased estimates. This paper presents a test selection technique (DeepEST) that actively looks for failing test cases in the operational dataset of a DNN, with the goal of assessing the DNN's expected accuracy with a small and “informative” test suite (namely, one with a high number of mispredictions) for subsequent DNN improvement. Experiments with five subjects, combining four DNN models and three datasets, are described. The results show that DeepEST provides DNN accuracy estimates with precision close to (and often better than) that of existing sampling-based DNN testing techniques, while detecting 5 to 30 times more mispredictions with the same test suite size.
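
The abstract does not say which sampling scheme or estimator DeepEST uses, so the following is not DeepEST. It is only a minimal Python sketch of the general idea that a test suite can deliberately over-sample likely mispredictions and still yield an unbiased accuracy estimate. It uses a textbook technique, probability-proportional sampling with replacement combined with the Hansen-Hurwitz estimator. Everything in it is an assumption made for illustration: the synthetic population, the `uncertainty` auxiliary score, the `0.05` weight floor, and all function names.

```python
import random


def make_population(size, rng):
    """Synthetic operational dataset (hypothetical): returns (failed, uncertainty).

    uncertainty[i] is a stand-in auxiliary score in [0, 1); the chance that
    item i is mispredicted grows with it. Real labels would come from humans.
    """
    uncertainty = [rng.random() ** 3 for _ in range(size)]
    failed = [1 if rng.random() < u else 0 for u in uncertainty]
    return failed, uncertainty


def uniform_estimate(failed, n, rng):
    """Simple random sampling without replacement: (accuracy, failures found)."""
    picked = rng.sample(range(len(failed)), n)
    found = sum(failed[i] for i in picked)
    return 1.0 - found / n, found


def weighted_estimate(failed, weights, n, rng):
    """Sampling with replacement, P(pick i) proportional to weights[i] > 0.

    Returns (accuracy, distinct failures found). The Hansen-Hurwitz estimator
    divides each labelled outcome by its selection probability, which offsets
    the deliberate over-sampling of likely failures.
    """
    size = len(failed)
    total = sum(weights)
    picked = rng.choices(range(size), weights=weights, k=n)
    # failed[i] / p_i with p_i = weights[i] / total, averaged over the n draws
    est_failures = sum(failed[i] * total / weights[i] for i in picked) / n
    found = len({i for i in picked if failed[i]})
    return 1.0 - est_failures / size, found


if __name__ == "__main__":
    rng = random.Random(0)
    failed, uncertainty = make_population(5000, rng)
    weights = [u + 0.05 for u in uncertainty]  # floor keeps every item selectable
    print("true accuracy:", 1.0 - sum(failed) / len(failed))
    print("uniform  (accuracy, failures):", uniform_estimate(failed, 200, rng))
    print("weighted (accuracy, failures):", weighted_estimate(failed, weights, 200, rng))
```

In this sketch the weighted sample tends to contain more failing items than a uniform sample of the same size, while its accuracy estimate remains unbiased when averaged over repeated samples. A single weighted estimate can be noisier than the uniform one and is not guaranteed to lie within [0, 1].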