Cats Are Not Fish: Deep Learning Testing Calls for Out-Of-Distribution Awareness

Abstract

As Deep Learning (DL) is continuously adopted in many industrial applications, its quality and reliability start to raise concerns. Similar to the traditional software development process, testing DL software to uncover its defects at an early stage is an effective way to reduce risks after deployment. According to the fundamental assumption of deep learning, DL software does not provide statistical guarantees and has limited capability in handling data that falls outside of its learned distribution, i.e., out-of-distribution (OOD) data. Although recent progress has been made in designing novel testing techniques for DL software that can detect thousands of errors, current state-of-the-art DL testing techniques usually do not take the distribution of the generated test data into consideration. It is therefore hard to judge whether the "identified errors" are indeed meaningful errors for the DL application (i.e., due to quality issues of the model) or outliers that cannot be handled by the current model (i.e., due to a lack of training data). To fill this gap, we take the first step and conduct a large-scale empirical study, with a total of 451 experiment configurations, 42 deep neural networks (DNNs), and 1.2 million test data instances, to investigate and characterize the impact of OOD-awareness on DL testing. We further analyze the consequences when DL systems go into production by evaluating the effectiveness of adversarial retraining with distribution-aware errors. The results confirm that introducing data distribution awareness in both the testing and enhancement phases outperforms distribution-unaware retraining by up to 21.5%.
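The distinction the abstract draws between errors on in-distribution inputs and errors on OOD inputs can be illustrated with a small sketch. This is not the paper's method: the OOD score here is a simple root-mean-square z-score against the training features, chosen only for illustration, and the names `fit_ood_scorer`, `split_errors`, the toy `predict` function, and the threshold of 3.0 are all hypothetical.

```python
import math
import statistics


def fit_ood_scorer(train_features):
    """Fit per-dimension mean/std on training features and return a scoring function.

    Assumption: a z-score distance is a stand-in for a real OOD detector.
    """
    dims = list(zip(*train_features))
    means = [statistics.fmean(d) for d in dims]
    # Fall back to 1.0 when a dimension has zero spread, to avoid division by zero.
    stds = [statistics.pstdev(d) or 1.0 for d in dims]

    def score(x):
        # Root-mean-square z-score: larger values mean farther from the training data.
        total = sum(((v - m) / s) ** 2 for v, m, s in zip(x, means, stds))
        return math.sqrt(total / len(x))

    return score


def split_errors(test_cases, predict, score, threshold):
    """Partition misclassified test cases into in-distribution and OOD errors."""
    in_dist_errors, ood_errors = [], []
    for x, label in test_cases:
        if predict(x) == label:
            continue  # not an error, so nothing to classify
        bucket = ood_errors if score(x) > threshold else in_dist_errors
        bucket.append((x, label))
    return in_dist_errors, ood_errors


# Toy data (hypothetical): four training points and a hand-written "model".
train = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
score = fit_ood_scorer(train)
predict = lambda x: int(x[0] + x[1] > 1.0)

generated_tests = [
    ((0.9, 0.3), 0),    # misclassified, close to the training data
    ((10.0, 10.0), 0),  # misclassified, far from the training data
    ((0.2, 0.2), 0),    # classified correctly
]
in_dist_errors, ood_errors = split_errors(generated_tests, predict, score, threshold=3.0)
print(len(in_dist_errors), len(ood_errors))
```

Under these assumptions, only the first bucket would point at a quality issue in the model itself, while the second reflects inputs the model was never trained to handle.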

Bibliographic details

Source
IEEE/ACM International Conference on Automated Software Engineering | 2020 | pp. 1041-1052 | 12 pages
Authors
David Berend; Xiaofei Xie; Lei Ma; Lingjun Zhou; Yang Liu; Chi Xu; Jianjun Zhao
Original format
PDF
Keywords
Deep learning; Training data; Software; Data models; Software reliability; Testing; Software engineering
