Have You Forgotten? A Method to Assess if Machine Learning Models Have Forgotten Data

Abstract

In the era of deep learning, aggregating data from several sources is a common way to ensure data diversity. Consider a scenario where several providers contribute data to a consortium for the joint development of a classification model (hereafter the target model), but now one of the providers decides to leave. This provider requests not only that their data (hereafter the query dataset) be removed from the databases but also that the model 'forget' their data. In this paper, for the first time, we address the challenging question of whether data have been forgotten by a model. We assume knowledge of the query dataset and the distribution of a model's output. We establish statistical methods that compare the target model's outputs with the outputs of models trained with different datasets. We evaluate our approach on several benchmark datasets (MNIST, CIFAR-10 and SVHN) and on a cardiac pathology diagnosis task using data from the Automated Cardiac Diagnosis Challenge (ACDC). We hope to encourage studies on what information a model retains and to inspire extensions to more complex settings.
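The abstract only states that the method compares the target model's outputs with those of models trained on different datasets, and the keywords name the Kolmogorov-Smirnov statistic. As a rough illustration, and not the authors' exact procedure, the sketch below computes the two-sample KS statistic between two lists of model output scores (for instance, confidences on the query dataset) and reports which of two reference models the target is closer to. The function names `ks_statistic` and `closer_reference`, and the setup with one reference model trained with the query data and one trained without it, are hypothetical assumptions made for this example.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest absolute gap
    between the empirical CDFs of samples a and b."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    d = 0.0
    # The supremum of |F_a - F_b| is attained at one of the observed values,
    # so evaluating both empirical CDFs at every pooled point is enough.
    for v in xs + ys:
        fa = bisect_right(xs, v) / n  # fraction of a that is <= v
        fb = bisect_right(ys, v) / m  # fraction of b that is <= v
        d = max(d, abs(fa - fb))
    return d


def closer_reference(
    target: Sequence[float],
    trained_with: Sequence[float],
    trained_without: Sequence[float],
) -> str:
    """Hypothetical decision rule (assumption, not from the paper): return
    "with" if the target's output distribution on the query data is closer,
    in KS distance, to a reference model trained WITH the query data, and
    "without" otherwise (ties are reported as "without")."""
    d_with = ks_statistic(target, trained_with)
    d_without = ks_statistic(target, trained_without)
    return "with" if d_with < d_without else "without"
```

For example, with target scores [0.91, 0.88, 0.95, 0.90], a with-query reference [0.93, 0.89, 0.94, 0.92] and a without-query reference [0.55, 0.60, 0.48, 0.62], the KS distances are 0.5 and 1.0, so `closer_reference` returns "with", suggesting in this toy setting that the query data have not been forgotten.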

Bibliographic details

Source: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2020, pp. 95-105 (11 pages)
Authors: Xiao Liu; Sotirios A. Tsaftaris
Format: PDF
Keywords: Privacy; Statistical measure; Kolmogorov-Smirnov
Date added: 2022-08-26 13:53:54

Similar literature

1. Assessing the Applicability of Random Forest, Stochastic Gradient Boosted Model, and Extreme Learning Machine Methods to the Quantitative Precipitation Estimation of the Radar Data: A Case Study to Gwangdeoksan Radar, South Korea, in 2018 [J]. Ju-Young Shin, Yonghun Ro, Joo-Wan Cha. Advances in Meteorology, 2019, no. 2.

2. Best (but oft-forgotten) practices: missing data methods in randomized controlled nutrition trials [J]. Li Peng, Stuart Elizabeth A. The American Journal of Clinical Nutrition: Official Journal of the American Society for Clinical Nutrition, 2019, no. 3.

3. Retrieval of subpixel Tamarix canopy cover from Landsat data along the Forgotten River using linear and nonlinear spectral mixture models [J]. Silván-Cárdenas J.L., Wang L. Remote Sensing of Environment: An Interdisciplinary Journal, 2010, no. 8.

4. Forgotten Siblings: Unifying Attacks on Machine Learning and Digital Watermarking [C]. Erwin Quiring, Daniel Arp, Konrad Rieck. IEEE European Symposium on Security and Privacy, 2018.

5. The Use of Machine Learning Method for Modeling and Analyzing Pedestrian Crash Data and Comparisons with Traditional Discrete Choice Methods [D]. Li, Yang. 2020.

6. Limitations of methods of describing populations. Inaccuracies in data must not be forgotten [O]. D. P. Fitton. 1994.

7. Misbegotten Methodologies and Forgotten Lessons from Tom Swift’s Electric Factor Analysis Machine: A Demonstration with Competing Structural Models of Psychopathology [O]. Ashley Lauren Greene, Ashley L. Watts, Miriam K. Forbes. 2021.
