SmokeOut: An Approach for Testing Clustering Implementations

Abstract

Clustering is a key Machine Learning technique, used in many high-stakes domains from medicine to self-driving cars. Many clustering algorithms have been proposed, and these algorithms have been implemented in many toolkits. Clustering users assume that clustering implementations are correct, reliable, and for a given algorithm, interchangeable. We challenge these assumptions. We introduce SmokeOut, an approach and tool that pits clustering implementations against each other (and against themselves) while controlling for algorithm and dataset, to find datasets where clustering outcomes differ when they shouldn't, and measure this difference. We ran SmokeOut on 7 clustering algorithms (3 deterministic and 4 nondeterministic) implemented in 7 widely-used toolkits, and run in a variety of scenarios on the Penn Machine Learning Benchmark (162 datasets). SmokeOut has revealed that clustering implementations are fragile: on a given input dataset and using a given clustering algorithm, clustering outcomes and accuracy vary widely between (1) successive runs of the same toolkit; (2) different input parameters for that tool; (3) different toolkits.
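The first scenario in the abstract, successive runs of the same implementation on the same dataset, can be illustrated with a minimal sketch. This is not the SmokeOut tool: this record does not name the toolkits or the difference measure, so the toy k-means below, the choice of the adjusted Rand index as the agreement score, and all function names (`kmeans`, `adjusted_rand_index`, `run_to_run_agreement`) are assumptions made purely for illustration.

```python
import random
from collections import Counter
from math import comb


def kmeans(points, k, seed, iters=50):
    """Toy Lloyd's k-means on 2-D points (illustrative, not a real toolkit)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random init: the source of nondeterminism
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                                        + (p[1] - centers[c][1]) ** 2)
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def adjusted_rand_index(a, b):
    """Agreement between two labelings; 1.0 means identical partitions."""
    total = comb(len(a), 2)
    sum_ij = sum(comb(n, 2) for n in Counter(zip(a, b)).values())
    sum_a = sum(comb(n, 2) for n in Counter(a).values())
    sum_b = sum(comb(n, 2) for n in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # both partitions are trivial and identical
    return (sum_ij - expected) / (max_index - expected)


def run_to_run_agreement(points, k, seeds):
    """Compare the first run against every later run of the same algorithm."""
    runs = [kmeans(points, k, s) for s in seeds]
    return [adjusted_rand_index(runs[0], r) for r in runs[1:]]


if __name__ == "__main__":
    data_rng = random.Random(42)  # fixed seed so the dataset itself is controlled
    points = [(data_rng.random(), data_rng.random()) for _ in range(200)]
    # Scores below 1.0 flag runs whose partitions differ on the same input.
    print(run_to_run_agreement(points, k=4, seeds=[0, 1, 2, 3]))
```

Holding the dataset and algorithm fixed while varying only the seed isolates run-to-run variation; the same comparison function could be reused across parameter settings or across different implementations of one algorithm.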

Bibliographic details

Source
2019 12th IEEE Conference on Software Testing, Validation and Verification | 2019 | pp. 473-480 | 8 pages
Conference location: Xian (CN)
Authors
Vincenzo Musco; Xin Yin; Iulian Neamtiu;
Author affiliations

New Jersey Institute of Technology;

New Jersey Institute of Technology;

New Jersey Institute of Technology;

Original format: PDF
Language: English
Keywords
Clustering algorithms; Shape; Software reliability; Matlab; Machine learning; Machine learning algorithms;
