Conference: 2016 IEEE/ACM 38th International Conference on Software Engineering

CUSTODES: Automatic Spreadsheet Cell Clustering and Smell Detection Using Strong and Weak Features


Abstract

Various techniques have been proposed to detect smells in spreadsheets, which are susceptible to errors. These techniques typically detect spreadsheet smells through a mechanism based on a fixed set of patterns or metric thresholds. Unlike conventional programs, spreadsheets vary greatly in their tabulation styles. Smell detection based on fixed patterns or metric thresholds, which are insensitive to the varying tabulation styles, can miss many smells in one spreadsheet while reporting many spurious smells in another. In this paper, we propose CUSTODES to effectively cluster spreadsheet cells and detect smells in these clusters. The clustering mechanism can automatically adapt to the tabulation styles of each spreadsheet using strong and weak features. These strong and weak features capture the invariant and variant parts of tabulation styles, respectively. Because smelly cells are normally a minority in a spreadsheet, they can be mechanically detected as outliers of their clusters in feature spaces. We implemented CUSTODES and applied it to 70 spreadsheet files randomly sampled from the EUSES corpus. These spreadsheets contain 1,610 formula cell clusters. Experimental results confirmed that CUSTODES is effective. It successfully detected harmful smells that can induce computation anomalies in spreadsheets with an F-measure of 0.72, outperforming state-of-the-art techniques.
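The idea that smelly cells show up as a minority within their cluster can be illustrated with a small sketch. This is not the CUSTODES algorithm, whose features and outlier criteria the abstract does not spell out. It assumes a cluster is already given, that each cell is described by a single hypothetical feature (its formula in relative R1C1 form, or a placeholder for a hard-coded constant), and that a cell is suspicious when it disagrees with a clear majority. The function name `minority_outliers` and the `max_share` parameter are illustrative assumptions.

```python
from collections import Counter


def minority_outliers(cluster, max_share=0.5):
    """Return the cells that disagree with the cluster's dominant feature.

    `cluster` maps a cell address to one feature value (an assumption:
    here, the formula in relative R1C1 notation). If no feature value is
    held by more than `max_share` of the cells, there is no clear
    majority and nothing is flagged.
    """
    counts = Counter(cluster.values())
    dominant, dominant_count = counts.most_common(1)[0]
    if dominant_count / len(cluster) <= max_share:
        return []
    return sorted(cell for cell, feature in cluster.items() if feature != dominant)


# Hypothetical cluster: a column of products where one cell was
# overwritten with a constant instead of the shared formula.
cluster = {
    "D2": "RC[-2]*RC[-1]",
    "D3": "RC[-2]*RC[-1]",
    "D4": "RC[-2]*RC[-1]",
    "D5": "<constant>",
    "D6": "RC[-2]*RC[-1]",
}

print(minority_outliers(cluster))  # ['D5']
```

A real detector would work in a multi-dimensional feature space and would have to form the clusters first. The sketch only shows why a per-cluster majority can adapt to each spreadsheet where a fixed global threshold cannot.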
