2016 IEEE/ACM 38th IEEE International Conference on Software Engineering Companion

VEnron: A Versioned Spreadsheet Corpus and Related Evolution Analysis

Abstract

Like most conventional software, spreadsheets are subject to software evolution. However, spreadsheet evolution is rarely assisted by version management tools. As a result, the version information across evolved spreadsheets is often missing or highly fragmented, which makes it difficult for users to notice the evolution issues arising from their spreadsheets. In this paper, we propose a semi-automated approach that leverages spreadsheets' contexts (e.g., attached emails) and contents to identify evolved spreadsheets and recover the embedded version information. We apply it to the released email archive of the Enron Corporation and build VEnron, an industrial-scale, versioned spreadsheet corpus. Our approach first clusters spreadsheets that likely evolved from one to another into evolution groups, based on various fragmented information such as spreadsheet filenames, spreadsheet contents, and spreadsheet-attached emails. It then recovers the version information of the spreadsheets in each evolution group. VEnron enables us to identify interesting issues that can arise from spreadsheet evolution. For example, versioned spreadsheets are prevalent in the Enron email archive; changes in formulas are common; and some evolution groups (16.9%) introduce new errors during evolution. To our knowledge, VEnron is the first spreadsheet corpus with version information. It provides a valuable resource for understanding issues arising from spreadsheet evolution.
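The abstract does not spell out the exact clustering criteria, so the following is only a rough illustration of the two-step idea (group likely-related spreadsheets, then order each group into versions). It assumes each spreadsheet is reduced to its filename and the send time of the email that carried it, and it swaps in a simple filename heuristic; the actual approach also compares spreadsheet contents and email context and is semi-automated. `Attachment`, `filename_stem`, `evolution_groups`, the version-token pattern, and the sample filenames are all hypothetical.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Attachment:
    filename: str   # hypothetical: name of the attached spreadsheet
    sent: datetime  # hypothetical: send time of the carrying email


# Hypothetical heuristic: filename tokens that often mark a version
# ("v2", "final", "draft", bare dates/numbers, ...).
_VERSION_TOKEN = re.compile(
    r"v\d+|ver\d+|version\d*|final|draft|new|rev\d*|\d+", re.IGNORECASE
)


def filename_stem(filename: str) -> str:
    """Drop the extension and version-like tokens to get a grouping key."""
    base = filename.rsplit(".", 1)[0].lower()
    tokens = re.split(r"[\s_\-.]+", base)
    kept = [t for t in tokens if t and not _VERSION_TOKEN.fullmatch(t)]
    return "_".join(kept)


def evolution_groups(attachments):
    """Step 1: cluster by filename stem. Step 2: number versions by send time."""
    clusters = defaultdict(list)
    for a in attachments:
        clusters[filename_stem(a.filename)].append(a)
    groups = {}
    for stem, members in clusters.items():
        if len(members) < 2:
            continue  # a lone spreadsheet has no evolution to recover
        members.sort(key=lambda a: a.sent)  # earliest email = first version
        groups[stem] = [(i + 1, a.filename) for i, a in enumerate(members)]
    return groups


docs = [
    Attachment("gas_positions_final.xls", datetime(2001, 3, 2)),
    Attachment("Gas Positions v2.xls", datetime(2001, 2, 20)),
    Attachment("gas-positions-0115.xls", datetime(2001, 1, 15)),
    Attachment("budget.xls", datetime(2001, 1, 1)),
]
print(evolution_groups(docs))
# {'gas_positions': [(1, 'gas-positions-0115.xls'), (2, 'Gas Positions v2.xls'), (3, 'gas_positions_final.xls')]}
```

Ordering by email send time is a convenient stand-in for version recovery; a filename heuristic alone will both merge unrelated files and miss renamed ones, which is why content and context signals matter.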