Automatic generation of valid and invalid test data for string validation routines using web searches and regular expressions

Muzammil Shahbaz; Phil McMinn; Mark Stevenson


Abstract

Classic approaches to automatic input data generation are usually driven by the goal of obtaining program coverage and the need to solve or find solutions to path constraints to achieve this. As inputs are generated with respect to the structure of the code, they can be ineffective, difficult for humans to read, and unsuitable for testing missing implementation. Furthermore, these approaches have known limitations when handling constraints that involve operations with string data types. This paper presents a novel approach for generating string test data for string validation routines, by harnessing the Internet. The technique uses program identifiers to construct web search queries for regular expressions that validate the format of a string type (such as an email address). It then performs further web searches for strings that match the regular expressions, producing examples of test cases that are both valid and realistic. Following this, our technique mutates the regular expressions to drive the search for invalid strings, and the production of test inputs that should be rejected by the validation routine. The paper presents the results of an empirical study evaluating our approach. The study was conducted on 24 string input validation routines collected from 10 open source projects. While dynamic symbolic execution and search-based testing approaches were only able to generate a very low number of values successfully, our approach generated values with an accuracy of 34% on average for the case of valid strings, and 99% on average for the case of invalid strings. Furthermore, whereas dynamic symbolic execution and search-based testing approaches were only capable of detecting faults in 8 routines, our approach detected faults in 17 out of the 19 validation routines known to contain implementation errors.
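The pipeline described in the abstract (identifier to search query, then regular expression mutation to find invalid strings) can be illustrated with a minimal offline sketch. This is not the authors' implementation: the function names, the stop-word list, and the single quantifier mutation are hypothetical choices made for illustration, and a fixed candidate list stands in for the web searches the paper relies on.

```python
import re


def identifier_to_query(identifier: str) -> str:
    """Turn a program identifier into a search query for a regular expression.

    Splits camelCase and snake_case names into words. The stop-word list
    below is a hypothetical one, not taken from the paper.
    """
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    stop = {"is", "valid", "validate", "check", "validator"}
    kept = [w.lower() for w in words if w.lower() not in stop]
    return " ".join(kept + ["regular", "expression"])


def mutate_quantifiers(pattern: str) -> list[str]:
    """Return mutants in which one {n} quantifier becomes {n-1} or {n+1}.

    A deliberately simple mutation operator (assumption): it does not
    handle {m,n} ranges or escaped braces.
    """
    mutants = []
    for m in re.finditer(r"\{(\d+)\}", pattern):
        n = int(m.group(1))
        for k in (n - 1, n + 1):
            if k >= 0:
                mutants.append(pattern[:m.start()] + "{%d}" % k + pattern[m.end():])
    return mutants


def invalid_candidates(original: str, mutants: list[str], candidates: list[str]) -> list[str]:
    """Keep candidates that fully match some mutant but not the original regex."""
    orig = re.compile(original)
    compiled = [re.compile(p) for p in mutants]
    return [
        s for s in candidates
        if not orig.fullmatch(s) and any(c.fullmatch(s) for c in compiled)
    ]


if __name__ == "__main__":
    print(identifier_to_query("isValidEmailAddress"))
    zip_regex = r"\d{5}"  # assumed example regex for a five-digit code
    mutants = mutate_quantifiers(zip_regex)
    # The candidate list stands in for strings that would be found on the web.
    print(invalid_candidates(zip_regex, mutants, ["12345", "1234", "123456", "abcde"]))
```

Strings that match a mutant but not the original regular expression are "near misses": close enough to the valid format to look realistic, yet inputs that the validation routine under test should reject.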


Bibliographic details

Source: Science of Computer Programming, 2015, Issue 4, pp. 405-425 (21 pages)
Authors: Muzammil Shahbaz; Phil McMinn; Mark Stevenson
Affiliation: University of Sheffield, UK (all three authors)
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords: Test data generation; Web searches; Regular expressions

Similar literature

1. String Generation for Testing Regular Expressions [J]. Lixiao Zheng, Shuai Ma, Yuanyang Wang. The Computer Journal. 2020, Issue 1.
2. Use of multiple performance and symptom validity measures: Determining the optimal per test cutoff for determination of invalidity, analysis of skew, and inter-test correlations in valid and invalid performance groups [J]. Larrabee Glenn J., Rohling Martin L., Meyers John E. The Clinical Neuropsychologist. 2019, Issue 8.
3. Automated Discovery of Valid Test Strings from the Web Using Dynamic Regular Expressions Collation and Natural Language Processing [C]. Shahbaz Muzammil, McMinn Phil, Stevenson Mark. 12th International Conference on Quality Software. 2012.
4. Testing Instrument Validity and Identification with Invalid Instruments [D]. Kedagni, Desire. 2018.
5. ODM Data Analysis - A tool for the automatic validation, monitoring and generation of generic descriptive statistics of patient data [O]. Tobias Johannes Brix, Philipp Bruland, Saad Sarfraz. 2012.
6. Generation of String Test Input from Web using Regular Expression [O]. Sneha Shelke, Sangeeta Nagpure. 2014.
7. Generation of an Output Regular Expression of a Sequential Machine with a Specified Input Regular Expression [R]. Yau, S. S. 1966.
