In this work, we aim to identify potential problems of ambiguity, completeness, conformity, singularity, and readability in system and software requirements specifications. These problems arise particularly when specifications are written in natural language. We describe them from a linguistic point of view, while the business impact of each potential error is considered in the systems-engineering context from which our corpus comes. Several standards provide criteria for writing good requirements to guide requirement authors. The corresponding quality properties are linguistically observable because they manifest as lexical, syntactic, semantic, and discursive problems in documents. We investigate frequently occurring error patterns by manually analyzing the corpus; this analysis is based on a requirements grammar that we developed in this work. We then propose an approach to identify the errors automatically by applying rules derived from the error patterns to the POS-tagged and parsed corpus. Using the error-annotated corpus, we train an error model with Conditional Random Fields (CRFs) and evaluate it, obtaining an overall F_1 score of 79.17% on the error-label annotation task.
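To make the final step concrete, the following is a minimal sketch of CRF-based error labeling over a POS-tagged corpus. The abstract does not name a toolkit, so this sketch assumes the sklearn-crfsuite library; the feature set, the toy sentence, and the "AMBIGUITY" label are illustrative assumptions, not the authors' actual features or tag set.

```python
# Sketch of error-label annotation with CRFs (assumes sklearn-crfsuite).
import sklearn_crfsuite
from sklearn_crfsuite import metrics

def token_features(sent, i):
    """Features for token i of a POS-tagged sentence [(word, pos), ...]."""
    word, pos = sent[i]
    feats = {"bias": 1.0, "word.lower": word.lower(), "pos": pos}
    if i > 0:
        feats["-1:pos"] = sent[i - 1][1]   # POS tag of previous token
    else:
        feats["BOS"] = True                # sentence start
    if i < len(sent) - 1:
        feats["+1:pos"] = sent[i + 1][1]   # POS tag of next token
    else:
        feats["EOS"] = True                # sentence end
    return feats

# Hypothetical error-annotated corpus: each sentence is a list of
# (word, POS) pairs; labels mark tokens involved in a potential error.
train_sents = [[("The", "DT"), ("system", "NN"), ("should", "MD"),
                ("be", "VB"), ("fast", "JJ")]]
train_labels = [["O", "O", "AMBIGUITY", "O", "AMBIGUITY"]]

X_train = [[token_features(s, i) for i in range(len(s))] for s in train_sents]
y_train = train_labels

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                           max_iterations=100)
crf.fit(X_train, y_train)

# Token-level F1 over the error labels (here evaluated on training data
# only for illustration; a held-out split would be used in practice).
y_pred = crf.predict(X_train)
print(metrics.flat_f1_score(y_train, y_pred, average="weighted"))
```

In a realistic setup, the feature function would also draw on the parsed corpus (e.g., syntactic dependencies) and on matches of the rule-based error patterns, so that the CRF combines linguistic context with pattern evidence when assigning error labels.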