A system and a computer program product are provided for evaluating question-answer pairs in an answer key. A predicted answer to a test question is generated based on the answer key modification history, and a generated answer is produced in response to the same test question. The predicted answer and the generated answer are then compared to determine an accuracy score match indication between them, and, if the predicted answer matches the generated answer, an indication is presented that the answer key may have a problem.
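The evaluation flow described above can be illustrated with a minimal Python sketch. The abstract does not specify how the predicted answer is derived from the modification history or how the match is scored, so everything here is an assumption made for illustration: the function names (`predict_answer`, `match_score`, `evaluate_pair`), the choice of the most frequent historical answer as the prediction, the use of a string-similarity ratio as the score, and the `threshold` value of 0.9 are all hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not affect matching."""
    return " ".join(text.lower().split())


def predict_answer(history):
    """Hypothetical predictor: return the most frequent answer seen in the
    answer key modification history (an assumption, not the claimed method)."""
    if not history:
        raise ValueError("modification history is empty")
    counts = Counter(normalize(answer) for answer in history)
    return counts.most_common(1)[0][0]


def match_score(predicted, generated):
    """Assumed scoring: similarity ratio in [0, 1] between the two answers."""
    return SequenceMatcher(None, normalize(predicted), normalize(generated)).ratio()


def evaluate_pair(history, generated, threshold=0.9):
    """Compare the predicted and generated answers; per the abstract, a match
    is what triggers the indication that the answer key may have a problem."""
    predicted = predict_answer(history)
    score = match_score(predicted, generated)
    return {
        "predicted": predicted,
        "score": score,
        "flag_answer_key": score >= threshold,  # threshold is an assumed cutoff
    }
```

For example, `evaluate_pair(["Paris", "paris ", "Lyon"], "Paris")` would flag the question-answer pair, because the generated answer matches the answer predicted from the history, while a generated answer of `"Berlin"` would not.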