Similarity detection for a question bank is a check that quickly finds, within a very large bank, questions whose similarity is so high that they are effectively duplicates and need to be screened out. A common approach is a Java program that reads the Excel question bank directly through an Excel API and compares questions with the edit distance algorithm. While designing a duplicate-question detection algorithm, we found that an implementation based on the Levenshtein algorithm often exceeds available memory and fails to produce output, which severely hurts the efficiency of similarity detection. Practical experiments show that segmenting the strings and adding control statements alleviates this problem considerably and improves detection efficiency.
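The abstract does not say how the strings are segmented or which control statements are added, so the following Java sketch is only one plausible reading, not the authors' implementation. It assumes three things that are not in the source: the distance is computed with two rolling rows rather than a full matrix, questions are cut into fixed-size segments that are compared position by position, and the "control statement" is a length check that skips pairs which cannot reach the similarity threshold. The class name `QuestionSimilarity`, the methods, and the parameters `segmentSize` and `threshold` are all hypothetical. Reading the Excel file is left out.

```java
import java.util.ArrayList;
import java.util.List;

class QuestionSimilarity {

    // Levenshtein distance with two rolling rows, so memory grows with the
    // shorter string's length instead of with the product of both lengths.
    static int levenshtein(String a, String b) {
        if (a.length() < b.length()) { String t = a; a = b; b = t; } // b is the shorter one
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] t = prev; prev = curr; curr = t; // reuse the two rows
        }
        return prev[b.length()];
    }

    // Assumed segmentation scheme: cut the string into fixed-size pieces.
    static List<String> segment(String s, int size) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < s.length(); i += size) {
            parts.add(s.substring(i, Math.min(s.length(), i + size)));
        }
        return parts;
    }

    // Approximate similarity in [0, 1]. Summing per-segment distances gives an
    // upper bound on the true edit distance, so this never overstates similarity.
    static double similarity(String a, String b, int segmentSize, double threshold) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) return 1.0;
        // Assumed control statement: the distance is at least the length
        // difference, so such a pair cannot reach the threshold. Report 0.0.
        int diff = Math.abs(a.length() - b.length());
        if (1.0 - (double) diff / maxLen < threshold) return 0.0;
        List<String> pa = segment(a, segmentSize);
        List<String> pb = segment(b, segmentSize);
        int total = 0;
        for (int i = 0; i < Math.max(pa.size(), pb.size()); i++) {
            String x = i < pa.size() ? pa.get(i) : ""; // missing segment counts as empty
            String y = i < pb.size() ? pb.get(i) : "";
            total += levenshtein(x, y);
        }
        return 1.0 - (double) total / maxLen;
    }

    public static void main(String[] args) {
        String q1 = "What is the capital of France?";
        String q2 = "What is the capital of france?";
        System.out.println(similarity(q1, q2, 8, 0.9));
    }
}
```

Position-by-position segment comparison is cheap but pessimistic: an insertion near the start of a question shifts every later segment and inflates the estimated distance. For that reason the result is better treated as a quick screening score than as an exact Levenshtein similarity.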