Journal: Expert Systems with Applications
A solution to reconstruct cross-cut shredded text documents based on constrained seed K-means algorithm and ant colony algorithm

Abstract

The reconstruction of cross-cut shredded text documents (RCCSTD) is an important problem in forensics and a real, complex issue for information security and judicial investigations. It can be considered a special kind of greedy square jigsaw puzzle and has attracted the attention of many researchers. Clustering fragments into rows is a crucial and difficult step in RCCSTD, yet existing approaches achieve low clustering accuracy. This paper therefore proposes a new clustering algorithm based on horizontal projection and a constrained seed K-means algorithm to improve clustering accuracy. The constrained seed K-means algorithm draws upon expert knowledge and has the following characteristics: 1) the first fragment in each row is easy to distinguish, and the one-dimensional signal extracted from it can be used as the initial cluster center; 2) two or more prior fragments cannot be clustered together. To improve the splicing accuracy within rows, a penalty coefficient is added to a traditional cost function. Experiments were carried out on 10 text documents. According to our measurements, the clustering accuracy was 99.1% and the overall splicing accuracy was 91.0%. Compared with two other approaches, the algorithm offered significantly better clustering accuracy, and it obtained the best results on the RCCSTD problem in our experiments. We also attempted a more complex and realistic problem, the reconstruction of cross-cut shredded dual text documents (RCCSDTD). Satisfactory results were obtained for RCCSDTD in some cases and, to the best of our knowledge, our method is the first feasible approach to the RCCSDTD problem. The developed system is fundamentally an expert system applied specifically to RCCSTD problems. (C) 2019 Elsevier Ltd. All rights reserved.
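The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `horizontal_projection` and `seeded_constrained_kmeans`, the Euclidean distance, the fixed iteration cap, and the synthetic fragments are all assumptions made for illustration. The sketch only mirrors the two stated characteristics: the first fragment of each row supplies the initial cluster center, and those prior fragments are never allowed to share a cluster.

```python
import numpy as np

def horizontal_projection(fragment):
    """Per-pixel-row ink count of a binary fragment image (1 = ink)."""
    return np.asarray(fragment).sum(axis=1).astype(float)

def seeded_constrained_kmeans(features, seed_idx, n_iter=20):
    """K-means seeded with known first fragments (hypothetical helper).

    features: (n_fragments, n_pixel_rows) projection signals.
    seed_idx: index of the first fragment of each row; seed i is
              pinned to cluster i, so two seeds never share a cluster.
    """
    features = np.asarray(features, dtype=float)
    seed_idx = np.asarray(seed_idx)
    k = len(seed_idx)
    centers = features[seed_idx].copy()  # seeds are the initial centers
    labels = None
    for _ in range(n_iter):
        # Euclidean distance from every fragment to every center.
        diffs = features[:, None, :] - centers[None, :, :]
        dists = np.linalg.norm(diffs, axis=2)
        new_labels = dists.argmin(axis=1)
        new_labels[seed_idx] = np.arange(k)  # enforce the seed constraint
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Each cluster contains at least its seed, so the mean is defined.
        for c in range(k):
            centers[c] = features[labels == c].mean(axis=0)
    return labels

# Synthetic example: 3 rows of 4 fragments, each fragment 12 x 10 pixels.
# Fragments in the same row share the vertical position of their text band.
fragments = []
for row in range(3):
    for col in range(4):
        frag = np.zeros((12, 10), dtype=int)
        frag[4 * row:4 * row + 4, :5 + col] = 1  # ink band, varying width
        fragments.append(frag)

features = np.stack([horizontal_projection(f) for f in fragments])
labels = seeded_constrained_kmeans(features, seed_idx=[0, 4, 8])
print(labels.tolist())
```

On this synthetic input every fragment is assigned to the cluster of its row's first fragment. Real fragments would need binarization first, and the penalty-based cost function and ant colony splicing mentioned in the abstract are not covered by this sketch.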