Text-Based Document Similarity Matching Using Sdtext

Abstract

Forensics examiners frequently try to identify duplicate files during an investigation. They might do so to identify known files of interest, or to allow more rapid review of documents that appear to be similar. Current forensic tools for detecting duplicate files operate over the low-level bits of the file, typically using hashing. While this can be a fast and effective method in many cases, it can fail due to differences in file format. We introduce sdtext, a tool developed to identify similar files based on their textual contents, which is robust to changes in format. We show that sdtext is far more accurate than existing tools in matching files that contain the same text in different formats.
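The contrast the abstract draws, between hashing a file's raw bytes and comparing its textual contents, can be illustrated with a minimal sketch. This is not sdtext's actual algorithm, which the abstract does not describe. The tag-stripping extractor, the word-shingle Jaccard score, and every function name here (`byte_hash`, `extract_text`, `shingles`, `text_similarity`) are assumptions chosen only for illustration.

```python
import hashlib
import re


def byte_hash(data: bytes) -> str:
    """Hash the raw bytes, as a conventional duplicate detector would."""
    return hashlib.sha256(data).hexdigest()


def extract_text(data: bytes) -> str:
    """Toy extractor (assumption): strip HTML-like tags, lowercase,
    and collapse whitespace. Real formats such as PDF or DOCX would
    need format-specific extraction."""
    text = data.decode("utf-8", errors="ignore")
    text = re.sub(r"<[^>]+>", " ", text)
    return " ".join(text.lower().split())


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles in the text."""
    words = text.split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def text_similarity(a: bytes, b: bytes, k: int = 3) -> float:
    """Jaccard similarity over word shingles of the extracted text."""
    sa, sb = shingles(extract_text(a), k), shingles(extract_text(b), k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# The same sentence stored as plain text and as HTML (hypothetical inputs).
plain = b"The quick brown fox jumps over the lazy dog."
html = (b"<html><body><p>The quick <b>brown</b> fox jumps "
        b"over the lazy dog.</p></body></html>")

print(byte_hash(plain) == byte_hash(html))  # byte-level comparison
print(text_similarity(plain, html))         # text-level comparison
```

In this toy case the byte-level hashes differ because the two encodings differ, while the text-level score treats the files as the same document. Any practical tool would also have to cope with extraction noise and partial overlap, which this sketch ignores.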

Bibliographic details

Source: Hawaii International Conference on System Sciences, 2016, pp. 5607-5616 (10 pages)
Conference location: Kauai, HI (US)
Author: Clay Shields
Affiliation: Dept. of Comput. Sci., Georgetown Univ., Washington, DC, USA
Original format: PDF
Keywords: duplicate document; forensics


