An Efficient String Matching Technique for Desktop Search to Detect Duplicate Files

S. Vijayarani; Ms. M. Muthulakshmi

Abstract

Information retrieval identifies the documents in a collection that match a user's query; the term also covers the automatic retrieval of documents from a large document corpus. Its most important application is the search engine, such as Google, which identifies the documents on the World Wide Web that are relevant to user queries. Users often download files that are already stored on their computer. As a result, multiple copies of the same file may accumulate in different drives and folders, which reduces system performance and occupies a large amount of memory space. Analyzing file contents and finding their similarity is one of the major problems in text mining and information retrieval. The main objective of this research work is to analyze file contents and delete the duplicate files in the system. To perform this task, the work proposes a new tool named Duplicate File Detector Tool (DFDT). DFDT helps the user search for and delete duplicate files in minimum time. It handles duplicates not only within the same file category but also across different file categories. The existing Boyer Moore Horspool and Knuth Morris Pratt string searching algorithms are used to compare file contents and find their similarity. This work also proposes a new string matching algorithm named W2COM (Word to Word COMparison). The experimental results show that the proposed W2COM algorithm performs better than the Boyer Moore Horspool and Knuth Morris Pratt algorithms.
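As a rough illustration of the kind of content comparison the abstract refers to, the sketch below shows a textbook Boyer Moore Horspool search together with a naive word-by-word equality check. The abstract does not describe how W2COM works internally, so `same_words` is a hypothetical stand-in for illustration only and should not be read as the authors' algorithm; the function names are likewise assumptions.

```python
def horspool_find(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Shift table: for each character of the pattern except the last one,
    # the distance from its rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Slide the window based on the text character aligned with the
        # last pattern position; unknown characters allow a full-length jump.
        pos += shift.get(text[pos + m - 1], m)
    return -1


def same_words(a: str, b: str) -> bool:
    """Hypothetical word-by-word check (not the paper's W2COM):
    True if both texts contain the same sequence of words."""
    return a.split() == b.split()
```

A check such as `same_words` ignores differences in whitespace, which is one plausible reason a word-level comparison could treat two files as duplicates even when their raw bytes differ.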

Bibliographic details

Source: International Journal of Information Technology and Computer Science, 2017, No. 7, 8 pages
Authors: S. Vijayarani; Ms. M. Muthulakshmi
Original format: PDF
CLC classification: Computing technology, computer technology
