Candidate document retrieval for Arabic-based text reuse detection on the web



Abstract

Given an input document d, the problem of local text reuse detection is to detect, within a given document collection, all passages reused between d and the other documents. Comparing the passages of d with the passages of every other document in the collection is infeasible, especially for large collections such as the Web. Selecting a subset of documents that potentially contain text reused with d therefore becomes a major step in the detection problem. This paper describes a new, efficient query formulation approach for retrieving Arabic-based candidate source documents from the Web. We evaluated the work using a collection of documents constructed specifically for this purpose. The experiments show that, on average, 79.97% of the Web documents used in the reuse cases were successfully retrieved.


Bibliographic details

Source
International Conference on Innovations in Information Technology | 2016 | 223p | 6 pages
Authors
Leena Lulu; Boumediene Belkhouche; Saad Harous
Original format: PDF
Chinese Library Classification: G20-53
Keywords
Engines; Web search; Plagiarism; Information technology; Search engines; Upper bound; Technological innovation;


Similar documents


1. Ontology construction and concept reuse with formal concept analysis for improved web document retrieval [J]. W.C. Cho, D. Richards. Web Intelligence and Agent Systems. 2007, No. 1

2. Information Retrieval from Unstructured Web Text Document Based on Automatic Learning of the Threshold [J]. Fethi Fkih, Mohamed Nazih Omri. International Journal of Information Retrieval Research. 2012, No. 4

3. Candidate document retrieval for cross-lingual plagiarism detection using two-level proximity information [J]. Nava Ehsan, Azadeh Shakery. Information Processing & Management. 2016, No. 6

4. Candidate document retrieval for Arabic-based text reuse detection on the web [C]. Leena Lulu, Boumediene Belkhouche, Saad Harous. International Conference on Innovations in Information Technology. 2016

5. A method for reusing Web browsing experience to enhance Web information retrieval [D]. Song, Guangfeng. 2003

6. Free-text medical document retrieval via phrase-based vector space model [O]. Wenlei Mao, Wesley W. Chu. 2002

7. Candidate Document Retrieval for Web-Scale Text Reuse Detection [O]. Matthias Hagen, Benno Stein. 2011

8. Survey and description of candidate technologies to support single shell tank waste retrieval, leak detection, monitoring, and mitigation [R]. Lewis, R. E., Teel, S. S., Wegener, W. H. 1995
