Conference: International Conference on Innovations in Information Technology

Candidate document retrieval for Arabic-based text reuse detection on the web

Abstract

Given an input document d, the problem of local text reuse detection is to find, within a given document collection, every passage reused between d and the other documents. Comparing each passage of d with the passages of every other document in the collection is clearly infeasible, especially for very large collections such as the Web, because the number of passage comparisons grows with the size of the collection. Selecting a subset of documents that potentially share reused text with d therefore becomes a major step in the detection problem. This paper describes a new, efficient query-formulation approach for retrieving Arabic-based candidate source documents from the Web. We evaluated the approach on a document collection constructed specifically for this work. The experiments show that, on average, 79.97% of the Web documents used in the reuse cases were successfully retrieved.
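The abstract does not spell out the query-formulation method itself, so the following is only a minimal sketch of the general candidate-retrieval pipeline it motivates, not the authors' approach. It assumes a simple, hypothetical baseline: d is split into fixed-length passages, each passage becomes one query made of its most frequent non-stopword terms, and the union of the top results across all queries forms the candidate set that would then go on to detailed passage comparison. The stopword list, `passage_len`, `terms_per_query`, `max_per_query`, and the `search` callable (a stand-in for a real Web search API) are all illustrative assumptions. `average_recall` shows one plausible reading of the reported 79.97% figure: for each reuse case, the fraction of its source documents found among the candidates, averaged over all cases.

```python
import re
from collections import Counter
from typing import Callable, Iterable

# Hypothetical, deliberately tiny stopword list (Arabic + English) for illustration.
STOPWORDS = {"في", "من", "على", "إلى", "عن", "أن", "the", "of", "and", "a", "in", "to"}
# For str patterns, \w matches Unicode word characters, so Arabic letters are kept.
# Arabic diacritics (harakat) are not \w, so diacritised words would be split.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens with stopwords and pure numbers removed."""
    return [t for t in TOKEN_RE.findall(text.lower())
            if t not in STOPWORDS and not t.isdigit()]


def formulate_queries(text: str, passage_len: int = 50,
                      terms_per_query: int = 5) -> list[str]:
    """Hypothetical baseline: one query per fixed-length passage of the input
    document, built from that passage's most frequent terms.
    Ties keep first-occurrence order, as Counter.most_common guarantees."""
    tokens = tokenize(text)
    queries = []
    for start in range(0, len(tokens), passage_len):
        counts = Counter(tokens[start:start + passage_len])
        queries.append(" ".join(t for t, _ in counts.most_common(terms_per_query)))
    return queries


def retrieve_candidates(queries: Iterable[str],
                        search: Callable[[str], list[str]],
                        max_per_query: int = 10) -> set[str]:
    """Union of the top-ranked results over all queries.
    `search` is a stand-in for a Web search API returning ranked URLs."""
    candidates: set[str] = set()
    for query in queries:
        candidates.update(search(query)[:max_per_query])
    return candidates


def average_recall(cases: list[tuple[set[str], set[str]]]) -> float:
    """Mean over reuse cases of |retrieved & sources| / |sources|,
    where each case is a (retrieved, sources) pair."""
    return sum(len(retrieved & sources) / len(sources)
               for retrieved, sources in cases) / len(cases)


if __name__ == "__main__":
    # Toy run: a dict replaces the real search engine (hypothetical URLs).
    fake_index = {"cat sat": ["u1", "u2"], "mat cat": ["u2", "u3"]}
    queries = formulate_queries("The cat sat on the mat. The cat ran!",
                                passage_len=3, terms_per_query=2)
    found = retrieve_candidates(queries, lambda q: fake_index.get(q, []))
    print(queries)                                  # ['cat sat', 'mat cat']
    print(sorted(found))                            # ['u1', 'u2', 'u3']
    print(average_recall([(found, {"u1", "u4"})]))  # 0.5
```

Only the union of candidates, rather than the whole collection, would be compared passage by passage with d. A practical Arabic pipeline would likely also need orthographic normalisation (for example unifying alef variants), diacritic removal and light stemming before weighting terms; these steps are omitted here.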