Journal: IEICE Transactions on Information and Systems

Detecting Partial and Near Duplication in the Blogosphere


Abstract

In this paper, we propose a duplicate document detection model that recognizes both partial duplicates and near duplicates. The proposed model detects partial duplicates as well as exact duplicates by splitting a large document into many small sentence fingerprints. Furthermore, it detects near duplicates, which result from trivial revisions, by filtering out common words and reordering the word sequence.
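The abstract gives no implementation details, so the following is only a minimal sketch of how sentence-level fingerprinting of this kind might look. The stop-word list `COMMON_WORDS`, the use of alphabetical sorting as the "reordering" step, the SHA-256 hash, and the `containment` score are all assumptions made for illustration, not the authors' method.

```python
import hashlib
import re

# Hypothetical common-word list; the abstract does not say which words are filtered.
COMMON_WORDS = frozenset({
    "a", "an", "the", "of", "in", "on", "and", "or", "to",
    "is", "are", "was", "were", "it", "this", "that",
})


def sentence_fingerprints(text):
    """Return a set with one fingerprint per non-empty sentence of text."""
    fingerprints = set()
    # Naive sentence split on terminal punctuation (an assumption).
    for sentence in re.split(r"[.!?]+", text):
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        # Drop common words, then sort so that word order no longer matters.
        content = sorted(w for w in words if w not in COMMON_WORDS)
        if not content:
            continue
        digest = hashlib.sha256(" ".join(content).encode("utf-8")).hexdigest()
        fingerprints.add(digest)
    return fingerprints


def containment(doc_a, doc_b):
    """Fraction of doc_a's sentence fingerprints that also occur in doc_b."""
    fa = sentence_fingerprints(doc_a)
    fb = sentence_fingerprints(doc_b)
    if not fa:
        return 0.0
    return len(fa & fb) / len(fa)


original = "The cat sat on the mat. Dogs bark loudly at night."
revised = "On the mat sat a cat. Birds sing."
score = containment(revised, original)
```

In this sketch the first sentence of `revised` maps to the same fingerprint as the first sentence of `original` despite the changed word order and article, which is the near-duplicate case; the second sentence does not match, which is the partial-duplicate case. A real system would also need a threshold on the score, which the abstract does not specify.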
