International Conference on Parallel and Distributed Information Systems

dSCAM: finding document copies across multiple databases



Abstract

The advent of the Internet has made the illegal dissemination of copyrighted material easy. An important problem is how to automatically detect when a "new" digital document is "suspiciously close" to existing ones. The SCAM project at Stanford University has addressed this problem for the case of a single database of registered documents. In practice, however, test documents may appear in many autonomous databases, and one would like to discover copies without exhaustively searching every database. The authors' approach, dSCAM, is a distributed version of SCAM that keeps succinct metainformation about the contents of the available document databases. Given a suspicious document S, dSCAM uses this metainformation to prune every database that cannot contain any document close enough to S, so the search can focus on the remaining sites. The authors also study how to query the remaining databases so as to minimize various querying costs. They evaluate the pruning and searching schemes empirically, using a collection of 50 databases and two sets of test documents.
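The abstract does not spell out what dSCAM's metainformation or similarity measure look like, so the following is only a minimal sketch of the general idea of safe pruning, under assumed choices. Each database is summarized by the set of words occurring in any of its documents (the hypothetical `build_summary`). Closeness is measured as the fraction of S's distinct words that also occur in a document (the hypothetical `containment`). Because every document's words are a subset of its database's vocabulary, the summary yields an upper bound on closeness, and pruning on that bound never discards a database that actually holds a close-enough document. These names and the measure are illustrative assumptions, not the paper's definitions.

```python
import re
from typing import Dict, List, Set


def words(text: str) -> Set[str]:
    """Hypothetical tokenizer: distinct lowercase alphanumeric runs."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_summary(documents: List[str]) -> Set[str]:
    """Hypothetical per-database metainformation: the union of all document vocabularies."""
    vocab: Set[str] = set()
    for doc in documents:
        vocab |= words(doc)
    return vocab


def containment(s_words: Set[str], doc_words: Set[str]) -> float:
    """Assumed closeness measure: fraction of S's distinct words found in a document."""
    if not s_words:
        return 0.0
    return len(s_words & doc_words) / len(s_words)


def prune(suspicious: str, summaries: Dict[str, Set[str]], threshold: float) -> List[str]:
    """Return the databases that may still contain a document with containment >= threshold.

    containment(S, vocab) >= containment(S, d) for every document d in the database,
    so a database whose bound falls below the threshold can be skipped safely.
    """
    s_words = words(suspicious)
    return [name for name, vocab in summaries.items()
            if containment(s_words, vocab) >= threshold]


# Usage: two toy databases summarized by their vocabularies.
dbs = {
    "db_a": build_summary(["the quick brown fox", "jumps over the lazy dog"]),
    "db_b": build_summary(["stock prices fell sharply today"]),
}
print(prune("the quick brown fox jumps", dbs, threshold=0.6))  # prints ['db_a']
```

Passing the bound does not guarantee a match: the surviving databases still have to be queried, and in this toy example the best single document in `db_a` reaches a containment of 0.8 even though the database-level bound is 1.0. The summary only rules sites out.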
