
Linguistically informed digital fingerprints for text


Abstract

Digital fingerprinting, watermarking, and tracking technologies have gained importance in recent years in response to growing problems such as digital copyright infringement. While fingerprints and watermarks can be generated in many different ways, the use of natural language processing for these purposes has so far been limited. Measuring the similarity of literary works for automatic copyright infringement detection requires identifying and comparing the creative expression of content in documents. In this paper, we present a linguistic approach to automatically fingerprinting novels based on their expression of content. We use natural language processing techniques to generate "expression fingerprints". These fingerprints consist of both syntactic and semantic elements of expression. Our experiments indicate that syntactic and semantic elements of expression enable accurate identification of novels and their paraphrases, providing a significant improvement over techniques used in the text classification literature for automatic copy recognition. We show that these elements of expression can be used to fingerprint, label, or watermark works; they represent features that are essential to the character of works and that remain fairly consistent even when the works are paraphrased. These features can be extracted directly from the contents of the works on demand and can be used to recognize works that would not be correctly identified either in the absence of pre-existing labels or by verbatim-copy detectors.
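
The abstract does not list the specific syntactic and semantic features used, so the following is only a rough sketch of the general idea of a content-derived fingerprint that tolerates rewording. It assumes crude surface stand-ins (relative frequencies of a few function words plus mean sentence length) in place of real linguistic analysis; the names `FUNCTION_WORDS`, `expression_fingerprint`, and `cosine_similarity` are hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in features. The paper's actual syntactic and semantic
# elements of expression are not specified in the abstract.
FUNCTION_WORDS = ("the", "of", "and", "that", "which",
                  "because", "although", "when", "if", "not")


def expression_fingerprint(text):
    """Return a small numeric vector summarizing how a text is written."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    n_tokens = max(len(tokens), 1)        # avoid division by zero
    n_sentences = max(len(sentences), 1)
    # Relative frequency of each function word.
    vector = [counts[word] / n_tokens for word in FUNCTION_WORDS]
    # Mean sentence length in tokens, scaled down to be comparable in size.
    vector.append(len(tokens) / n_sentences / 100.0)
    return vector


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under these assumptions, two texts would be compared by computing a fingerprint for each and taking their cosine similarity. Because the features are computed from the text itself, no pre-existing label is needed, which mirrors the on-demand extraction described above. A faithful implementation would replace the surface features with output from a parser and a semantic analysis step.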
