Linguistically Informed Digital Fingerprints for Text

Abstract

Digital fingerprinting, watermarking, and tracking technologies have gained importance in recent years in response to growing problems such as digital copyright infringement. While fingerprints and watermarks can be generated in many different ways, the use of natural language processing for these purposes has so far been limited. Measuring the similarity of literary works for automatic copyright infringement detection requires identifying and comparing the creative expression of content in documents. In this paper, we present a linguistic approach to automatically fingerprinting novels based on their expression of content. We use natural language processing techniques to generate "expression fingerprints". These fingerprints consist of both syntactic and semantic elements of language, i.e., syntactic and semantic elements of expression. Our experiments indicate that these elements of expression enable accurate identification of novels and their paraphrases, providing a significant improvement over techniques used in the text classification literature for automatic copy recognition. We show that these elements of expression can be used to fingerprint, label, or watermark works: they represent features that are essential to the character of a work and that remain fairly consistent even when the work is paraphrased. These features can be extracted directly from the contents of a work on demand, and they can be used to recognize works that would not be correctly identified in the absence of pre-existing labels or by verbatim-copy detectors.
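The general idea of fingerprinting a text by how it is written, rather than by its exact wording, can be illustrated with a toy sketch. The abstract does not list the actual syntactic and semantic features used, so everything below is an assumption made for illustration: the `FUNCTION_WORDS` list, the two crude surface proxies (mean sentence length and commas per word), the cosine comparison, and the names `fingerprint`, `cosine`, and `identify` are all hypothetical and are not the authors' method.

```python
import math
import re
from collections import Counter

# Hypothetical feature set (an assumption, not taken from the paper):
# a few function words plus two crude surface proxies for syntax.
FUNCTION_WORDS = ("the", "of", "and", "but", "that", "which", "not", "because")


def fingerprint(text):
    """Return a small feature vector describing how the text is written."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    if not words or not sentences:
        raise ValueError("text must contain at least one word")
    counts = Counter(words)
    # Relative frequency of each function word.
    vec = [counts[w] / len(words) for w in FUNCTION_WORDS]
    # Mean sentence length in words, scaled down to a comparable range.
    vec.append(len(words) / len(sentences) / 100.0)
    # Commas per word, a rough stand-in for clause structure.
    vec.append(text.count(",") / len(words))
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(query, references):
    """Return the label of the reference text whose fingerprint is closest."""
    q = fingerprint(query)
    return max(references, key=lambda label: cosine(q, fingerprint(references[label])))


# Made-up sample texts: `paraphrase` rewords `work_a` but keeps its style.
work_a = "The ship sailed, and the crew sang. The wind rose, and the sea grew dark."
work_b = "I did not go because I was tired. I did not call because it was late."
paraphrase = "The ship sailed off, and the sailors sang. The wind picked up, and the sea turned dark."

print(identify(paraphrase, {"A": work_a, "B": work_b}))
```

A verbatim-copy detector would miss the paraphrase because several content words differ, whereas a style-based vector is largely unaffected by such substitutions. Real expression fingerprints would rely on parsed syntactic structure and semantic classes rather than these surface counts.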
