IEEE International Conference on Big Data
Infectious texts: Modeling text reuse in nineteenth-century newspapers

Abstract

Texts propagate through many social networks and provide evidence for their structure. We present efficient algorithms for detecting clusters of reused passages embedded within longer documents in large collections. We apply these techniques to analyzing the culture of reprinting in the United States before the Civil War. Without substantial copyright enforcement, stories, poems, news, and anecdotes circulated freely among newspapers, magazines, and books. From a collection of OCR'd newspapers, we extract a new corpus of reprinted texts, explore the geographic spread and network connections of different publications, and analyze the time dynamics of different genres.
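The abstract does not describe how reused passages are detected. One common family of techniques indexes word n-grams and links documents that share several of them. The sketch below illustrates that idea only and is not the authors' algorithm. The function names `shingles` and `find_reuse_clusters` and the parameters `n`, `max_df`, and `min_shared` are hypothetical. The sketch groups whole documents that share rare n-grams. It does not locate passage boundaries or align text, and a fuller system for passages embedded in longer documents would need both.

```python
from collections import defaultdict
from itertools import combinations


def shingles(text, n=5):
    """Return the set of lowercase word n-grams in text (hypothetical helper)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def find_reuse_clusters(docs, n=5, max_df=50, min_shared=2):
    """Group documents that share at least min_shared word n-grams.

    Hypothetical sketch: docs maps a document id to its (e.g. OCR'd) text.
    """
    # Inverted index: n-gram -> ids of documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in shingles(text, n):
            index[gram].add(doc_id)

    # Count shared n-grams per document pair. Skip n-grams found in only one
    # document, and very common ones (e.g. mastheads) found in more than max_df.
    shared = defaultdict(int)
    for ids in index.values():
        if 1 < len(ids) <= max_df:
            for a, b in combinations(sorted(ids), 2):
                shared[(a, b)] += 1

    # Union-find with path halving: link pairs that pass the threshold.
    parent = {d: d for d in docs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), count in shared.items():
        if count >= min_shared:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb

    # Collect connected components and keep only clusters of two or more documents.
    clusters = defaultdict(set)
    for d in docs:
        clusters[find(d)].add(d)
    return [c for c in clusters.values() if len(c) > 1]


docs = {
    "paper_a": "the quick brown fox jumps over the lazy dog near the river bank today",
    "paper_b": "local news: the quick brown fox jumps over the lazy dog near the river",
    "paper_c": "prices of cotton and wheat rose sharply in the markets this week",
}
print(find_reuse_clusters(docs))  # paper_a and paper_b share a reprinted passage
```

Skipping n-grams that appear in very many documents keeps the pairwise counting step from growing quadratically on boilerplate. The threshold `min_shared` trades recall against spurious links from short coincidental overlaps.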