Published in: String Processing and Information Retrieval (conference proceedings)

When Was It Written? Automatically Determining Publication Dates


Abstract

Automatically determining the publication date of a document is a complex task, since a document may contain only a few intra-textual hints about its publication date. Yet it has many important applications. Indeed, the number of digitized historical documents is constantly increasing, but their publication dates are not always properly identified via OCR acquisition. Accurate knowledge about publication dates is crucial for many applications, e.g. studying the evolution of document topics over a certain period of time.

In this article, we present a method for automatically determining the publication dates of documents, which was evaluated on a French newspaper corpus in the context of the DEFT 2011 evaluation campaign. Our system is based on a combination of different individual systems, relying on both supervised and unsupervised learning, and uses several external resources, e.g. Wikipedia, Google Books Ngrams, and etymological background knowledge about the French language. Our system detects the correct year of publication in 10% of the cases for 300-word excerpts and in 14% of the cases for 500-word excerpts, which is very promising given the complexity of the task.
