The American Local News Corpus

Abstract

We present the American Local News Corpus (ALNC), containing over 4 billion words of text from 2,652 online newspapers in the United States. Each article in the corpus is associated with a timestamp, state, and city. All 50 U.S. states and 1,924 cities are represented. We detail our method for taking daily snapshots of thousands of local and national newspapers and present two example corpus analyses. The first explores how different sports are talked about over time and geography. The second compares per capita murder rates with news coverage of murders across the 50 states. The ALNC is about the same size as the Gigaword corpus and is growing continuously. Version 1.0 is available for research use.