Conference: International Conference on Tools with Artificial Intelligence

Detecting and Describing Historical Periods in a Large Corpora

Abstract

Many historic periods (or events) are remembered by slogans, expressions or words that are strongly linked to them. Educated people are also able to determine whether a particular word or expression is related to a specific period in human history. The present paper aims to establish correlations between significant historic periods (or events) and the texts written in those periods. To achieve this, we have developed a system that automatically links words (and topics discovered using Latent Dirichlet Allocation) to periods of time in recent history. For this analysis to be relevant and conclusive, it must be undertaken on a representative set of texts written throughout history. To this end, instead of relying on manually selected texts, the Google Books Ngram corpus has been chosen as the basis for the analysis. Although it provides only word n-gram statistics for the texts written in a given year, the resulting time series can be used to provide insights about the most important periods and events in recent history, by automatically linking them with specific keywords or even LDA topics.
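The abstract does not spell out how a keyword is tied to a period. One simple way to do it with Ngram-style data, offered here only as an illustrative sketch and not as the authors' method, is to divide a word's yearly match count by the total token count for that year and flag runs of consecutive years whose z-score exceeds a threshold. The function names, the default threshold of 2.0 and the toy counts are hypothetical; real input would be the per-year word counts and per-year totals distributed with the Google Books Ngram corpus. The same routine could equally be run on the yearly weight of an LDA topic instead of a single word.

```python
from statistics import mean, pstdev


def relative_frequencies(word_counts: dict[int, int], total_counts: dict[int, int]) -> dict[int, float]:
    """Normalise raw yearly counts by the total number of tokens printed that year."""
    return {
        year: word_counts.get(year, 0) / total
        for year, total in sorted(total_counts.items())
        if total > 0
    }


def peak_periods(word_counts: dict[int, int], total_counts: dict[int, int],
                 threshold: float = 2.0) -> list[tuple[int, int]]:
    """Return (start_year, end_year) runs of consecutive years whose z-score exceeds threshold.

    Hypothetical helper: z-scores are computed against the mean and population
    standard deviation of the word's whole relative-frequency series.
    """
    freqs = relative_frequencies(word_counts, total_counts)
    if not freqs:
        return []
    values = list(freqs.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # flat series: no year stands out

    periods: list[tuple[int, int]] = []
    start = end = None
    for year in sorted(freqs):
        z = (freqs[year] - mu) / sigma
        if z > threshold:
            if start is not None and year == end + 1:
                end = year  # extend the current run
            else:
                if start is not None:
                    periods.append((start, end))  # close a run broken by a missing year
                start = end = year  # open a new run
        elif start is not None:
            periods.append((start, end))  # year fell back below the threshold
            start = end = None
    if start is not None:
        periods.append((start, end))
    return periods


if __name__ == "__main__":
    # Toy numbers invented purely for illustration, not real Ngram data.
    totals = {y: 1_000_000 for y in range(1900, 1950)}
    counts = {y: (500 if 1939 <= y <= 1945 else 10) for y in range(1900, 1950)}
    print(peak_periods(counts, totals))  # [(1939, 1945)]
```

A single global z-score is the simplest baseline. Because a word that stays elevated for decades inflates both the mean and the standard deviation, a sliding-window or detrended baseline may be a better choice for long periods.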