
Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change?



Abstract

In this work, we propose a data-driven methodology for identifying temporal trends in a corpus of medieval charters. We use perplexities derived from recurrent neural network (RNN) language models as a distance measure between documents and then cluster the documents on those distances. We argue that perplexities calculated by such language models are representative of temporal trends. The clusters produced by the K-Means algorithm give insight into differences in language across time periods, differences that are at least partly due to language change. We suggest that the temporal distribution of the individual clusters might provide a more nuanced picture of temporal trends than discrete time bins, and thus better results when used in a classification task.
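The abstract does not say how the language models are trained, how perplexities become a distance, or how K-Means is applied to pairwise distances. The sketch below fills those gaps with assumptions of our own, not the authors' method: an add-one-smoothed character bigram model per document stands in for the RNN; the distance between two documents is the mean of their two cross-perplexities, where perplexity is exp(-(1/N) Σ log p(cᵢ | cᵢ₋₁)); and each document is represented by its row of the distance matrix when running K-Means. All function names are hypothetical.

```python
import math
from collections import Counter


def train_char_bigram(text):
    """Count character bigrams and their left contexts.

    Stand-in (assumption) for the RNN language model in the abstract.
    """
    bigrams = Counter(zip(text, text[1:]))
    contexts = Counter(text[:-1])  # how often each char starts a bigram
    return bigrams, contexts


def perplexity(model, text, vocab_size):
    """Add-one-smoothed perplexity exp(-(1/N) * sum log p(c_i | c_{i-1})).

    N is the number of bigrams in text; assumes len(text) >= 2.
    """
    bigrams, contexts = model
    pairs = list(zip(text, text[1:]))
    log_prob = 0.0
    for prev, cur in pairs:
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(pairs))


def perplexity_distances(docs):
    """Hypothetical symmetric distance: mean of PPL(d_j | M_i) and PPL(d_i | M_j)."""
    vocab_size = len(set("".join(docs)))  # shared vocabulary over all docs
    models = [train_char_bigram(d) for d in docs]
    n = len(docs)
    ppl = [[perplexity(models[i], docs[j], vocab_size) for j in range(n)]
           for i in range(n)]
    return [[(ppl[i][j] + ppl[j][i]) / 2 for j in range(n)] for i in range(n)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's K-Means with deterministic farthest-first initialisation.

    The initialisation is an assumption; the abstract does not state one.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_documents(docs, k):
    """Represent each document by its distance-matrix row, then run K-Means."""
    return kmeans(perplexity_distances(docs), k)
```

On a toy input of four strings, two built from the characters `a`/`b` and two from `x`/`y`, `cluster_documents(docs, 2)` puts each pair in its own cluster. Swapping `train_char_bigram` and `perplexity` for per-token probabilities from a trained RNN would bring the sketch closer to the setup the abstract describes, while leaving the distance and clustering steps unchanged.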
