Published in: Journal of the American Society for Information Science and Technology

A Quantitative Analysis of the Temporal Effects on Automatic Text Classification


Abstract

Automatic text classification (TC) continues to be a relevant research topic and several TC algorithms have been proposed. However, the majority of TC algorithms assume that the underlying data distribution does not change over time. In this work, we are concerned with the challenges imposed by the temporal dynamics observed in textual data sets. We provide evidence of the existence of temporal effects in three textual data sets, reflected by variations observed over time in the class distribution, in the pairwise class similarities, and in the relationships between terms and classes. We then quantify, using a series of full factorial design experiments, the impact of these effects on four well-known TC algorithms. We show that these temporal effects affect each analyzed data set differently and that they restrict the performance of each considered TC algorithm to different extents. The reported quantitative analyses, which are the original contributions of this article, provide valuable new insights to better understand the behavior of TC algorithms when faced with nonstatic (temporal) data distributions and highlight important requirements for the proposal of more accurate classification models.
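One of the temporal effects named in the abstract, variation in the class distribution over time, can be illustrated with a minimal sketch. The abstract does not say how the authors measured this variation, so the total variation distance used here is an assumption chosen for simplicity, not the article's method. The function names and the toy `docs` data are hypothetical and purely illustrative.

```python
from collections import Counter, defaultdict


def class_distribution_by_period(docs):
    """Map each time period to its class proportions.

    docs: iterable of (period, label) pairs.
    Returns {period: {label: proportion}}.
    """
    counts = defaultdict(Counter)
    for period, label in docs:
        counts[period][label] += 1
    return {
        period: {label: n / sum(c.values()) for label, n in c.items()}
        for period, c in counts.items()
    }


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    0.0 means identical class proportions; 1.0 means disjoint classes.
    """
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in labels)


# Toy, made-up data: (year, class label) for eight documents.
docs = [
    (2000, "db"), (2000, "db"), (2000, "ir"), (2000, "ir"),
    (2005, "db"), (2005, "ir"), (2005, "ir"), (2005, "ir"),
]

dist = class_distribution_by_period(docs)
drift = total_variation(dist[2000], dist[2005])
print(f"class-distribution drift 2000 -> 2005: {drift:.2f}")
```

A value near zero would suggest a stable class distribution between the two periods, while larger values indicate the kind of drift that a classifier trained on the earlier period would not have seen.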
