Sentiment drift and its effect on the classification of Web log posts.

Abstract

Sentiment classification separates a collection of opinionated text into two opposing classes: favorable and unfavorable. It has been successfully applied to online product comments and movie reviews. Previous studies have shown that topic, domain, and time influence the results of machine learning models used to classify sentiment. This thesis furthers the investigation of the effect of time on sentiment classification. It defines the phenomenon of sentiment drift: the change of sentiment over time. We create a topic-specific corpus and demonstrate a change in sentiment over specific time periods. The source of the corpus is web logs; we find it more difficult to classify than previously studied corpora.

Previous work has shown that factors such as the machine learning induction technique, class composition, dataset size, and feature selection all influence predictability. We show that models whose configurations maximize predictability under these factors are still influenced by time. The most successful configuration we found is a collection of Naive Bayes models with feature selection applied and a balanced class composition. On average, the collection correctly predicts the sentiment of a web log post 89.77% of the time.

We perform collections of sentiment classification experiments that vary the difference, in months, between the testing period and the training period; we call this the testing-training difference (TTD). We show that as the TTD increases, the predictability of the sentiment model decreases. Models trained on months chronologically closer to the testing month produce significantly higher accuracies. We also show that models trained on future data significantly outperform models trained on past data. We investigate statistical subsets of the models and show that each subset is influenced by the TTD.

We show that models that incorporate the influence of time produce higher predictability. For example, we find that ensemble models that define a weight based on the TTD produce higher predictability than those that do not ([2.176, 5.092] alpha-level .05). The findings also show that 3-month ensembles outperform 5-month ensembles ([.39 alpha-level .05]), indicating that component models created more than three months away from the testing examples degrade the results of an ensemble.

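Bibliographic record

  • Author

    Durant, Kathleen T.

  • Affiliation

    Harvard University

  • Degree-granting institution: Harvard University
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2008
  • Pages: 150
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification: Automation technology, computer technology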