Journal: Decision Support Systems

On developing robust models for favourability analysis: Model choice, feature sets and imbalanced data



Abstract

Locating documents carrying positive or negative favourability is an important application within media analysis. This article presents some empirical results on the challenges facing a machine-learning approach to this kind of opinion mining. Some of the challenges include the often considerable imbalance in the distribution of positive and negative samples, changes in the documents over time, and effective training and evaluation procedures for the models. This article presents results on three data sets generated by a media-analysis company, classifying documents in two ways: detecting the presence of favourability, and assessing negative vs. positive favourability. We describe our experiments in developing a machine-learning approach to automate the classification process. We explore the effect of using five different types of features, the robustness of the models when tested on data taken from a later time period, and the effect of balancing the input data by undersampling. We find varying choices for the optimum classifier, feature set and training strategy depending on the task and data set.
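
The abstract mentions balancing the input data by undersampling but does not spell out the procedure. The following is a minimal sketch of one common variant, random undersampling down to the size of the smallest class. It is an assumption for illustration, not necessarily the authors' method; the function name `undersample` and the toy documents and labels are hypothetical.

```python
import random
from collections import defaultdict


def undersample(samples, labels, seed=0):
    """Randomly drop samples from the larger classes until every class
    has as many samples as the smallest one (illustrative helper)."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    # Every class is cut down to the size of the smallest class.
    target = min(len(group) for group in by_label.values())

    balanced = []
    for label, group in by_label.items():
        for sample in rng.sample(group, target):
            balanced.append((sample, label))

    # Shuffle so the classes are not grouped together in the output.
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 2 negative vs. 6 positive documents.
docs = [f"doc{i}" for i in range(8)]
labels = ["neg", "neg", "pos", "pos", "pos", "pos", "pos", "pos"]
balanced = undersample(docs, labels)
```

In a setting like the one described, such balancing would normally be applied to the training data only, so that evaluation on data from a later time period still reflects the natural class distribution.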
