Conference on Empirical Methods in Natural Language Processing

More Features Are Not Always Better: Evaluating Generalizing Models in Incident Type Classification of Tweets

Abstract

Social media is a rich source of up-to-date information about events such as incidents. The sheer volume of available information makes machine learning approaches a necessity for further processing. This learning problem often involves regionally restricted datasets, such as data from only one city. Because social media data such as tweets varies considerably across cities, training efficient models requires labeling data from each city of interest, which is costly and time-consuming. In this study, we investigate which features are most suitable for training generalizable models, i.e., models that perform well across different datasets. We re-implemented the most popular features from the state of the art, in addition to other novel approaches, and evaluated them on data from ten different cities. We show that many sophisticated features are not necessarily valuable for training a generalized model and are outperformed by classic features such as plain word n-grams and character n-grams.
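
The classic features named in the abstract can be illustrated with a minimal sketch. The function names, the lowercasing, and the whitespace tokenization are assumptions made for illustration; the abstract does not describe the authors' preprocessing, and this is not their implementation.

```python
def word_ngrams(text, n):
    """Return contiguous word n-grams (hypothetical helper).

    Assumes lowercasing and whitespace tokenization.
    """
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return contiguous character n-grams (hypothetical helper).

    Assumes lowercasing; spaces are kept as ordinary characters.
    """
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


if __name__ == "__main__":
    tweet = "Fire on Main Street"
    print(word_ngrams(tweet, 2))
    print(char_ngrams("fire", 3))
```

Counts of such n-grams could then serve as input features for a classifier trained on tweets from one city and evaluated on tweets from another, which is the kind of cross-dataset setting the abstract describes.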