More Features Are Not Always Better: Evaluating Generalizing Models in Incident Type Classification of Tweets

Abstract

Social media represents a rich source of up-to-date information about events such as incidents. The sheer amount of available information makes machine learning approaches a necessity for further processing. This learning problem is often concerned with regionally restricted datasets such as data from only one city. Because social media data such as tweets varies considerably across different cities, the training of efficient models requires labeling data from each city of interest, which is costly and time consuming. In this study, we investigate which features are most suitable for training generalizable models, i.e., models that show good performance across different datasets. We re-implemented the most popular features from the state of the art in addition to other novel approaches, and evaluated them on data from ten different cities. We show that many sophisticated features are not necessarily valuable for training a generalized model and are outperformed by classic features such as plain word-n-grams and character-n-grams.
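
The "classic features" named in the abstract, word n-grams and character n-grams, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the feature-key prefixes, the choice of n, the toy tweets, and the leave-one-city-out split are not taken from the paper, which may use a different tokenization and evaluation protocol.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return lowercase word n-grams using whitespace tokenization (an assumption)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return lowercase character n-grams, spaces included."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def featurize(text, word_n=(1, 2), char_n=(3,)):
    """Count word and character n-grams in one sparse feature dictionary.

    The "w{n}:" and "c{n}:" key prefixes are hypothetical; they only keep
    the two feature families from colliding.
    """
    feats = Counter()
    for n in word_n:
        feats.update(f"w{n}:{g}" for g in word_ngrams(text, n))
    for n in char_n:
        feats.update(f"c{n}:{g}" for g in char_ngrams(text, n))
    return feats


def leave_one_city_out(samples):
    """Yield (held_out_city, train, test) splits.

    `samples` is a list of (city, text, label) tuples. Holding out one city
    at a time is one plausible way to probe cross-city generalization; it is
    not claimed to be the authors' setup.
    """
    cities = sorted({city for city, _, _ in samples})
    for held_out in cities:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


# Toy data, invented purely for illustration.
samples = [
    ("CityA", "Fire on Main Street", "fire"),
    ("CityA", "Car crash near the bridge", "crash"),
    ("CityB", "Huge fire downtown", "fire"),
]
splits = list(leave_one_city_out(samples))
```

Each split trains on every city except one and tests on the held-out city, so a feature set that encodes city-specific vocabulary is penalized while one that transfers across cities is rewarded.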

Bibliographic details

Source: Conference on Empirical Methods in Natural Language Processing, 2015, 10 pages
Authors: Axel Schulz; Christian Guckelsberger; Benedikt Schmidt
Original format: PDF
Chinese Library Classification: Programming, software engineering

