More Features Are Not Always Better: Evaluating Generalizing Models in Incident Type Classification of Tweets



Abstract

Social media represents a rich source of up-to-date information about events such as incidents. The sheer amount of available information makes machine learning approaches a necessity for further processing. This learning problem is often concerned with regionally restricted datasets such as data from only one city. Because social media data such as tweets varies considerably across different cities, the training of efficient models requires labeling data from each city of interest, which is costly and time consuming. In this study, we investigate which features are most suitable for training generalizable models, i.e., models that show good performance across different datasets. We re-implemented the most popular features from the state of the art in addition to other novel approaches, and evaluated them on data from ten different cities. We show that many sophisticated features are not necessarily valuable for training a generalized model and are outperformed by classic features such as plain word-n-grams and character-n-grams.
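The classic features named in the abstract, word n-grams and character n-grams, can be illustrated with a minimal sketch. The abstract does not state the tokenization, n-gram ranges, or weighting used in the study, so the whitespace tokenizer, the lowercasing, the default ranges, the `w:`/`c:` prefixes, and the example tweet below are all assumptions made for illustration.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return contiguous word n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Return contiguous character n-grams of a lowercased text."""
    s = text.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def ngram_features(text, word_n=(1, 2), char_n=(3,)):
    """Count word and character n-grams.

    The n-gram sizes are assumed defaults, not values taken from the paper.
    Prefixes keep the two feature families from colliding.
    """
    feats = Counter()
    for n in word_n:
        feats.update("w:" + g for g in word_ngrams(text, n))
    for n in char_n:
        feats.update("c:" + g for g in char_ngrams(text, n))
    return feats


tweet = "Car crash on Main Street"  # hypothetical example tweet
features = ngram_features(tweet)
```

The resulting counts could be fed to any standard classifier. Features of this kind depend only on surface text, which is one plausible reason they might transfer across cities better than features tied to city-specific resources.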


Bibliographic details

Source
Conference on Empirical Methods in Natural Language Processing | 2015 | pp. 421-430 | 10 pages
Authors
Axel Schulz; Christian Guckelsberger; Benedikt Schmidt
Original format: PDF

Similar literature

1. Semantic Abstraction for Generalization of Tweet Classification: An Evaluation on Incident-Related Tweets [J]. Schulz Axel, Guckelsberger Christian, Janssen Frederik. Semantic Web. 2017, No. 3

2. Evaluation of mathematical models for QRS feature extraction and QRS morphology classification in ECG signals [J]. Measurement. 2020

3. The Dry Revolution: Evaluation of Three Different EEG Dry Electrode Types in Terms of Signal Spectral Features, Mental States Classification and Usability [J]. Di Flumeri Gianluca, Arico Pietro, Borghini Gianluca. Nature reviews Cancer. 2019, No. 6

4. More Features Are Not Always Better: Evaluating Generalizing Models in Incident Type Classification of Tweets [C]. Axel Schulz, Christian Guckelsberger, Benedikt Schmidt. Conference on Empirical Methods in Natural Language Processing. 2015

5. Pixel-Based Classification of Land Cover and Land Use Incorporating External Modeling Products, Sampling Designs, and Multi-Type Features [D]. Jin, Huiran. 2013

6. The Dry Revolution: Evaluation of Three Different EEG Dry Electrode Types in Terms of Signal Spectral Features Mental States Classification and Usability [O]. Gianluca Di Flumeri, Pietro Aricò, Gianluca Borghini. 2019

7. Application of Generalized Additive Models to the Evaluation of Continuous Markers for Classification Purposes [O]. Mónica López-Ratón, Mar Rodríguez-Girondo, María Xosé Rodríguez-Álvarez. 2015

8. Case Study for New Feature Extraction Algorithms, Automated Data Classification, and Model-Assisted Probability of Detection Evaluation (Preprint) [R]. Aldrin, J. C., Knopp, J. S. 2006
