TF-IDF Character N-grams versus Word Embedding-based Models for Fine-grained Event Classification: A Preliminary Study

Abstract

Automating the detection of event mentions in online texts and their classification vis-a-vis domain-specific event type taxonomies has been acknowledged by many organisations worldwide to be of paramount importance for facilitating intelligence gathering. This paper reports on preliminary experiments comparing various linguistically lightweight approaches to fine-grained event classification based on short text snippets reporting on events. In particular, we compare the performance of a TF-IDF-weighted character n-gram SVM-based model with SVMs trained on various off-the-shelf pre-trained word embeddings (GLOVE, BERT, FASTTEXT) as features. We exploit a relatively large event corpus consisting of circa 610K short text event descriptions classified using 25 event categories that cover political violence and protest events. The best results, i.e., 83.5% macro and 92.4% micro F_1 score, were obtained using the TF-IDF-weighted character n-gram model.
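The feature side of the TF-IDF-weighted character n-gram model named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names (`char_ngrams`, `tfidf_vectors`, `cosine`), the n-gram range of 2 to 4, the smoothed IDF formula, and the three example snippets are all assumptions made for illustration, and the SVM classifier itself is left out.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Return all character n-grams of length n_min..n_max (lowercased)."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n_min=2, n_max=4):
    """Map each document to a sparse {n-gram: weight} dict, L2-normalised."""
    counts = [Counter(char_ngrams(d, n_min, n_max)) for d in docs]
    # Document frequency: number of documents that contain each n-gram.
    df = Counter(g for c in counts for g in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        # Assumed smoothed IDF: ln((1 + N) / (1 + df)) + 1, always positive.
        vec = {g: tf * (math.log((1 + n_docs) / (1 + df[g])) + 1.0)
               for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two L2-normalised sparse vectors."""
    return sum(w * v.get(g, 0.0) for g, w in u.items())


# Hypothetical event snippets, invented for this example.
snippets = [
    "Protesters marched through the capital",
    "Protest march held in the capital city",
    "Armed clash between rebels and army",
]
vecs = tfidf_vectors(snippets)
sim_protest = cosine(vecs[0], vecs[1])  # two protest reports
sim_other = cosine(vecs[0], vecs[2])    # protest report vs. armed clash
```

Because the features are character sequences rather than whole words, inflected forms such as "Protesters" and "Protest" or "marched" and "march" share many n-grams, so the two protest snippets end up closer to each other than to the armed-clash snippet. In a full pipeline, vectors like these would be passed to an SVM as features.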

Bibliographic details

Source
Workshop on Automated Event Extraction of Socio-political Events from News | 2020 | pp. 25-34 | 10 pages
Authors
Jakub Piskorski; Guillaume Jacquet
Original format: PDF
Keywords
event classification; machine learning; word embeddings; subword models
