Workshop on Automated Event Extraction of Socio-political Events from News

TF-IDF Character N-grams versus Word Embedding-based Models for Fine-grained Event Classification: A Preliminary Study


Abstract

Automating the detection of event mentions in online texts and their classification against domain-specific event type taxonomies has been acknowledged by many organisations worldwide to be of paramount importance for facilitating intelligence gathering. This paper reports on preliminary experiments comparing various linguistically lightweight approaches to fine-grained event classification based on short text snippets reporting on events. In particular, we compare the performance of an SVM trained on TF-IDF-weighted character n-grams with SVMs trained on various off-the-shelf pre-trained word embeddings (GLOVE, BERT, FASTTEXT) as features. We exploit a relatively large event corpus consisting of circa 610K short text event descriptions classified into 25 event categories that cover political violence and protest events. The best results, i.e., 83.5% macro and 92.4% micro F1 score, were obtained using the TF-IDF-weighted character n-gram model.
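As a rough illustration of the feature representation named in the abstract, the following minimal sketch computes TF-IDF-weighted character n-gram vectors using only the Python standard library. It is not the authors' implementation: the n-gram range (2 to 4), the lowercasing, the smoothed IDF formula, the L2 normalisation, the function names and the three example snippets are all assumptions made for illustration, and the SVM training step is omitted.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Return all character n-grams of length n_min..n_max (lowercased)."""
    text = text.lower()
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n_min=2, n_max=4):
    """Map each document to a sparse, L2-normalised TF-IDF vector (a dict)."""
    counts = [Counter(char_ngrams(d, n_min, n_max)) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    # Smoothed IDF (an assumed variant; the abstract does not specify one).
    idf = {g: math.log((1 + n_docs) / (1 + k)) + 1.0 for g, k in df.items()}

    vectors = []
    for c in counts:
        vec = {g: tf * idf[g] for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors


def cosine(u, v):
    """Dot product of two L2-normalised sparse vectors."""
    return sum(w * v.get(g, 0.0) for g, w in u.items())


# Hypothetical event snippets, invented for this example.
snippets = [
    "Protesters marched through the capital",
    "Protestors march in the city centre",
    "Armed clash reported near the border",
]
vecs = tfidf_vectors(snippets)
sim_protest = cosine(vecs[0], vecs[1])
sim_clash = cosine(vecs[0], vecs[2])
```

Because the features are sub-word units, the spelling variants "Protesters" and "Protestors" still share most of their n-grams, so `sim_protest` comes out higher than `sim_clash`. In a full pipeline, vectors like these would be passed to a linear SVM.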
