Conference: Signal Processing and Communications Applications Conference

Topic Detection based on Deep Learning Language Model in Turkish Microblogs



Abstract

Microblogs are short, irregular texts in which people express their opinions on social media. Classifying social media microblog texts by topic provides a semantic substructure that supports the implementation of various applications. This study compares a conventional bag-of-words model with deep learning-based models on the problem of topic detection in microblogs. To prepare the dataset, Turkish tweets related to current events in Turkey are collected, and each tweet is labeled according to the hashtags it contains. One conventional bag-of-words model (TF-IDF based SVM) and two deep learning-based models (BERT and BERTurk) are trained on the dataset. Model performance is measured with the weighted F1 score. The TF-IDF based SVM, BERT, and BERTurk achieve weighted F1 scores of 0.807, 0.831, and 0.854, respectively.
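The two mechanical steps named in the abstract, labeling tweets by hashtag and scoring with weighted F1, can be illustrated with a minimal sketch. This is not the authors' code: the hashtag-to-topic map, the topic names, and both function names are hypothetical, and the abstract does not say how tweets with several hashtags are handled (here the first known hashtag wins). The weighted F1 shown is the usual definition, the per-class F1 averaged with weights proportional to each class's share of the true labels.

```python
import re
from collections import Counter

# Hypothetical hashtag-to-topic map; the hashtags used in the paper
# are not listed in the abstract.
HASHTAG_TOPICS = {
    "#deprem": "earthquake",
    "#ekonomi": "economy",
    "#futbol": "football",
}

HASHTAG_RE = re.compile(r"#\w+")


def label_tweet(text):
    """Return the topic of the first known hashtag in the tweet, or None.

    Assumption: the first matching hashtag decides the label.
    """
    for tag in HASHTAG_RE.findall(text.lower()):
        if tag in HASHTAG_TOPICS:
            return HASHTAG_TOPICS[tag]
    return None


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


if __name__ == "__main__":
    print(label_tweet("Bu sabah #deprem oldu"))
    print(weighted_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Weighting by support means that frequent topics dominate the score, which matters when hashtag-derived classes are imbalanced.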
