Using Reddit Data for Multi-Label Text Classification of Twitter Users Interests

Abstract

Automatically inferring users' interest groups is a challenging task in social network research, with applications in marketing and recommendation systems. Manually labeling documents is difficult and expensive, but it is essential for training an automatic text classifier. Several current approaches treat the problem as a multi-label prediction task. In this work, a methodology is proposed to automatically categorize data using Reddit and Twitter data. First, a dataset of 42,100 posts from the popular forum site Reddit is collected to train a model with labeled data. Then, a dataset of tweets from 1573 profiles, averaging 100 tweets per user, is collected to predict users' topics of interest with the trained model. Finally, we were able to automatically categorize the data with an average precision of 75.62%.
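
The pipeline outlined in the abstract (train on labeled Reddit posts, then assign one or more interest labels to a user from that user's tweets) can be sketched roughly as follows. The abstract does not name the classifier or the features, so everything here is an assumption made for illustration: the per-label term-frequency centroids, the cosine-similarity scoring, the `threshold` value, the function names, and the toy data. It is not the authors' method.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def train_centroids(labeled_posts):
    """Build one term-frequency centroid per label from (text, labels) pairs."""
    centroids = defaultdict(Counter)
    for text, labels in labeled_posts:
        tokens = tokenize(text)
        for label in labels:
            centroids[label].update(tokens)
    return dict(centroids)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_interests(tweets, centroids, threshold=0.3):
    """Pool a user's tweets into one profile and return every label
    whose centroid is at least `threshold` similar (multi-label output)."""
    profile = Counter()
    for tweet in tweets:
        profile.update(tokenize(tweet))
    return sorted(
        label
        for label, centroid in centroids.items()
        if cosine(profile, centroid) >= threshold
    )


# Toy stand-ins for the Reddit training set and one Twitter profile
# (hypothetical data, not taken from the paper).
reddit_posts = [
    ("the team won the football match", {"sports"}),
    ("new phone with faster processor", {"technology"}),
    ("fitness tracker app for running", {"sports", "technology"}),
]
user_tweets = ["great football match today", "my team won again"]

centroids = train_centroids(reddit_posts)
print(predict_interests(user_tweets, centroids))
```

Because each label is scored independently against the threshold, a user can receive zero, one, or several labels, which is the defining property of a multi-label prediction task.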

Bibliographic details

Source
International Conference on eDemocracy & eGovernment, 2019, pp. 324-327 (4 pages)
Authors
Angel Fiallos; Karina Jimenes
Original format
PDF
Keywords
Training; Sports; Task analysis; Data models; Twitter

Similar literature

1. Predicting Age Groups of Reddit Users Based on Posting Behavior and Metadata: Classification Model Development and Validation [J]. Robert Chew, Caroline Kery, Laura Baum. JMIR Public Health and Surveillance, 2021, Issue 3
2. Multi-label dataless text classification with topic modeling [J]. Zha Daochen, Li Chenliang. Knowledge and Information Systems, 2019, Issue 1
3. Modified TF-Assoc Term Weighting Method for Text Classification on News Dataset from Twitter [J]. Imroatul Khuluqi Izzah, Abba Suganda Girsang. IAENG International Journal of Computer Science, 2021, Issue 1Pta2
4. Using Reddit Data for Multi-Label Text Classification of Twitter Users Interests [C]. Angel Fiallos, Karina Jimenes. International Conference on eDemocracy & eGovernment, 2019
5. Leveraging Label Information in Representation Learning for Multi-Label Text Classification [D]. Wu, Jiayu. 2019
6. Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task [O]. Abeed Sarker, Maksim Belousov, Jasper Friedrichs. 2018
7. Using Text Classification to Estimate the Depression Level of Reddit Users [O]. Sergio Gastón Burdisso, Marcelo Errecalde, Manuel Montes-y-Gómez. 2021