Evaluation of Hybrid Unsupervised and Supervised Machine Learning Approach to Detect Self-Reporting of COVID-19 Symptoms on Twitter

Abstract

With over 127 million cases globally, the COVID-19 pandemic marks a sentinel event in global health. However, true case estimates have been elusive due to limited testing and diagnostic capacity, asymptomatic cases, and individuals who do not get tested or seek care. Concomitantly, new digital surveillance tools to detect, characterize, and report COVID-19 cases are emerging, including tools that use structured and unstructured data from users self-reporting COVID-19-related experiences on the Internet and social media platforms. In this study, we develop and evaluate a hybrid unsupervised and supervised machine learning approach to detect self-reported COVID-19-related symptoms on Twitter during the early stages of the pandemic. Tweets were collected from the public API stream from March 3 to March 31, 2020, and filtered for COVID-19-related terms. We used the biterm topic model to cluster the first 18 days of tweets into theme-associated groups, which were then extracted and manually annotated to identify users self-reporting suspected COVID-19 symptoms or status. Using this manually annotated data as a training set, we used an XLNet deep learning model to classify symptom-related tweets from a larger corpus and evaluated model performance. From 4,492,954 tweets collected, the unsupervised learning process yielded 3,465 (<1%) symptom tweets used to form our ground-truth COVID-19 symptoms dataset (n = 11,550). The XLNet text classifier achieved the highest accuracy (0.91) and F1 score (0.62) compared to the baseline models evaluated for classification. After re-training with an adjusted loss function, we boosted the classifier's precision to 0.81 while maintaining a high F1 score (0.66), resulting in identification of an additional 2,622 symptom-related tweets when applied to an additional 11 days of collected tweets. Our study used a hybrid machine learning approach to enable high-precision identification of Twitter user-generated COVID-19 symptom discussions. The model is a digital epidemiology tool that can identify social media users who self-report symptoms during the early periods of an outbreak.
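The metrics quoted in the abstract can be related to one another with a little arithmetic. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the class-weighted cross-entropy is only one common way to adjust a loss toward precision. The abstract does not say which adjustment was actually used.

```python
import math


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def implied_recall(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, given P and F1."""
    return f1 * precision / (2 * precision - f1)


def weighted_bce(y_true, p_pred, pos_weight=1.0, neg_weight=1.0):
    """Class-weighted binary cross-entropy (an assumed form of loss adjustment).

    Raising neg_weight penalizes confident false positives more heavily,
    which tends to push a classifier toward higher precision.
    """
    eps = 1e-12
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= pos_weight * y * math.log(p) + neg_weight * (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


# Figures quoted in the abstract.
symptom_share = 3465 / 4492954             # symptom tweets as a fraction of all collected tweets
recall_after = implied_recall(0.81, 0.66)  # derived here, not reported in the abstract
```

With the rounded values from the abstract, `recall_after` comes out at roughly 0.56, and `symptom_share` is about 0.08%, consistent with the "<1%" stated above. Both are derived figures and inherit the rounding of the reported metrics.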

Bibliographic details

Source: IEEE International Conference on Communications Workshops, 2021, pp. 1-6 (6 pages)
Authors: Mingxiang Cai; Jiawei Li; Matthew Nali; Tim K. Mackey
Original format: PDF
Keywords: COVID-19; Training; Social networking (online); Pandemics; Conferences; Blogs; Text categorization

Similar literature

1. A Hybrid Supervised/Unsupervised Machine Learning Approach to Solar Flare Prediction [J]. Federico Benvenuto, Michele Piana, Cristina Campi. The Astrophysical Journal, 2018, No. 1.
2. MDFP: A Machine Learning Model for Detecting Fake Facebook Profiles Using Supervised and Unsupervised Mining Techniques [J]. Mohammed Basil Albayati, Ahmad Mousa Altamimi. International Journal of Simulation: Systems, Science and Technology, 2019.
3. A Novel Automatic Classification System Based on Hybrid Unsupervised and Supervised Machine Learning for Electrospun Nanofibers [J]. Cosimo Ieracitano, Annunziata Paviglianiti, Maurizio Campolo. Acta Automatica Sinica (English edition), 2021, No. 1.
4. Symptoms based Early Clinical Diagnosis of COVID-19 Cases using Hybrid and Ensemble Machine Learning Techniques [C]. C Koushik, Ritwika Bhattacharjee, C Sweetlin Hemalatha. International Conference on Computer, Communication and Signal Processing, 2021.
5. An Evaluation of Unsupervised Machine Learning Algorithms for Detecting Fraud and Abuse in the U.S. Medicare Insurance Program [D]. da Rosa, Raquel C. 2018.
6. ClusterTAD: an unsupervised machine learning approach to detecting topologically associated domains of chromosomes from Hi-C data [O]. Oluwatosin Oluwadare, Jianlin Cheng. 2017.
7. Supervised Distributed Multi-Instance and Unsupervised Single-Instance Autoencoder Machine Learning for Damage Diagnostics with High-Dimensional Data: A Hybrid Approach and Comparison Study [O]. Stefan Bosse, Dennis Weiss, Daniel Schmidt. 2021.
