Information Processing & Management

A tale of two epidemics: Contextual Word2Vec for classifying twitter streams during outbreaks



Abstract

Unstructured tweet feeds are becoming a source of real-time information for various events. However, extracting actionable information in real time from this unstructured text data is a challenging task. Hence, researchers are employing word embedding approaches to classify unstructured text data. We set our study in the contexts of the 2014 Ebola and 2016 Zika outbreaks and probed the accuracy of domain-specific word vectors for identifying crisis-related actionable tweets. Our findings suggest that relatively small domain-specific input corpora drawn from Twitter are better at extracting meaningful semantic relationships than generic pre-trained Word2Vec (trained on Google News) or GloVe (from the Stanford NLP group) vectors. However, high-quality domain-specific tweet corpora are normally scant during the early stages of an outbreak, and identifying actionable tweets at that stage is crucial to stemming the outbreak's proliferation. To overcome this challenge, we consider scholarly abstracts related to the Ebola and Zika viruses from PubMed and probe the efficiency of cross-domain resource utilization for word vector generation. Our findings demonstrate the relevance of PubMed abstracts for training when Twitter data (as the input corpus) is scant during the early stages of an outbreak. Thus, this approach can be implemented to handle future outbreaks in real time. We also explore the accuracy of our word vectors across various model architectures and hyper-parameter settings. We observe that Skip-gram accuracies are better than those of CBOW, and that higher vector dimensions yield better accuracy.
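The Skip-gram vs. CBOW comparison in the abstract comes down to the training signal each architecture derives from a window of text: Skip-gram predicts each context word from the center word, while CBOW predicts the center word from its surrounding context. A minimal sketch of the pair generation (the toy tweet and window size are invented for illustration; in practice one would train with a library such as gensim, where `sg=1` selects Skip-gram and `sg=0` CBOW):

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: the Skip-gram training signal."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=2):
    """(context_list, center) pairs: the CBOW training signal."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        if context:
            pairs.append((context, center))
    return pairs

# Hypothetical tweet for illustration only.
tweet = "ebola outbreak reported in guinea".split()
print(skipgram_pairs(tweet)[:2])  # [('ebola', 'outbreak'), ('ebola', 'reported')]
print(cbow_pairs(tweet)[0])       # (['outbreak', 'reported'], 'ebola')
```

Because Skip-gram emits one training example per (center, context) pairing, rare domain terms (e.g. disease names in early-outbreak tweets) receive more individual updates than under CBOW's averaged context, which is consistent with the paper's observation that Skip-gram accuracies are higher.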
