Journal: Information Retrieval

Using word embeddings in Twitter election classification

Abstract

Word embeddings and convolutional neural networks (CNN) have attracted extensive attention in various classification tasks for Twitter, e.g. sentiment classification. However, the effect of the configuration used to generate the word embeddings on the classification performance has not been studied in the existing literature. In this paper, using a Twitter election classification task that aims to detect election-related tweets, we investigate the impact of the background dataset used to train the embedding models, as well as the parameters of the word embedding training process, namely the context window size, the dimensionality and the number of negative samples, on the attained classification performance. By comparing the classification results of word embedding models trained on different background corpora (e.g. Wikipedia articles and Twitter microposts), we show that the background data should align with the Twitter classification dataset in both data type and time period to achieve significantly better performance than baselines such as SVM with TF-IDF. Moreover, by evaluating the results of word embedding models trained using various context window sizes and dimensionalities, we find that larger context windows and dimensionalities are preferable for improving performance, whereas the number of negative samples does not significantly affect the performance of the CNN classifiers. Our experimental results also show that choosing the correct word embedding model for use with the CNN leads to statistically significant improvements over various baselines such as random, SVM with TF-IDF and SVM with word embeddings. Finally, for out-of-vocabulary (OOV) words that are not available in the learned word embedding models, we show that a simple strategy of randomly initialising the OOV words without any prior knowledge is sufficient to attain good classification performance compared with other current OOV strategies (e.g. random initialisation using statistics of the pre-trained word embedding models).
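
As a concrete illustration of the three training parameters investigated above (context window size, dimensionality and number of negative samples), the sketch below sweeps them with gensim's Word2Vec. This is a minimal sketch under stated assumptions: the toy corpus, the parameter grid and the skip-gram setting (sg=1) are illustrative, not the paper's exact configuration.

```python
# Minimal sketch: sweep the three word2vec training parameters studied in the
# paper (context window, dimensionality, number of negative samples) with
# gensim. The toy corpus and the parameter grid are illustrative only.
from gensim.models import Word2Vec

# Stand-in for a large tokenised background corpus (e.g. Twitter microposts).
background_corpus = [
    ["vote", "early", "in", "the", "election"],
    ["watching", "the", "debate", "tonight"],
]

for window in (1, 5, 10):           # context window size
    for dim in (100, 300, 500):     # embedding dimensionality
        for neg in (5, 10):         # number of negative samples
            model = Word2Vec(
                sentences=background_corpus,
                vector_size=dim,    # called `size` in gensim < 4.0
                window=window,
                negative=neg,
                sg=1,               # skip-gram (an assumption here)
                min_count=1,        # keep all toy-corpus words
                workers=4,
            )
            model.save(f"w2v_win{window}_dim{dim}_neg{neg}.model")
```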
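For context, two of the baselines named above can be sketched with scikit-learn. This is a hedged illustration: the SVM variant (LinearSVC), the variable names and the mean-of-word-vectors representation are assumptions, not the paper's exact setup.

```python
# Minimal sketch of two baselines from the abstract: SVM with TF-IDF features,
# and SVM over tweets represented by averaged word embeddings. The SVM variant
# and the averaging scheme are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Baseline: SVM with TF-IDF features.
tfidf_svm = make_pipeline(TfidfVectorizer(), LinearSVC())
# tfidf_svm.fit(train_tweets, train_labels)  # train_tweets: list of strings

# Baseline: SVM with word embeddings (tweet = mean of its word vectors;
# words missing from the embedding model are simply skipped here).
def tweet_vector(tokens, wv, dim):
    vecs = [wv[w] for w in tokens if w in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```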

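Finally, the OOV strategies compared above can be sketched as follows: plain random initialisation with no prior knowledge versus random initialisation using statistics of the pre-trained vectors. The function name, the value ranges and the uniform/normal distribution choices are illustrative assumptions.

```python
# Minimal sketch of building a CNN embedding matrix under two OOV strategies:
# (a) "uniform": random initialisation with no prior knowledge, and
# (b) "stats": random initialisation using the mean/std of the known vectors.
import numpy as np

def build_embedding_matrix(vocab, wv, dim, strategy="uniform", seed=42):
    """vocab: word -> row index; wv: word -> pre-trained vector mapping."""
    rng = np.random.default_rng(seed)
    if strategy == "stats":
        known = np.stack([wv[w] for w in vocab if w in wv])
        mean, std = float(known.mean()), float(known.std())
    matrix = np.zeros((len(vocab), dim), dtype=np.float32)
    for word, idx in vocab.items():
        if word in wv:
            matrix[idx] = wv[word]                       # in-vocabulary word
        elif strategy == "stats":
            matrix[idx] = rng.normal(mean, std, dim)     # stats-based OOV init
        else:
            matrix[idx] = rng.uniform(-0.25, 0.25, dim)  # plain random OOV init
    return matrix
```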