International Conference on Computer Engineering and Systems

A Word Embedding Model Learned from Political Tweets

Abstract

Distributed word representations have recently contributed to significant improvements in many natural language processing (NLP) tasks, and distributional semantics has become one of the important trends in machine learning (ML) applications. Word embeddings are distributed representations of words that capture semantic relationships learned from a large text corpus. In a social context, the distributed representation of a word is likely to differ from embeddings learned on general text. This is due in part to the distinctive lexical-semantic features and morphological structure of social media text such as tweets, which lead to different word vector representations. In this paper, we collect and present a political social dataset of over four million English tweets. An artificial neural network (NN) is trained to learn word co-occurrence and generate word vectors from this political tweet corpus. The resulting model is 136 MB and includes word representations for a vocabulary of over 86K unique words and phrases. The learned model is intended to support ML and NLP applications in the analysis of microblogging online social networks (OSN), such as semantic similarity and cluster analysis tasks.
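
The abstract does not name the network architecture or the preprocessing steps, so the following is only a minimal sketch of the general idea, assuming a skip-gram model with negative sampling (one common way to learn word vectors from co-occurrence). The tokenizer, the hyperparameters, the uniform negative sampling, and the four example tweets are all hypothetical and chosen for illustration; none of them come from the paper.

```python
import math
import random
import re


def tokenize(tweet):
    """Lowercase a tweet and keep words, #hashtags and @mentions as tokens."""
    return re.findall(r"[#@]?\w+", tweet.lower())


def train_skipgram(sentences, dim=16, window=2, negatives=3,
                   epochs=50, lr=0.05, seed=0):
    """Toy skip-gram with negative sampling (assumed architecture).

    Negatives are drawn uniformly from the vocabulary, a simplification
    of the frequency-based sampling used by full implementations.
    """
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    # Input (word) vectors start small and random; output (context) vectors at zero.
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]
    for _ in range(epochs):
        for sent in sentences:
            ids = [index[w] for w in sent]
            for pos, center in enumerate(ids):
                lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
                for ctx_pos in range(lo, hi):
                    if ctx_pos == pos:
                        continue
                    # One observed context word (label 1) plus sampled negatives (label 0).
                    targets = [(ids[ctx_pos], 1.0)]
                    targets += [(rng.randrange(len(vocab)), 0.0)
                                for _ in range(negatives)]
                    v = w_in[center]
                    grad_v = [0.0] * dim
                    for t, label in targets:
                        u = w_out[t]
                        score = sum(a * b for a, b in zip(v, u))
                        score = max(-20.0, min(20.0, score))  # avoid overflow in exp
                        pred = 1.0 / (1.0 + math.exp(-score))
                        g = lr * (label - pred)
                        for k in range(dim):
                            grad_v[k] += g * u[k]
                            u[k] += g * v[k]
                    for k in range(dim):
                        v[k] += grad_v[k]
    return {w: w_in[i] for w, i in index.items()}


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical example tweets, not taken from the paper's dataset.
tweets = [
    "The Senate votes on the #healthcare bill today",
    "Congress votes on the #tax bill tomorrow",
    "The Senate debates the #tax bill",
    "Congress debates the #healthcare bill @cspan",
]
vectors = train_skipgram([tokenize(t) for t in tweets])
similarity = cosine(vectors["senate"], vectors["congress"])
```

Here `similarity` is the kind of semantic similarity score the abstract mentions as a downstream use. On a corpus this small the value is not meaningful; the sketch only shows how co-occurrence within a window drives the vector updates.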