Conference: Annual Meeting of the Association for Computational Linguistics
Tweet2Vec: Character-Based Distributed Representations for Social Media

Abstract

Text from social media provides a set of challenges that can cause traditional NLP approaches to fail. Informal language, spelling errors, abbreviations, and special characters are all commonplace in these posts, leading to a prohibitively large vocabulary size for word-level approaches. We propose a character composition model, tweet2vec, which finds vector-space representations of whole tweets by learning complex, non-local dependencies in character sequences. The proposed model outperforms a word-level baseline at predicting user-annotated hashtags associated with the posts, doing significantly better when the input contains many out-of-vocabulary words or unusual character sequences. Our tweet2vec encoder is publicly available.
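The vocabulary problem described in the abstract can be illustrated with a small sketch. This is not the tweet2vec encoder; it only compares how word-level and character-level tokenization cope with an informal post. The toy posts and the helper names `build_vocab` and `oov_rate` are assumptions made up for this illustration.

```python
# Toy posts invented for illustration; they are not data from the paper.
train_posts = [
    "so happy today",
    "good morning everyone",
    "happy monday everyone",
]
test_post = "sooo haaappy 2day!!"


def build_vocab(posts, tokenize):
    """Map every token seen in the training posts to an integer id."""
    tokens = sorted({tok for post in posts for tok in tokenize(post)})
    return {tok: i for i, tok in enumerate(tokens)}


def oov_rate(post, vocab, tokenize):
    """Fraction of tokens in `post` that are missing from `vocab`."""
    toks = tokenize(post)
    missing = sum(1 for tok in toks if tok not in vocab)
    return missing / len(toks)


# Word level: split on whitespace. Character level: one token per character.
word_vocab = build_vocab(train_posts, str.split)
char_vocab = build_vocab(train_posts, list)

word_oov = oov_rate(test_post, word_vocab, str.split)
char_oov = oov_rate(test_post, char_vocab, list)

print(f"word vocab size: {len(word_vocab)}, OOV rate: {word_oov:.2f}")
print(f"char vocab size: {len(char_vocab)}, OOV rate: {char_oov:.2f}")
```

In this toy setting every word of the informal post is unseen at the word level, while most of its characters are already covered by the character vocabulary. A character-based model can therefore still build a representation from such input, which is the situation the abstract says favors tweet2vec.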