
A #hashtagtokenizer for Social Media Messages

Abstract

In social media, mainly due to length constraints, users write succinct messages and use hashtags to refer to entities, events, sentiments or ideas. Hashtags carry a lot of content that can help in many text processing tasks and applications, such as sentiment analysis, named entity recognition and information extraction. However, identifying the individual words of a hashtag is not trivial, because traditional POS taggers typically treat it as a single token even though it may contain multiple words, e.g. #fergusondecision, #imcharliehebdo. In this work, we propose a generic model for hashtag tokenisation that aims to split a hashtag into several tokens corresponding to the individual words it contains (e.g. "#imcharliehebdo" would become four tokens: "#", "i", "am" and "Charlie Hebdo"). Our hashtag tokenizer is based on a machine learning segmentation method for the Chinese language and also makes use of Wikipedia as an encyclopedic knowledge base. We evaluated the inference power of our approach by comparing the tokens it produces to those produced by human taggers. The results demonstrate the good accuracy and applicability of the proposed model for general-purpose applications.
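To illustrate the segmentation task the abstract describes, the short sketch below splits a hashtag into candidate words using a simple dictionary-based word break (dynamic programming). This is only a minimal illustration under stated assumptions, not the paper's model: the paper adapts a machine learning segmentation method developed for Chinese and draws its lexical knowledge from Wikipedia, whereas the function name and the tiny vocabulary here are hypothetical.

# Minimal illustrative sketch (not the authors' model): segment a hashtag into
# words with a dictionary-based word break via dynamic programming. The paper
# instead adapts a machine learning segmentation method for Chinese and uses
# Wikipedia as a knowledge base; the vocabulary below is a hypothetical toy set.

def segment_hashtag(hashtag, vocabulary):
    """Return ['#', w1, w2, ...] where the words cover the hashtag body."""
    text = hashtag.lstrip("#").lower()
    n = len(text)
    best = [None] * (n + 1)   # best[i] = fewest-word segmentation of text[:i]
    best[0] = []
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] is not None and text[j:i] in vocabulary:
                candidate = best[j] + [text[j:i]]
                # prefer segmentations with fewer (hence longer) words
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    words = best[n] if best[n] is not None else [text]  # fall back to whole string
    return ["#"] + words

# Hypothetical toy vocabulary; the paper derives lexical coverage from Wikipedia.
vocab = {"i", "im", "am", "charlie", "hebdo", "ferguson", "decision"}

print(segment_hashtag("#fergusondecision", vocab))  # ['#', 'ferguson', 'decision']
print(segment_hashtag("#imcharliehebdo", vocab))    # ['#', 'im', 'charlie', 'hebdo']
# A further normalisation step, as in the paper's example, would expand "im"
# into "i" + "am" and restore casing for named entities such as "Charlie Hebdo".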
