International Research Conference on Smart Computing and Systems Engineering

Keyword extraction from Tweets using NLP tools for collecting relevant news

Abstract

Keywords play a major role in representing the gist of a document. Therefore, many natural language processing tools have been implemented to identify keywords in both structured and unstructured text. Text that appears on social media platforms such as Twitter is mostly unstructured because of the character limit. Consequently, tweets contain many abbreviations and symbols such as emoticons and URLs. Extracting keywords from grammatically ambiguous text is harder than from structured text, since linguistic features are difficult to rely on in unstructured text. News on Twitter, however, may contain more structured text than informal tweets do, although this depends on the tweeter, the person who posts the tweet. In this paper, a methodology is proposed to extract keywords from a given tweet in order to retrieve relevant news that has been posted on Twitter, for use in fake news detection. The purpose of extracting keywords is to find more related news efficiently and effectively. To make the approach generic rather than domain-specific, a corpus containing tweet texts from different domains is built. The Stanford CoreNLP toolkit, the WordNet lexical database, and a statistical method are used to extract keywords from a tweet. The system was evaluated with a Turing test involving human evaluators, and it achieved an accuracy of 67.6% in this evaluation.
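The abstract does not say which statistical method is used, so the following is only a minimal sketch under stated assumptions: it covers the tweet clean-up the abstract motivates (dropping URLs, @mentions, and emoticons) and then ranks the remaining terms with TF-IDF against a background corpus of tweets. TF-IDF is a swapped-in technique, not the authors' confirmed method, and the Stanford CoreNLP and WordNet stages are not reproduced. The names `STOPWORDS`, `clean_tweet`, and `tfidf_keywords`, the stopword list, and the sample tweets are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "for", "from", "has", "in",
    "is", "it", "of", "on", "that", "the", "to", "was", "were", "will", "with",
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Only alphabetic tokens survive, so emoticons such as ":)" are discarded.
TOKEN_RE = re.compile(r"[a-z][a-z']*")


def clean_tweet(text: str) -> list[str]:
    """Remove URLs and @mentions, keep hashtag words, lowercase, drop stopwords."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # "#floods" -> "floods"
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 1]


def tfidf_keywords(tweet: str, corpus: list[str], top_k: int = 3) -> list[str]:
    """Rank a tweet's terms by TF-IDF against a background corpus of tweets.

    Assumption: TF-IDF stands in for the paper's unspecified statistical method.
    """
    doc_terms = [set(clean_tweet(doc)) for doc in corpus]
    n_docs = len(doc_terms)
    counts = Counter(clean_tweet(tweet))
    total = sum(counts.values())
    scores = {}
    for term, count in counts.items():
        df = sum(1 for terms in doc_terms if term in terms)  # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF, always >= 1
        scores[term] = (count / total) * idf
    # Highest score first; ties broken alphabetically so output is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


if __name__ == "__main__":
    # Made-up background corpus; a real system would use a multi-domain tweet corpus.
    corpus = [
        "Heavy rain floods the city centre https://t.co/abc",
        "The city council meets on Monday",
        "Floods close schools across the district",
        "Election results announced in the capital",
    ]
    tweet = "Heavy rain: floods hit Colombo @newsdesk #floods https://t.co/xyz"
    print(tfidf_keywords(tweet, corpus, top_k=2))  # ['floods', 'colombo']
```

Here "floods" ranks first because it occurs twice in the tweet, and "colombo" outranks "heavy" and "rain" because it never appears in the background corpus. In a retrieval setting like the one described, the top-ranked terms would then form the query used to look up related news tweets.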