International Conference on Electrical Engineering and Computer Science

Topic Modelling Twitter Data with Latent Dirichlet Allocation Method

Abstract

Twitter is a popular social media platform where users express thoughts and emotions in tweets, short text posts limited to 140 characters. Twitter is also a source of continuously updated information, and tweets are treated as big data because they can serve as a data source for research. Latent Dirichlet Allocation (LDA) is an algorithm that can process large text data (big data). This study uses the LDA method to produce topic models, the similarity of each topic, and a visualization of topic clusters from Indonesian-language tweet data covering 4 topics (Economic, Military, Sports, Technology), where each topic has a different number of tweets. The LDA method processed the tweet data successfully and is reported to work optimally in topic extraction, topic modeling, generation of the index words in each topic cluster, and visualization of the topics. The LDA output shows its best word-indexing performance on the Sports topic, with 1260 tweets and an accuracy of 98%, which is better than the LSI method in topic modeling.
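The abstract does not say how the LDA model was fitted, so the following is only a minimal sketch of one common inference scheme, collapsed Gibbs sampling, written in plain Python. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy Indonesian tokens are all assumptions made for illustration; none of them come from the paper, and the paper's tweet data is not reproduced here.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over already-tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Give every token a random initial topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + V * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed topic proportions for each document.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    # The three most frequent words assigned to each topic.
    top_words = [
        [vocab[j] for j in sorted(range(V), key=lambda j: -row[j])[:3]]
        for row in topic_word
    ]
    return theta, top_words

# Invented toy corpus (not the paper's data): economy-like and sports-like tokens.
docs = [
    ["harga", "pasar", "rupiah", "harga"],
    ["pasar", "rupiah", "ekonomi"],
    ["gol", "liga", "pemain", "gol"],
    ["liga", "pemain", "pertandingan"],
]
theta, top_words = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words):
    print("topic", k, words)
```

Each row of `theta` is a document's estimated topic mixture, and `top_words` plays the role of the index words per topic cluster that the abstract mentions. A real pipeline would add tokenization, stop-word removal, and a far larger corpus before any accuracy comparison is meaningful.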
