Conference: International Conference on Information Communication Technology and System

An evaluation of Twitter river and Logstash performances as elasticsearch inputs for social media analysis of Twitter

Abstract

Social media analysis of Twitter can be used to show how a person, a service, or a product is rated from the perspective of Twitter users. As one of the social media platforms with the highest number of users in the world, Twitter provides an API that allows data to be observed and collected in real time. Elasticsearch is a tool capable of analyzing big data. There are two ways to feed Twitter data into Elasticsearch: through Twitter River and through Logstash. The choice of input method matters because it influences the output of the system; the accuracy and efficiency of the input and the way the data is stored are essential to supporting a big data system. This paper presents an evaluation of the performance of Twitter River and Logstash when ingesting Twitter data from the Twitter API. The research monitors an Elasticsearch cluster on two HPC servers that crawl data from the Twitter API simultaneously. The parameters compared are CPU usage, RAM usage, disk usage, the amount of Twitter data ingested, and the number of input fields. The results show that the average CPU usage per day is 33.96% for Twitter River and 34.95% for Logstash. The average RAM usage per day is 32.7% for Twitter River and 39.9% for Logstash. The average disk usage per day is 431 MB for Twitter River and 544 MB for Logstash. In terms of ingested data, Twitter River stored 191 more tweets than Logstash over one week. The results also show that Logstash ingests 11 times more fields than Twitter River.