Journal of Guangdong University of Technology (广东工业大学学报)

A Storm-Based Method for Collecting Online Product Review Information

     

Abstract

To obtain product reviews from e-commerce websites as early as possible, and thereby track public opinion on products in real time, this paper proposes a Storm-based method for collecting online product review information. The method applies the concept of stream computing to web crawling and uses the SHHD (Simhash Hamming Distance) algorithm to adjust the collection period dynamically. Experimental results show that collecting information on the Storm platform offers high throughput and strong scalability; that SHHD effectively reduces the collection system's consumption of network bandwidth and system resources, yielding an adaptive, incremental process for collecting online product reviews; and that SHHD has a clear advantage over methods such as Poisson and SART in the lag time of acquiring product reviews.
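The abstract does not spell out how SHHD maps a Simhash Hamming distance to a collection period, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that a large distance between two consecutive snapshots of a review page means new reviews have appeared (so the period is shortened), and a small distance means little has changed (so the period is lengthened). The function names, the threshold, the scaling factor, and the interval bounds are all hypothetical.

```python
import hashlib


def simhash(text: str, bits: int = 64) -> int:
    """Simhash fingerprint over whitespace-separated tokens (weight 1 each).

    Assumption: tokens are split on whitespace. Chinese review text would
    need a word segmenter instead. `bits` must be a multiple of 8 and at
    most 128, because token hashes are taken from an MD5 digest.
    """
    weights = [0] * bits
    for token in text.split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def next_interval(
    prev_interval: float,
    old_page: str,
    new_page: str,
    *,
    threshold: int = 3,           # hypothetical change threshold, in bits
    factor: float = 2.0,          # hypothetical scaling factor
    min_interval: float = 60.0,   # hypothetical lower bound, in seconds
    max_interval: float = 86400.0,  # hypothetical upper bound, in seconds
) -> float:
    """Return the next collection period in seconds (illustrative rule only)."""
    distance = hamming_distance(simhash(old_page), simhash(new_page))
    if distance > threshold:
        candidate = prev_interval / factor  # page changed: revisit sooner
    else:
        candidate = prev_interval * factor  # page stable: back off
    return max(min_interval, min(max_interval, candidate))
```

In a Storm topology, a rule of this kind would plausibly run in a bolt that receives fetched pages and emits the next scheduled fetch time back to the crawling spout; that placement is likewise an assumption, not something the abstract states.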
