Conference: International Conference on Network and Information Systems for Computers

The Data Crawling and Hotspot Analyze of Social QA Site

Abstract

With the rapid development of the Internet, more specialized and detailed information sources such as Q&A sites have gradually emerged. On these social Q&A platforms, many hot topics and news items are discussed, and even created, every minute. It is therefore of great practical significance to learn about hot social issues by analyzing and parsing the content on social Q&A platforms. Taking one social Q&A platform as the research subject, this paper analyzes the difficulties in crawling data from the platform and the relevant solutions, then designs and implements a data crawling system comprising a user information storage module, a maintenance module for highly anonymous, available proxies, a node crawling and parsing module, and a data storage module. With these modules, the system can crawl and store data without being restricted by the platform. On this basis, the paper designs and implements a hotspot parsing and grading module. A historical hotspot display module and a trending hotspot display module, both based on ECharts, show the historical and trending hotspots on the platform. Using the proposed data crawling module and the hotspot analysis and display system, the paper obtains data for 31,520 regularized independent topics and real-time data for 979,815 questions from the platform. Based on these data, the historical and trending hotspot analyses for the platform are displayed. The experimental results show that the system fully meets its design objectives. Finally, the paper summarizes the proposed data crawling and hotspot analysis system and provides references and directions for future work.
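
The abstract names a proxy maintenance module but does not describe how it works. As a rough illustration only, the sketch below shows one common way such a module might keep a pool of usable proxies: count consecutive failures per proxy and evict a proxy once it crosses a threshold. The class name `ProxyPool`, the `max_failures` parameter, and the example addresses are all hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProxyPool:
    """Hypothetical proxy pool; not the paper's actual implementation."""

    max_failures: int = 3  # assumed eviction threshold
    _failures: Dict[str, int] = field(default_factory=dict)  # proxy -> consecutive failures

    def add(self, proxy: str) -> None:
        # Register a proxy with a clean failure count (no-op if already known).
        self._failures.setdefault(proxy, 0)

    def available(self) -> List[str]:
        # Sorted so that callers see a deterministic order.
        return sorted(self._failures)

    def pick(self, rng: Optional[random.Random] = None) -> str:
        # Choose a random live proxy so requests spread across the pool.
        candidates = self.available()
        if not candidates:
            raise LookupError("no usable proxies left")
        return (rng or random).choice(candidates)

    def report(self, proxy: str, ok: bool) -> None:
        # Reset the count on success; evict after too many consecutive failures.
        if proxy not in self._failures:
            return
        if ok:
            self._failures[proxy] = 0
            return
        self._failures[proxy] += 1
        if self._failures[proxy] >= self.max_failures:
            del self._failures[proxy]
```

In a crawler built this way, each request would call `pick()` for a proxy and then `report()` the outcome, so proxies that the platform has blocked drop out of rotation without manual intervention.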