Journal: International Journal of UbiComp

A Proposed Architecture for Continuous Web Monitoring Through Online Crawling of Blogs


Abstract

Timely awareness of what is published on the Web can greatly help psychologists, marketers, and political analysts understand and analyse society's different needs, make decisions, and act on them correctly. The sheer volume of information on the Web makes it impractical to monitor the whole Web continuously and online. Restricting attention to a selected set of blogs narrows the working domain and makes online crawling feasible. This article proposes an architecture that continuously crawls the relevant blogs online using a focused crawler, then examines and analyses the collected data. Online fetching is driven by the latest announcements from ping servers. A weighted graph is built from the important key phrases, and the focused crawler uses this graph to fetch the complete texts of the relevant Web pages.
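The abstract does not say how the weighted graph is structured or how it steers the crawler, so the following is only a minimal sketch under stated assumptions. It assumes that nodes are key phrases with importance weights, that edges reward phrases appearing together, and that each ping-server announcement arrives as a (URL, title) pair. The names `NODE_WEIGHTS`, `EDGE_WEIGHTS`, `relevance`, and `build_frontier`, the example weights, and the threshold are all hypothetical and are not taken from the paper.

```python
import heapq
from itertools import combinations

# Hypothetical key-phrase graph (not from the paper): node weights mark how
# important a phrase is, edge weights mark how strongly two phrases reinforce
# each other when they occur together.
NODE_WEIGHTS = {"election": 3.0, "candidate": 2.0, "poll": 1.5}
EDGE_WEIGHTS = {
    frozenset(("election", "candidate")): 1.0,
    frozenset(("election", "poll")): 0.5,
}


def relevance(text):
    """Score a piece of text against the weighted key-phrase graph."""
    lowered = text.lower()
    # Simple substring matching; a real system would tokenize properly.
    hits = [phrase for phrase in NODE_WEIGHTS if phrase in lowered]
    score = sum(NODE_WEIGHTS[phrase] for phrase in hits)
    # Add the edge weight for every pair of matched phrases that share an edge.
    score += sum(
        EDGE_WEIGHTS.get(frozenset(pair), 0.0) for pair in combinations(hits, 2)
    )
    return score


def build_frontier(announcements, threshold=2.0):
    """Return URLs worth fetching in full, most relevant first.

    `announcements` is an iterable of (url, title) pairs, assumed to come
    from a ping server's list of recently updated blogs.
    """
    heap = []
    for order, (url, title) in enumerate(announcements):
        score = relevance(title)
        if score >= threshold:
            # heapq is a min-heap, so negate the score; `order` breaks ties.
            heapq.heappush(heap, (-score, order, url))
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


if __name__ == "__main__":
    pings = [
        ("http://a.example/1", "Poll numbers today"),
        ("http://b.example/2", "Election candidate debate"),
        ("http://c.example/3", "My cat photos"),
        ("http://d.example/4", "Election poll results"),
    ]
    for url in build_frontier(pings):
        print(url)
```

In this sketch only announcements whose titles score at or above the threshold reach the frontier, which is one way a crawler could limit full-text fetching to relevant blog posts.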
