Journal: ACM Transactions on Information Systems

Sentimental Spidering: Leveraging Opinion Information in Focused Crawlers


Abstract

Despite the increased prevalence of sentiment-related information on the Web, there has been limited work on focused crawlers capable of effectively collecting not only topic-relevant but also sentiment-relevant content. In this article, we propose a novel focused crawler that incorporates topic and sentiment information as well as a graph-based tunneling mechanism for enhanced collection of opinion-rich Web content regarding a particular topic. The graph-based sentiment (GBS) crawler uses a text classifier that employs both topic and sentiment categorization modules to assess the relevance of candidate pages. This information is also used to label nodes in web graphs that are employed by the tunneling mechanism to improve collection recall. Experimental results on two test beds revealed that GBS was able to provide better precision and recall than seven comparison crawlers. Moreover, GBS was able to collect a large proportion of the relevant content after traversing far fewer pages than comparison methods. GBS outperformed comparison methods on various categories of Web pages in the test beds, including blogs, Web forums, and social networking Web site content. Further analysis revealed that both the sentiment classification module and the graph-based tunneling mechanism played an integral role in the overall effectiveness of the GBS crawler.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - Search process; I.2.8 [Artificial Intelligence]: Problem Solving, Control Methods, and Search - Graph and tree search strategies; Heuristic methods

General Terms: Algorithms, Experimentation, Design, Performance
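The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: a best-first crawler that treats a page as relevant when it is both on topic and opinionated, and that keeps following links through a limited number of irrelevant pages. It is not the GBS algorithm. The keyword lists, the `score`, `is_relevant`, and `crawl` functions, the `max_tunnel` parameter, and the in-memory `TOY_WEB` are all hypothetical, and simple depth-limited tunneling stands in for the graph-based tunneling the article describes.

```python
import heapq

# Hypothetical keyword lists standing in for trained topic and sentiment
# classifiers; the abstract does not say how the real modules work.
TOPIC_TERMS = {"phone", "battery", "camera"}
SENTIMENT_TERMS = {"love", "hate", "great", "terrible"}

# Hypothetical in-memory "web": url -> (page text, outgoing links).
TOY_WEB = {
    "seed": ("i love the camera on this phone", ["spec", "news"]),
    "spec": ("phone battery and camera numbers", ["review"]),
    "news": ("weather report for tuesday", ["rant"]),
    "review": ("terrible battery but great camera", []),
    "rant": ("i hate this phone battery", []),
}


def score(text, terms):
    """Fraction of `terms` that occur in the text (0.0 to 1.0)."""
    words = set(text.lower().split())
    return len(words & terms) / len(terms)


def is_relevant(text, threshold=0.25):
    """A page counts as relevant only if it is on topic AND opinionated."""
    return (score(text, TOPIC_TERMS) >= threshold
            and score(text, SENTIMENT_TERMS) >= threshold)


def crawl(web, seeds, max_tunnel=1):
    """Best-first crawl over `web`, returning relevant URLs in visit order.

    Links on irrelevant pages are still followed for up to `max_tunnel`
    consecutive irrelevant pages (assumed depth-limited tunneling).
    """
    # Heap entries: (negated parent score, url, irrelevant pages in a row).
    frontier = [(-1.0, url, 0) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    collected = []
    while frontier:
        _, url, misses = heapq.heappop(frontier)
        text, links = web.get(url, ("", []))
        if is_relevant(text):
            collected.append(url)
            misses = 0
        else:
            misses += 1
            if misses > max_tunnel:
                continue  # stop tunneling along this path
        # Outlinks inherit the combined score of the page they came from.
        priority = score(text, TOPIC_TERMS) + score(text, SENTIMENT_TERMS)
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-priority, link, misses))
    return collected
```

In this toy graph, `review` and `rant` are reachable only through pages that fail the relevance test, so `crawl(TOY_WEB, ["seed"], max_tunnel=0)` collects the seed alone, while `max_tunnel=1` also reaches both of them. That is the recall benefit the abstract attributes to tunneling, shown here in its simplest form.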
