Focused Crawling with Heterogeneous Semantic Information

Abstract

Focused crawlers selectively retrieve Web documents that are relevant to a predefined set of topics. To make intelligent predictions and decisions about relevant URLs and Web pages, different topic models have been introduced to represent topic-specific knowledge. Yet it is difficult to support semantic interoperability among these models. Moreover, manually specified additional semantic information, such as semantic markup and social annotations, cannot be used effectively to improve crawling. This paper proposes to boost focused crawling with four kinds of semantic models and semantic information: thesauruses, categories, ontologies, and folksonomies. A statistical semantic association model is proposed to integrate the different semantic models, represent heterogeneous semantic information, and support semantic relevance computation. A focused crawling framework is developed that uses both keyword-based content and different kinds of additional information for relevance prediction and ranking. Experiments show that the proposed model and framework effectively integrate heterogeneous semantic information for focused crawling.
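
The abstract describes relevance prediction and ranking that combine keyword-based content with additional semantic information. The following is a minimal sketch of that general idea, not the paper's statistical semantic association model, which the abstract does not specify. It runs a best-first crawl over a small in-memory page graph. Everything in it is an assumption made for illustration: the toy data in PAGES, the topic sets, the helper names (keyword_score, tag_score, relevance, crawl), and the weight alpha.

```python
import heapq

# Hypothetical toy "web": URL -> (page text, tags attached to the page, out-links).
# All data here is invented for illustration only.
PAGES = {
    "a": ("semantic web ontology tutorial", {"ontology"}, ["b", "c"]),
    "b": ("football scores and news", {"sports"}, ["e"]),
    "c": ("folksonomy and social tagging", {"folksonomy", "tagging"}, ["d"]),
    "d": ("ontology based focused crawling", {"crawler"}, []),
    "e": ("weather forecast today", set(), []),
}

# Assumed topic description: plain keywords plus social-annotation tags.
TOPIC_KEYWORDS = {"ontology", "semantic", "folksonomy", "crawling"}
TOPIC_TAGS = {"ontology", "folksonomy"}


def keyword_score(text):
    """Fraction of words in the text that are topic keywords."""
    words = text.split()
    if not words:
        return 0.0
    return sum(w in TOPIC_KEYWORDS for w in words) / len(words)


def tag_score(tags):
    """Jaccard overlap between the page's tags and the topic tags."""
    union = tags | TOPIC_TAGS
    return len(tags & TOPIC_TAGS) / len(union) if union else 0.0


def relevance(text, tags, alpha=0.5):
    """Weighted mix of both signals; alpha is an assumed weight."""
    return alpha * keyword_score(text) + (1 - alpha) * tag_score(tags)


def crawl(seed, limit=10):
    """Best-first crawl: always visit the frontier URL with the highest priority."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    visited = []
    while frontier and len(visited) < limit:
        _, url = heapq.heappop(frontier)
        text, tags, links = PAGES[url]
        score = relevance(text, tags)
        visited.append((url, score))
        for link in links:
            if link not in seen:
                seen.add(link)
                # Out-links inherit the parent page's relevance as their priority.
                heapq.heappush(frontier, (-score, link))
    return visited


if __name__ == "__main__":
    for url, score in crawl("a", limit=4):
        print(url, round(score, 3))
```

With a budget of four pages, the sketch reaches page d through the relevant page c and never fetches page e, whose only in-link comes from the off-topic page b. A real crawler would replace the parent-score heuristic with whatever relevance model it uses for unvisited URLs.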

Publication details

Source: IEEE/WIC/ACM Joint International Conference on Web Intelligence and Intelligent Agent Technology, 2008, 7 pages
Authors: Huang Rui, Lin Fen, Shi Zhongzhi
Original format: PDF
Chinese Library Classification: TP18-53
Keywords: Category; Focused Crawling; Folksonomy; Ontology; Semantic Web; Thesaurus

Similar documents

1. Focused Crawling Based Upon TF-IDF Semantics and Hub Score Learning [J]. Mukesh Kumar, Renu Vig. Journal of Emerging Technologies in Web Intelligence, 2013, No. 1.
2. Hybrid Focused Crawling Based Upon VSM Similarity, WordNet Semantics and Hub Score Learning [J]. Mukesh Kumar, Renu Vig. International Journal of Information and Management Sciences, 2013, No. 3.
3. Towards Open Decision Support Systems Based On Semantic Focused Crawling [J]. Jason J. Jung. Expert Systems with Applications, 2009, No. 2p2.
4. Focused Crawling with Heterogeneous Semantic Information [C]. Huang Rui, Lin Fen, Shi Zhongzhi. IEEE/WIC/ACM Joint International Conference on Web Intelligence and Intelligent Agent Technology, 2008.
5. A Novel Hybrid Focused Crawling Algorithm to Build Domain-Specific Collections [D]. Chen, Yuxin. 2007.
6. Domain Adaptation of Statistical Machine Translation with Domain-Focused Web Crawling [O]. Pavel Pecina, Antonio Toral, Vassilis Papavassiliou.
7. Semantic Focused Crawling for Retrieving E-Commerce Information [O]. Wei Huang, Liyi Zhang, Jidong Zhang. 2014.
8. Focused Crawling of the Deep Web Using Service Class Descriptions [R]. Rocco, D., Liu, L., Critchlow, T. 2005.
