Conference: International Conference on Advances in Computing, Communications and Informatics

SemCrawl: Framework for Crawling Ontology Annotated Web Documents for Intelligent Information Retrieval

Abstract

The Web is considered the largest pool of information, and search engines are the main tool for extracting information from it. However, because the Web is unorganized, it is becoming difficult to find relevant information with search engines. Future search engines will not rely merely on keyword search; instead, they will interpret the meaning of web content to produce relevant results. Designing such tools requires extracting information from web content in a form that supports logic and inference. This paper discusses the conceptual differences between the traditional Web and the Semantic Web and explains why Semantic Web documents need to be crawled. It proposes a framework for crawling ontologies and Semantic Web documents. The framework is implemented and validated on different collections of web pages. The system extracts heterogeneous documents from the Web, filters them to keep the ontology-annotated web pages, and extracts triples from those pages, which supports better inferential capability.
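The abstract does not say how annotated pages are recognized or how triples are pulled out of them. The following Python sketch is an illustration only and is not SemCrawl's published design. It shows one way the filter-and-extract step could work, assuming that ontology annotations appear either as RDFa-style attributes (`about`, `property`, `typeof`, `vocab`, `resource`) or as `<link>` elements that point to RDF serializations. The names `AnnotationScanner` and `filter_and_extract`, the detection heuristic, and the simplified triple rules are all hypothetical. Fetching pages is left out, so the input is assumed to be an already-crawled `{url: html}` mapping.

```python
from html.parser import HTMLParser

# Hypothetical heuristic: any of these RDFa attributes marks a page as annotated.
RDFA_ATTRS = {"about", "property", "typeof", "vocab", "resource"}
# Hypothetical heuristic: <link> elements whose type is an RDF serialization.
RDF_MIME_TYPES = {"application/rdf+xml", "text/turtle", "application/ld+json"}


class AnnotationScanner(HTMLParser):
    """Flags ontology-annotated pages and collects simple RDFa-style triples.

    Simplifications (assumptions, not full RDFa 1.1 processing):
    * the subject is the most recent `about` value, else the page URL;
    * the object is the `content` attribute, else the element's text;
    * prefixes/CURIEs such as "dc:title" are not expanded.
    """

    def __init__(self, page_url):
        super().__init__()
        self.annotated = False
        self.triples = []            # list of (subject, predicate, object)
        self._subject = page_url     # default subject: the page itself
        self._pending = None         # (tag, predicate) waiting for element text
        self._text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if any(name in RDFA_ATTRS for name in a):
            self.annotated = True
        if tag == "link" and a.get("type") in RDF_MIME_TYPES:
            self.annotated = True
        if a.get("about"):
            self._subject = a["about"]
        prop = a.get("property")
        if prop:
            if a.get("content") is not None:
                # Object given directly as an attribute (e.g. <meta>).
                self.triples.append((self._subject, prop, a["content"]))
            else:
                # Object is the element's text; emit it at the end tag.
                self._pending = (tag, prop)
                self._text = []

    def handle_data(self, data):
        if self._pending:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._pending and self._pending[0] == tag:
            obj = "".join(self._text).strip()
            self.triples.append((self._subject, self._pending[1], obj))
            self._pending = None


def filter_and_extract(pages):
    """Keep only annotated pages; return {url: [triples]} for those pages."""
    results = {}
    for url, html in pages.items():
        scanner = AnnotationScanner(url)
        scanner.feed(html)
        scanner.close()
        if scanner.annotated:
            results[url] = scanner.triples
    return results


if __name__ == "__main__":
    demo = {
        "http://example.org/a": (
            '<div about="http://example.org/book1">'
            '<span property="dc:title">Example Title</span>'
            '<meta property="dc:creator" content="Jane Doe"></div>'
        ),
        "http://example.org/b": "<p>Plain page with no annotations.</p>",
    }
    for url, triples in filter_and_extract(demo).items():
        print(url, triples)
```

In the demo, only the first page survives the filter, and it yields two triples about `http://example.org/book1`. A full implementation would use a conforming RDFa or RDF parser rather than this attribute heuristic.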