Journal: 电子学报:英文版 (Acta Electronica Sinica, English edition)

Ontology-Based Automatically Hidden Web Portal Index



Abstract

Many valuable databases on the Web have non-crawlable contents that are “hidden” behind search forms. This information is available only by manually filling out HTML forms to query the underlying databases. For automated agents to access the data behind forms, the critical task is to obtain machine-understandable query interfaces for the hidden databases. This paper presents an automatic approach to hidden Web portal indexing for various domains. It discovers and scrapes query forms from Web pages based on their tag-tree representation, and then interprets them into uniform mediated interfaces with the aid of a domain ontology definition. To achieve high transformation accuracy, the domain ontology is also used to filter out interfaces that are not related to the specific domain. The resulting query interfaces, represented with common concepts, can be indexed and retrieved automatically by programs. The experiments indicate that the algorithms used are efficient and that the system is materially useful for information systems or personalized Web search systems that retrieve content from the hidden Web.
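The pipeline outlined in the abstract (find query forms, map their fields to ontology concepts, drop forms that do not belong to the domain) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's algorithm: it walks the tag stream with Python's standard `html.parser` instead of building a full tag tree, and the toy `BOOK_ONTOLOGY`, the `min_coverage` threshold, and all function names are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical toy ontology (concept -> accepted field terms); not from the paper.
BOOK_ONTOLOGY = {
    "title": {"title", "book title", "name"},
    "author": {"author", "writer"},
    "isbn": {"isbn"},
    "price": {"price", "cost"},
}

# Input types that carry no query semantics and are skipped.
IGNORED_TYPES = {"hidden", "submit", "button", "reset", "image"}


class FormExtractor(HTMLParser):
    """Collect the named, user-fillable fields of every <form> on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"action": attrs.get("action") or "", "fields": []}
        elif self._current is not None and tag in ("input", "select", "textarea"):
            field_type = (attrs.get("type") or "text").lower()
            name = attrs.get("name")
            if name and field_type not in IGNORED_TYPES:
                self._current["fields"].append(name)

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def normalize(name):
    """Turn a field name such as 'book_title' into 'book title'."""
    return name.replace("_", " ").replace("-", " ").strip().lower()


def map_to_concepts(fields, ontology):
    """Map each field name to the first ontology concept whose terms contain it."""
    mapping = {}
    for field in fields:
        key = normalize(field)
        for concept, terms in ontology.items():
            if key in terms:
                mapping[field] = concept
                break
    return mapping


def index_forms(html, ontology, min_coverage=0.5):
    """Return forms whose share of ontology-mapped fields reaches min_coverage."""
    parser = FormExtractor()
    parser.feed(html)
    parser.close()
    index = []
    for form in parser.forms:
        if not form["fields"]:
            continue
        mapping = map_to_concepts(form["fields"], ontology)
        if len(mapping) / len(form["fields"]) >= min_coverage:
            index.append({"action": form["action"], "concepts": mapping})
    return index


# Made-up page: one book-search form and one unrelated login form.
SAMPLE_PAGE = """
<html><body>
<form action="/search">
  <input type="text" name="book_title">
  <input type="text" name="author">
  <input type="submit" value="Go">
</form>
<form action="/login">
  <input type="text" name="user">
  <input type="password" name="pass">
</form>
</body></html>
"""
```

Calling `index_forms(SAMPLE_PAGE, BOOK_ONTOLOGY)` keeps the search form, with its fields expressed as ontology concepts, and discards the login form because none of its fields match the domain. A real system would presumably also use visible labels and the layout of the tag tree, which this sketch ignores.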
