
Gathering Services of IHWA from Semi-Structured Web Information Sources


Abstract

Information Harvest WArehouse (IHWA) is a web-based information search system. It is designed using the Component Based Software Engineering (CBSE) paradigm, in which applications are developed by integrating software components. In this paper, we describe the development of the meta-information gathering service of IHWA (Meta Gatherer), which collects and extracts information from semi-structured or unstructured data sources. The focus is on the gatherer's information extraction service for semi-structured Internet information sources, that is, XML data whose DTD is unknown. The implemented information extraction module provides clean Java program interfaces, so it can be easily integrated with other applications. The implementation is also efficient: it analyzes a source XML file in one pass, whereas most other systems use a two-pass approach.
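
The abstract does not say how the one-pass analysis is implemented. As a rough illustration only, the sketch below shows one way a single pass over DTD-unknown XML could recover the element structure, using the standard Java SAX API. The class name `OnePassXmlSketch` and the method `collectPaths` are hypothetical and are not taken from IHWA.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not the IHWA implementation: discovers the element
// structure of an XML document without a DTD in a single streaming pass.
public class OnePassXmlSketch {

    /** Parses the XML once and returns each element path with its occurrence count. */
    public static Map<String, Integer> collectPaths(String xml) throws Exception {
        final Map<String, Integer> counts = new LinkedHashMap<>();
        // Stack of full paths for the elements that are currently open.
        final Deque<String> open = new ArrayDeque<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                String parent = open.isEmpty() ? "" : open.peek();
                String path = parent + "/" + qName;
                open.push(path);
                // Record the path as soon as it is seen; no second pass is needed.
                counts.merge(path, 1, Integer::sum);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                open.pop();
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return counts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><book><title>A</title></book>"
                   + "<book><title>B</title></book></catalog>";
        collectPaths(xml).forEach((path, n) -> System.out.println(path + " x" + n));
    }
}
```

A streaming parser fits the one-pass idea because each element is handled as it is read, so the structure is inferred and recorded in the same traversal rather than by first building a schema and then re-reading the file.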
