An algebraic foundation for automatic semantic data integration on the hidden Web.

Abstract

Semantic integration of the hidden Web is an emerging area of research in which traditional assumptions about schema do not always hold and semantic heterogeneity poses a serious challenge. Constant change, conflicts, and the sheer size of the hidden Web demand integration techniques that rely on autonomous detection and resolution of heterogeneity, correspondence establishment, and information extraction strategies. These techniques must first be automated, and the resulting techniques or sub-systems must then be integrated automatically into a single system. Though many such sub-systems have been automated, to our knowledge there is no integrated framework for combining those technologies automatically. Our idea is to exploit the flexibility and strengths of a declarative language, and the first step toward such a language is to give an algebraic foundation that takes various integration techniques into consideration. In this thesis, we present an algebraic language, called Integra, as a foundation for an SQL-like query language such as BioFlow for the integration of Life Sciences data on the hidden Web. The algebra presented here assumes that all web pages can be thought of as traditional relations and that the integration techniques can be considered user-defined functions. These assumptions make it possible for us to extend the traditional relational algebra to include integration primitives such that a database with traditional relations reduces to a special case in our model. The algebra relies on a schema matching function mu, a key discovery function k, a wrapper or extraction function eta, and two new operators, link and combine, that embody the well-known concepts of horizontal and vertical integration.
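To make the roles of these primitives concrete, the following is a minimal Python sketch, not the thesis's actual definitions, which the abstract does not give. Everything in it is an assumption made for illustration: relations are lists of dictionaries, `eta` parses a toy pipe-delimited "page", `mu` matches attributes by normalized name, `k` picks the first attribute with unique values, `link` is treated as a key-based join, and `combine` as a schema-aligned union. The abstract does not say which operator corresponds to horizontal and which to vertical integration, so that mapping is assumed here as well. The helper `rename` and the sample data are hypothetical.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]
Relation = List[Row]  # assumption: a relation is a list of attribute->value rows


def eta(page: str) -> Relation:
    """Hypothetical extraction function: parse a pipe-delimited toy 'page'."""
    lines = [ln for ln in page.strip().splitlines() if ln.strip()]
    if not lines:
        return []
    header = [h.strip() for h in lines[0].split("|")]
    return [dict(zip(header, (c.strip() for c in ln.split("|")))) for ln in lines[1:]]


def mu(left_attrs: List[str], right_attrs: List[str]) -> Dict[str, str]:
    """Hypothetical schema matcher: map right names to left names that are
    equal after lowercasing and dropping non-alphanumeric characters."""
    def norm(a: str) -> str:
        return "".join(ch for ch in a.lower() if ch.isalnum())
    left_by_norm = {norm(a): a for a in left_attrs}
    return {b: left_by_norm[norm(b)] for b in right_attrs if norm(b) in left_by_norm}


def k(rel: Relation, attrs: List[str]) -> Optional[str]:
    """Hypothetical key discovery: first attribute present in every row
    whose values are all distinct."""
    for a in attrs:
        values = [row.get(a) for row in rel]
        if None not in values and len(set(values)) == len(values):
            return a
    return None


def rename(rel: Relation, mapping: Dict[str, str]) -> Relation:
    """Rename attributes according to a mapping produced by mu."""
    return [{mapping.get(a, a): v for a, v in row.items()} for row in rel]


def link(left: Relation, right: Relation) -> Relation:
    """Assumed join-like operator: align schemas with mu, find a key with k,
    and extend each left row with the matching right row."""
    if not left or not right:
        return []
    left_attrs = list(left[0])
    aligned = rename(right, mu(left_attrs, list(right[0])))
    shared = [a for a in left_attrs if a in aligned[0]]
    key = k(aligned, shared)
    if key is None:
        return []
    index = {row[key]: row for row in aligned}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def combine(left: Relation, right: Relation) -> Relation:
    """Assumed union-like operator: align the right schema to the left one
    and append rows that are not already present."""
    if not left or not right:
        return list(left) + list(right)
    aligned = rename(right, mu(list(left[0]), list(right[0])))
    out = list(left)
    for row in aligned:
        if row not in out:
            out.append(row)
    return out


# Toy data standing in for three extracted pages (all values are made up).
genes = eta("Gene_ID | Symbol\nG1 | s1\nG2 | s2")
proteins = eta("gene id | Protein\nG1 | p1\nG3 | p3")
more_genes = eta("gene_id | symbol\nG2 | s2\nG3 | s3")

linked = link(genes, proteins)        # rows sharing a discovered key
combined = combine(genes, more_genes)  # rows pooled under one schema
```

In this sketch an ordinary database is the special case in which `eta` is unnecessary and `mu` is the identity mapping, which mirrors the abstract's remark that traditional relations reduce to a special case of the model.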

Bibliographic record

  • Author

    Hosain, Md. Shazzad

  • Author affiliation

    Wayne State University

  • Degree-granting institution Wayne State University
  • Subject Computer Science
  • Degree Ph.D.
  • Year 2009
  • Pages 120 p.
  • Total pages 120
  • Original format PDF
  • Language of text eng
