Architectures enabling scalable Internet search

Abstract

The vast amount of Internet content becomes manageable mainly through search engines, which let users enter queries into a web form and receive a list of matches that refer to Internet content elements, such as the URLs of matching HTML pages. However, the quality of these search engines suffers from two conceptual problems. The content volume grows faster than the bandwidth available to index it, and a large and growing share is "hidden" in the deep web, e.g. behind HTML forms, which makes it hard for search engines to reach and index. The work presented here shows that these problems can be overcome if the paradigm of Internet search is reversed: content providers have to assist in making their content searchable. This leads to a distributed architecture that scales better than the central approach implemented by current search engines, and that makes the deep web searchable. A UML model of the distributed search architecture was created and then implemented in Java, verifying the feasibility of the concepts. The scalability of the solution was proven using a formal model of the bandwidth consumed by a specific class of distributed search algorithms, the class used by the suggested architecture. The remaining problem of how to create content that complies with the suggested search architecture was tackled in two ways. Adapters for existing content can be created with little effort, as a prototype has shown. New Internet applications can be made searchable using the Model-Driven Architecture approach introduced by the Object Management Group. A metamodel with a corresponding UML profile was defined that allows for a compact specification of an application's searchability. Using model transformations, a large share of the code that implements the specified searchability can be generated automatically from the models expressed in this metamodel.
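
The reversed paradigm described in the abstract, in which content providers answer queries over their own content instead of being crawled, can be illustrated with a minimal Java sketch. This is not the thesis's actual design. The names SearchableProvider, MapBackedProvider and SearchBroker are hypothetical, the substring match stands in for whatever matching a real provider would do, and the example URLs are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class DistributedSearchSketch {

    /** Hypothetical contract a content provider implements to make its own content searchable. */
    interface SearchableProvider {
        List<String> search(String query);
    }

    /**
     * Hypothetical adapter for existing content: a URL-to-text map stands in
     * for a site's pages or for records hidden behind an HTML form.
     */
    static final class MapBackedProvider implements SearchableProvider {
        private final Map<String, String> pagesByUrl;

        MapBackedProvider(Map<String, String> pagesByUrl) {
            this.pagesByUrl = pagesByUrl;
        }

        @Override
        public List<String> search(String query) {
            // Case-insensitive substring match, evaluated locally at the provider.
            String needle = query.toLowerCase(Locale.ROOT);
            List<String> hits = new ArrayList<>();
            for (Map.Entry<String, String> page : pagesByUrl.entrySet()) {
                if (page.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                    hits.add(page.getKey());
                }
            }
            return hits;
        }
    }

    /** Hypothetical broker: forwards each query to all registered providers and merges the URLs. */
    static final class SearchBroker {
        private final List<SearchableProvider> providers = new ArrayList<>();

        void register(SearchableProvider provider) {
            providers.add(provider);
        }

        List<String> search(String query) {
            List<String> merged = new ArrayList<>();
            for (SearchableProvider provider : providers) {
                merged.addAll(provider.search(query));
            }
            return merged;
        }
    }

    public static void main(String[] args) {
        SearchBroker broker = new SearchBroker();
        broker.register(new MapBackedProvider(
                Map.of("http://a.example/catalog?id=1", "Deep web catalog entry")));
        broker.register(new MapBackedProvider(
                Map.of("http://b.example/index.html", "Static HTML page")));
        System.out.println(broker.search("catalog"));
    }
}
```

In this sketch only queries and result URLs cross the network, so the broker never needs to download and index the providers' content. That is the intuition behind the bandwidth argument in the abstract, not a reproduction of its formal model.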

Bibliographic record

  • Author

    Uhl, Axel

  • Author affiliation
  • Year 2004
  • Total pages
  • Original format PDF
  • Text language eng
  • CLC classification
