Twenty-eighth International Conference on Very Large Data Bases, Aug 20-23, 2002, Hong Kong SAR, China

Database Selection Using Actual Physical and Acquired Logical Collection Resources in a Massive Domain-specific Operational Environment


Abstract

The continued growth of very large data environments such as Westlaw, Dialog, and the World Wide Web increases the importance of effective and efficient database selection and searching. Recent research has focused on autonomous and automatic collection selection, searching, and results merging in distributed environments. These studies often rely on TREC data and queries for experimentation. We have extended this work to West's online production environment, where thousands of legal, financial, and news databases are accessed by up to a quarter-million professional users each day. Using the WIN natural language search engine, a cousin to UMass's INQUERY, along with a collection retrieval inference network (CORI) to provide database scoring, we examine the effect that a set of optimized parameters has on database selection performance. We also compare current language modeling techniques to this approach. Traditionally, West's information has been structured over 15,000 online databases, representing roughly 6 terabytes of textual data. Given the expense of running global searches in this environment, it is usually not practical to perform full document retrieval over the entire collection. It is therefore necessary to create a new infrastructure to support automatic database selection in the service of broader searching. In this research, we represent our operational environment in two distinct ways. First, we characterize the underlying physical databases that serve as a foundation for the entire Westlaw search system. Second, we create a rearchitected set of logical document collections that corresponds to classes of high-level organizational concepts such as jurisdiction, practice area, and document type. Keeping the end user in mind, we focus on performance issues relating to optimal database selection, where domain experts have provided complete pre-hoc relevance judgments for collections characterized under each of our physical and logical database models.
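For readers unfamiliar with the database scoring step the abstract refers to, the following is a minimal sketch of CORI-style collection scoring as it is commonly described in the distributed retrieval literature. It is not the authors' implementation: the function names, the toy statistics, and the constants (0.4, 50, 150) are assumptions taken from the commonly cited defaults, not the optimized parameters examined in the paper.

```python
import math

def cori_belief(df, cf, cw, avg_cw, num_collections, b=0.4, k=50.0, k_len=150.0):
    """Belief that a collection satisfies one query term (CORI-style).

    df: number of documents in the collection containing the term
    cf: number of collections containing the term
    cw: word count of the collection; avg_cw: mean word count over collections
    b, k, k_len: assumed default constants, not values from the paper
    """
    if cf == 0:
        return b  # term appears nowhere: only the default belief remains
    t = df / (df + k + k_len * cw / avg_cw)
    i = math.log((num_collections + 0.5) / cf) / math.log(num_collections + 1.0)
    return b + (1.0 - b) * t * i

def rank_collections(query_terms, doc_freqs, word_counts):
    """Rank collections by the mean term belief, highest first."""
    n = len(doc_freqs)
    avg_cw = sum(word_counts.values()) / n
    scores = []
    for name, dfs in doc_freqs.items():
        beliefs = []
        for term in query_terms:
            cf = sum(1 for other in doc_freqs.values() if other.get(term, 0) > 0)
            beliefs.append(
                cori_belief(dfs.get(term, 0), cf, word_counts[name], avg_cw, n)
            )
        scores.append((name, sum(beliefs) / len(beliefs)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy statistics for three collections (illustrative only).
doc_freqs = {
    "cases": {"patent": 120, "merger": 5},
    "news": {"merger": 80},
    "statutes": {"patent": 10},
}
word_counts = {"cases": 100_000, "news": 100_000, "statutes": 50_000}

ranked = rank_collections(["patent"], doc_freqs, word_counts)
for name, score in ranked:
    print(f"{name}: {score:.4f}")
```

In this sketch a collection that never mentions the query term falls back to the default belief `b`, so it is ranked below any collection that contains the term at least once.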
