Conference: ACM conference on digital libraries

Semantic Indexing for a Complete Subject Discipline

Abstract

As part of the Illinois Digital Library Initiative (DLI) project, we developed "scalable semantics" technologies. These statistical techniques enabled us to index large collections for search that goes deeper than word matching. Under the auspices of the DARPA Information Management program, we are developing an integrated analysis environment, the Interspace Prototype, that uses "semantic indexing" as the foundation for supporting concept navigation. These semantic indexes record the contextual correlation of noun phrases and are computed generically, independent of subject domain. Using this technology, we were able to compute semantic indexes for a complete subject discipline. In particular, in the summer of 1998, we computed concept spaces for 9.3M MEDLINE bibliographic records from the National Library of Medicine (NLM), which extensively cover the biomedical literature for the period from 1966 to 1997. In this experiment, we first partitioned the collection into smaller collections (repositories) by subject, then extracted noun phrases from titles and abstracts, and finally performed semantic indexing on these sub-collections by creating a concept space for each repository. The computation required 2 days on a 128-node SGI/CRAY Origin 2000 at the National Center for Supercomputing Applications (NCSA). This experiment demonstrated the feasibility of scalable semantics techniques for large collections. With the rapid increase in computing power, we believe this indexing technology will shortly be feasible on personal computers.
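
The three steps described in the abstract (partition by subject, extract noun phrases, build one concept space per repository) can be illustrated with a minimal Python sketch. It is not the project's implementation: noun-phrase extraction is assumed to have been done already, the weight is a simple co-occurrence ratio chosen for illustration because the abstract does not give the actual formula, and the names `partition_by_subject` and `build_concept_space` as well as the sample records are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def partition_by_subject(records):
    """Group (subject, noun_phrases) records into per-subject repositories."""
    repositories = defaultdict(list)
    for subject, phrases in records:
        repositories[subject].append(phrases)
    return dict(repositories)


def build_concept_space(docs):
    """Map each noun phrase to the phrases it co-occurs with, with weights.

    Assumption: weight(a -> b) is the fraction of documents containing a
    that also contain b. This is an illustrative stand-in, not the
    weighting used in the original experiment.
    """
    doc_freq = Counter()   # number of documents containing each phrase
    pair_freq = Counter()  # number of documents containing both phrases
    for phrases in docs:
        unique = sorted(set(phrases))
        doc_freq.update(unique)
        pair_freq.update(combinations(unique, 2))

    space = defaultdict(dict)
    for (a, b), n_ab in pair_freq.items():
        space[a][b] = n_ab / doc_freq[a]  # asymmetric: depends on a's frequency
        space[b][a] = n_ab / doc_freq[b]
    return dict(space)


# Hypothetical toy records: (subject, noun phrases from title and abstract).
records = [
    ("cardiology", ["heart failure", "beta blocker"]),
    ("cardiology", ["heart failure", "ace inhibitor"]),
    ("cardiology", ["heart failure", "beta blocker", "hypertension"]),
    ("oncology", ["breast cancer", "tamoxifen"]),
]

# One concept space per repository, as in the experiment's overall design.
concept_spaces = {
    subject: build_concept_space(docs)
    for subject, docs in partition_by_subject(records).items()
}
```

Because each repository is indexed independently, the per-repository step parallelizes naturally, which is consistent with running the computation across many nodes.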
