Reducing semantic complexity in distributed Digital Libraries: treatment of term vagueness and document re-ranking

Abstract

Purpose - The general science portal vascoda merges structured, high-quality information collections from more than 40 providers on the basis of search engine technology (FAST) and a concept that treats semantic heterogeneity between different controlled vocabularies. First experiences with the portal reveal some weaknesses of this approach that also appear in most metadata-driven Digital Libraries (DL) or subject-specific portals. The purpose of the paper is to propose models that reduce the semantic complexity of heterogeneous DLs. The aim is to introduce value-added services (treatment of term vagueness and document re-ranking) that add a measure of quality to DLs when they are combined with the heterogeneity components established in the project “Competence Center Modeling and Treatment of Semantic Heterogeneity”.

Design/methodology/approach - First, semantic heterogeneity components automatically translate between different indexing languages. This approach affects search in scenarios where the searcher uses controlled vocabularies that are cross-linked by cross-concordances. However, users usually formulate query terms freely, without any vocabulary support. Empirical observations show that freely formulated user terms and terms from controlled vocabularies often differ or match only by coincidence. Therefore, a value-added service, the Search Term Recommender (STR), will be developed that rephrases the searcher's natural-language terms into suggestions from the controlled vocabulary. Second, the result sets of transformed or expanded queries in distributed collections are often very large, and tests show that conventional web-based ranking methods are not appropriate for presenting heterogeneous metadata records to the user as suitable result sets. Therefore, two methods derived from scientometrics and network analysis will be implemented to re-rank result sets by the following structural properties: ranking of the results by core journals (so-called Bradfordizing) and ranking by the centrality of authors in co-authorship networks.

Findings - The methods to be implemented address both the query side and the result side of a search and are designed to positively influence each other. Conceptually, they will improve search quality and guarantee that the most relevant documents in result sets are ranked higher.

Originality/value - The central contribution of the paper is the integration of three structural value-adding methods that aim to reduce the semantic complexity represented in distributed DLs at several stages of the information retrieval process: query construction, search and ranking, and re-ranking.

Paper type - Research paper
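The abstract does not say how the Search Term Recommender maps free query words to controlled-vocabulary terms. One common way to build such a mapping is co-occurrence analysis over documents that carry both free text and indexer-assigned descriptors. The sketch below assumes that approach, scoring each descriptor by the summed conditional probability P(descriptor | word); the function names `train_associations` and `recommend` and the training records are hypothetical, not the vascoda implementation.

```python
from collections import Counter, defaultdict


def train_associations(records):
    """Count how often each free-text word co-occurs with each controlled descriptor.

    `records` is a list of (free_text_words, descriptors) pairs, for example the
    title words of a document paired with the thesaurus terms an indexer assigned.
    (Assumed training data; the source does not specify what the STR learns from.)
    """
    term_freq = Counter()          # number of records containing each free-text word
    co_occ = defaultdict(Counter)  # word -> descriptor -> number of shared records
    for words, descriptors in records:
        for w in set(words):
            term_freq[w] += 1
            for d in set(descriptors):
                co_occ[w][d] += 1
    return term_freq, co_occ


def recommend(query_words, term_freq, co_occ, k=3):
    """Suggest up to k descriptors, scored by summed P(descriptor | word)."""
    scores = Counter()
    for w in query_words:
        if term_freq[w] == 0:
            continue  # word never seen in the training records: no evidence
        for d, n in co_occ[w].items():
            scores[d] += n / term_freq[w]
    return [d for d, _ in scores.most_common(k)]


if __name__ == "__main__":
    # Hypothetical training records: (free-text words, controlled descriptors).
    records = [
        (["unemployment", "youth"], ["Youth Unemployment", "Labour Market"]),
        (["jobless", "young", "people"], ["Youth Unemployment"]),
        (["labour", "market", "policy"], ["Labour Market", "Employment Policy"]),
    ]
    term_freq, co_occ = train_associations(records)
    print(recommend(["jobless", "youth"], term_freq, co_occ))
    # prints ['Youth Unemployment', 'Labour Market']
```

Here the free word "jobless" never appears as a descriptor, yet it is mapped to "Youth Unemployment" because the two co-occur in the training data. This is the kind of vocabulary mismatch the abstract describes.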
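Bradfordizing, as named in the abstract, re-ranks results so that articles from core journals come first. A minimal sketch follows, assuming that "core" is judged by how many hits each journal contributes to the current result set. Classic Bradford analysis additionally splits journals into zones of roughly equal article counts, and that step is omitted here. The record shape (`dict` with an optional `"journal"` key) and the function name `bradfordize` are hypothetical.

```python
from collections import Counter


def bradfordize(results):
    """Move articles from the journals most productive in this result set to the top.

    `results` is a list of dicts with an optional "journal" key (assumed record
    shape). Journal productivity is counted only within the given result set.
    """
    freq = Counter(r["journal"] for r in results if r.get("journal"))
    # sorted() is stable even with reverse=True, so documents whose journals are
    # equally productive keep their original (e.g. text-relevance) order.
    # Records without a journal (e.g. monographs) score 0 and sink to the end.
    return sorted(results, key=lambda r: freq.get(r.get("journal"), 0), reverse=True)


if __name__ == "__main__":
    # Hypothetical result set, already ordered by some text-based relevance score.
    hits = [
        {"id": 1, "journal": "Journal A"},
        {"id": 2, "journal": "Journal B"},
        {"id": 3},
        {"id": 4, "journal": "Journal B"},
        {"id": 5, "journal": "Journal C"},
        {"id": 6, "journal": "Journal B"},
    ]
    print([r["id"] for r in bradfordize(hits)])  # prints [2, 4, 6, 1, 5, 3]
```

Relying on sort stability means the re-ranking only reorders documents across journal-productivity levels and preserves whatever order the search engine produced within each level.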
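The abstract also proposes ranking by author centrality in co-authorship networks but does not name a centrality measure. The sketch below builds the co-authorship graph from the result set and uses degree centrality (number of distinct co-authors) as a simple stand-in. Betweenness or closeness centrality would be alternative choices. Each document is scored by its most central author. All names and the record shape are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def author_degree(results):
    """Degree of each author in the co-authorship graph built from the result set."""
    neighbours = defaultdict(set)
    for r in results:
        authors = sorted(set(r.get("authors", [])))
        # Every pair of authors on the same record gets an undirected edge.
        for a, b in combinations(authors, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {a: len(n) for a, n in neighbours.items()}


def rerank_by_centrality(results):
    """Rank documents by the degree of their most central author (stable sort)."""
    degree = author_degree(results)

    def score(r):
        # Sole authors never appear in `degree`, so they default to 0.
        return max((degree.get(a, 0) for a in r.get("authors", [])), default=0)

    return sorted(results, key=score, reverse=True)


if __name__ == "__main__":
    # Hypothetical result set with placeholder author names.
    hits = [
        {"id": "a", "authors": ["A"]},
        {"id": "b", "authors": ["B", "A", "C"]},
        {"id": "c", "authors": ["D"]},
        {"id": "d", "authors": ["B", "D"]},
    ]
    print([r["id"] for r in rerank_by_centrality(hits)])  # prints ['b', 'd', 'a', 'c']
```

Document "d" rises above "a" because its co-author B is connected to three others, even though "a" came first in the original list.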