The Journal of Documentation

Augmenting Dublin Core digital library metadata with Dewey Decimal Classification

Abstract

Purpose - The purpose of this paper is to describe a new approach to a well-known problem for digital libraries: how to search across multiple unrelated libraries with a single query.

Design/methodology/approach - The approach involves creating new Dewey Decimal Classification (DDC) terms and numbers from existing Dublin Core records. In total, 263,550 records were harvested from three digital libraries. Weighted key terms were extracted from the title, description and subject fields of each record. Ranked DDC classes were automatically generated from these key terms by considering DDC hierarchies via a series of filtering and aggregation stages. A mean reciprocal rank evaluation compared a sample of 49 generated classes against DDC classes created by a trained librarian for the same records.

Findings - The best results came from combining weighted key terms from the title, description and subject fields. Performance declines as the DDC level becomes more specific. The results compare favorably with those of similar studies.

Research limitations/implications - The metadata harvest required manual intervention, and the evaluation was resource intensive. Future research will look at evaluation methodologies that take account of issues of consistency and ecological validity.

Practical implications - The method does not require training data and is easily scalable. The pipeline can be customized for individual use cases, for example to enhance recall or precision.

Social implications - The approach can provide centralized access to information from multiple domains that is currently provided by individual digital libraries.

Originality/value - The approach addresses metadata normalization in the context of web resources. The automatic classification approach accounts for matches within hierarchies, aggregating lower-level matches to broader parents, and thus approximates the practices of a human cataloger.
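The classification stage summarized above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the field weights (`FIELD_WEIGHTS`), the toy term-to-class lookup (`TERM_TO_DDC`), the whitespace tokenizer, the `min_score` threshold and all function names are hypothetical. The DDC hierarchy is read off the notation (025.3 falls under section 025, division 020 and main class 000), which is a simplification because real DDC hierarchy is not always expressed by the notation. The sketch shows the two ideas the abstract names: weighting key terms by the field they come from, and rolling specific matches up to broader parent classes before ranking.

```python
from collections import Counter

# Hypothetical field weights; the abstract only says that combining title,
# description and subject gave the best results, not what weights were used.
FIELD_WEIGHTS = {"title": 3.0, "subject": 2.0, "description": 1.0}

# Hypothetical toy lookup from key terms to DDC numbers. A real system would
# match terms against DDC captions and index terms instead.
TERM_TO_DDC = {
    "metadata": ["025.3"],
    "cataloging": ["025.3"],
    "classification": ["025.4"],
    "libraries": ["027"],
    "birds": ["598"],
}


def weighted_terms(record):
    """Sum the field weight for every occurrence of each lower-cased token."""
    scores = Counter()
    for field, weight in FIELD_WEIGHTS.items():
        for token in record.get(field, "").lower().split():
            token = token.strip(".,;:()")
            if token:
                scores[token] += weight
    return scores


def ddc_levels(number):
    """Map a DDC number to its main class (1), division (2) and section (3).

    Simplification: the hierarchy is read off the notation, e.g. 025.3 ->
    000, 020, 025, although real DDC hierarchy does not always follow it.
    """
    base = number.split(".")[0]
    return {1: base[0] + "00", 2: base[:2] + "0", 3: base}


def rank_ddc(record, level, min_score=0.0):
    """Rank DDC classes at one level for a Dublin Core-style record."""
    class_scores = Counter()
    for term, weight in weighted_terms(record).items():
        for number in TERM_TO_DDC.get(term, []):
            # Aggregation: every specific match adds its weight to the broader
            # class it falls under at the requested level.
            class_scores[ddc_levels(number)[level]] += weight
    # Filtering: drop weakly supported classes (hypothetical threshold).
    return [cls for cls, score in class_scores.most_common() if score >= min_score]


record = {
    "title": "Metadata for digital libraries",
    "description": "Automatic classification of metadata records",
    "subject": "Cataloging",
}

print(rank_ddc(record, level=3))               # ['025', '027']
print(rank_ddc(record, level=2))               # ['020']
print(rank_ddc(record, level=3, min_score=5))  # ['025']
```

At level 2 the separate hits on 025 and 027 merge into the single division 020, which is the kind of aggregation to broader parents the abstract describes. Raising `min_score` keeps only well-supported classes, roughly trading recall for precision.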
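Mean reciprocal rank (MRR), the evaluation measure named in the abstract, averages 1/k over the evaluated records, where k is the position of the librarian-assigned class in the generated ranking (a record contributes 0 if that class does not appear). The following is a minimal sketch, assuming one librarian-assigned class per record. The three sample records are made up for illustration and are not the paper's 49-record sample.

```python
def reciprocal_rank(ranked_classes, correct_class):
    """Return 1/position of the correct class in the ranking, or 0.0 if absent."""
    for position, cls in enumerate(ranked_classes, start=1):
        if cls == correct_class:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(evaluations):
    """Average reciprocal rank over (ranked_classes, correct_class) pairs."""
    if not evaluations:
        raise ValueError("need at least one evaluated record")
    total = sum(reciprocal_rank(ranked, correct) for ranked, correct in evaluations)
    return total / len(evaluations)


# Hypothetical sample: (generated ranking, librarian-assigned class) per record.
sample = [
    (["025", "027", "020"], "025"),  # correct at rank 1 -> 1.0
    (["598", "597"], "597"),         # correct at rank 2 -> 0.5
    (["300"], "020"),                # not generated     -> 0.0
]

print(mean_reciprocal_rank(sample))  # 0.5
```

Because only the first matching position counts, MRR rewards rankings that put the librarian's class near the top. Scoring the same records at each DDC level separately is one way to observe the decline with specificity that the findings report.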
