Automatic Conversion from MARC to FRBR

Abstract

We applied our prototype system to a set of 4379 BIBSYS records related to Henrik Ibsen. These records often describe complex works that contain multiple expressions and/or parts of other works. We chose them because simple works are usually easy to convert, and studies "[...] suggest that the majority of benefits associated with applying FRBR [...] could be obtained by concentrating on a relatively small number of complex works." [8]. In strict mode, where the system relies on identifying the original title with certainty, we identified 48 works by Henrik Ibsen. Of these, 16 were falsely identified because of variant or erroneous spellings, and 10 were collections and translations that the system failed to recognize as such. These works were clustered into 1111 expressions contained in 1072 manifestations, 35 of which contained more than one expression. However, because of the strict rule settings, 3307 records were ignored: it could not be determined whether they contain an original title or not. Without this information the work cluster cannot be reliably identified. With non-strict settings our system identified 580 works by Ibsen and 3706 expressions in 3567 manifestations, which is clearly not a satisfactory result. The next steps are to apply a more fault-tolerant dissimilarity measure in the clustering process and to use authority files in the attribute layer in order to cope with spelling variations and errors. In addition, we will try to leverage the reliable information found in high-quality records to process incomplete records. We plan to apply the system to a subset of 100,000 records from the BIBSYS catalog and to evaluate the results.
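The abstract does not spell out the clustering rules. The following is only a rough Python illustration of the trade-off it reports, not the authors' prototype:

- **Strict mode** ignores records whose original title is unknown.
- **Non-strict mode** falls back to the record's own title, so translations end up in separate "works".
- **Fault-tolerant matching** can absorb spelling variants. The sketch assumes normalized Levenshtein distance as the dissimilarity measure, because the paper does not name its measure.

The `Record` type, `cluster_works`, and the `threshold` parameter are hypothetical. Real MARC/BIBSYS parsing is omitted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical, heavily simplified bibliographic record (not real MARC)."""
    record_id: str
    title: str                      # title as given in this manifestation
    original_title: Optional[str]   # None when it cannot be determined


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences vanish."""
    return " ".join(title.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def dissimilarity(a: str, b: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def cluster_works(records, strict=True, threshold=0.0):
    """Group records into work clusters keyed by (normalized) original title.

    strict=True:  records without a known original title are ignored.
    strict=False: fall back to the record's own title, which tends to split
                  one work into several clusters (e.g. one per translation).
    threshold:    a key joins an existing cluster whose key lies within this
                  normalized edit distance; 0.0 means exact matching only.
    """
    clusters = {}   # cluster key -> list of record ids
    ignored = []
    for rec in records:
        if rec.original_title is not None:
            key = normalize(rec.original_title)
        elif strict:
            ignored.append(rec.record_id)
            continue
        else:
            key = normalize(rec.title)
        # Reuse the first existing key that is close enough, else start a new one.
        match = next((k for k in clusters if dissimilarity(k, key) <= threshold), key)
        clusters.setdefault(match, []).append(rec.record_id)
    return clusters, ignored


if __name__ == "__main__":
    records = [
        Record("r1", "Et dukkehjem", "Et dukkehjem"),
        Record("r2", "A Doll's House", "Et dukkehjem"),   # translation
        Record("r3", "Nora", None),                       # original title unknown
        Record("r4", "Et dukkehiem", "Et dukkehiem"),     # misspelled original title
    ]
    print(cluster_works(records, strict=True))
    print(cluster_works(records, strict=True, threshold=0.1))
    print(cluster_works(records, strict=False))
```

With these made-up records, strict exact matching ignores `r3` and splits the misspelled `r4` into its own work. The non-strict run produces three "works", one of them for `Nora`. With a small tolerance, the spelling variant joins the correct cluster. These outcomes mirror, in miniature, the failure modes described above.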