2013 IEEE International Conference on Big Data

A Case Study on Entity Resolution for Distant Processing of Big Humanities Data

Abstract

At the forefront of big data in the Humanities, collections management can directly impact collections access and reuse. However, for curators who rely on traditional data management methods for tasks such as distinguishing redundant records from relevant and related ones, a small increase in data volume can significantly increase the workload. In this paper, we present preliminary work aimed at assisting curators in making important data management decisions for organizing and improving the overall quality of large unstructured Humanities data collections. Using Entity Resolution as a conceptual framework, we created a similarity model that compares directories and files based on their implicit metadata and clusters pairs of closely related directories. Useful relationships between data are identified and presented through a graphical user interface that allows qualitative evaluation of the clusters and provides a guide for deciding on data management actions. To evaluate the model's performance, we experimented with a test collection and asked the curator to classify the clusters according to four model cluster configurations that consider the presence of related and duplicate information. Evaluation results suggest that the model is useful for making data management decisions.
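The abstract does not specify the similarity model or which implicit metadata it uses, so the following is only an illustrative sketch of the general idea of comparing directories pairwise and keeping the closely related pairs. It assumes that implicit metadata means file names and sizes, that similarity is Jaccard overlap, and that a fixed threshold decides which pairs are related. The names `implicit_metadata`, `jaccard`, `related_pairs`, and the `threshold` value are hypothetical and not taken from the paper.

```python
from itertools import combinations
from pathlib import Path


def implicit_metadata(directory):
    """Collect (file name, size in bytes) for every file under a directory.

    Assumption: name and size stand in for the paper's implicit metadata.
    """
    return {
        (p.name, p.stat().st_size)
        for p in Path(directory).rglob("*")
        if p.is_file()
    }


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_pairs(profiles, threshold=0.5):
    """Return (dir_a, dir_b, score) for pairs scoring at or above threshold.

    `profiles` maps a directory label to its metadata set.
    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    pairs = []
    for name_a, name_b in combinations(sorted(profiles), 2):
        score = jaccard(profiles[name_a], profiles[name_b])
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    # Highest-scoring pairs first, so likely duplicates surface at the top.
    return sorted(pairs, key=lambda t: -t[2])
```

In such a sketch, a score of 1.0 would point to a likely duplicate directory and intermediate scores to related ones. The four cluster configurations used in the curator evaluation are not described in the abstract and are not reproduced here.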