Frontiers in Neuroinformatics

The GAAIN Entity Mapper: An Active-Learning System for Medical Data Mapping

Abstract

This work is focused on mapping biomedical datasets to a common representation, as an integral part of data harmonization for integrated biomedical data access and sharing. We present GEM, an intelligent software assistant for automated data mapping across different datasets or from a dataset to a common data model. The GEM system automates data mapping by providing precise suggestions for data element mappings. It leverages the detailed metadata about elements in associated dataset documentation such as data dictionaries that are typically available with biomedical datasets. It employs unsupervised text mining techniques to determine similarity between data elements and also employs machine-learning classifiers to identify element matches. It further provides an active-learning capability where the process of training the GEM system is optimized. Our experimental evaluations show that the GEM system provides highly accurate data mappings (over 90% accuracy) for real datasets of thousands of data elements each, in the Alzheimer's disease research domain. Further, the effort in training the system for new datasets is also optimized. We are currently employing the GEM system to map Alzheimer's disease datasets from around the globe into a common representation, as part of a global Alzheimer's disease integrated data sharing and analysis network called GAAIN. GEM achieves significantly higher data mapping accuracy for biomedical datasets compared to other state-of-the-art tools for database schema matching that have similar functionality. With the use of active-learning capabilities, the user effort in training the system is minimal.
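
The abstract names two ideas that a small example can make concrete: scoring the textual similarity of data-dictionary descriptions, and choosing which suggested mapping to ask the user about first. The sketch below is an illustration under stated assumptions, not GEM's implementation. It assumes TF-IDF weighting with cosine similarity as the text-mining step, and it uses the similarity score as a stand-in for classifier confidence when picking the most uncertain mapping (a simple form of uncertainty sampling). The element names, descriptions, the 0.5 threshold, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so terms found in every document keep a small weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def suggest_mappings(source, target, top_k=1):
    """Rank target elements for each source element by description similarity.

    source, target: dicts of element name -> data-dictionary description.
    Returns: dict of source name -> list of (target name, score), best first.
    """
    src_names, tgt_names = list(source), list(target)
    vecs = tfidf_vectors([source[n] for n in src_names] +
                         [target[n] for n in tgt_names])
    src_vecs, tgt_vecs = vecs[:len(src_names)], vecs[len(src_names):]
    suggestions = {}
    for name, u in zip(src_names, src_vecs):
        scored = sorted(((cosine(u, v), t) for t, v in zip(tgt_names, tgt_vecs)),
                        reverse=True)
        suggestions[name] = [(t, round(s, 3)) for s, t in scored[:top_k]]
    return suggestions


def most_uncertain(suggestions, threshold=0.5):
    """Pick the source element whose best score lies closest to the threshold.

    Assumption: the similarity score stands in for a classifier's match
    probability, so the pair nearest the threshold is the one to label next.
    """
    return min(suggestions, key=lambda n: abs(suggestions[n][0][1] - threshold))


# Hypothetical data-dictionary entries, invented for illustration only.
source = {
    "PTAGE": "Age of the participant at baseline visit in years",
    "MMSCORE": "Mini mental state examination total score",
    "PTGENDER": "Participant gender",
}
target = {
    "age": "Subject age in years",
    "mmse": "Total score on the mini mental state examination",
    "sex": "Gender of the subject",
}

suggestions = suggest_mappings(source, target)
query = most_uncertain(suggestions)

for name, ranked in suggestions.items():
    print(name, "->", ranked)
print("Ask the user to confirm first:", query)
```

In a fuller pipeline, each confirmed or rejected pair would become a labeled training example for a match classifier, and the uncertainty step would then use that classifier's predicted probabilities rather than raw similarity scores.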