Journal: Distributed and Parallel Databases

Concept acquisition and improved in-database similarity analysis for medical data


Abstract

Efficient identification of cohorts of similar patients is a major precondition for personalized medicine. To train prediction models on a given medical data set, similarities have to be calculated for every pair of patients, which results in a roughly quadratic data blowup. In this paper we discuss in-database patient similarity analysis, ranging from data extraction to implementing and optimizing the similarity calculations in SQL. In particular, we introduce the notion of chunking, which uniformly distributes the workload among the individual similarity calculations. Our benchmark applies one similarity measure (cosine similarity) and one distance metric (Euclidean distance) to two real-world data sets; it compares the performance of a column store (MonetDB) and a row store (PostgreSQL) with that of two external data mining tools (ELKI and Apache Mahout).
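
To make the pairwise computation concrete, the following is a minimal sketch of how cosine similarity and Euclidean distance for every pair of patients might be expressed as a single SQL self-join. It is not the paper's implementation: the table name `patient_features`, its long-format columns, the helper `pairwise_similarities`, and the use of in-memory SQLite (rather than MonetDB or PostgreSQL) are all assumptions made for illustration. It also assumes dense data, meaning every patient has a value for every feature and no patient vector is all zeros, and it does not attempt to model chunking.

```python
import math
import sqlite3


def pairwise_similarities(rows):
    """Compute cosine similarity and Euclidean distance for all patient pairs.

    rows: iterable of (patient_id, feature, value) tuples (hypothetical
    long-format layout, assumed dense: each patient has every feature).
    Returns {(a, b): (cosine_similarity, euclidean_distance)} with a < b.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patient_features ("
        "patient_id INTEGER, feature TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO patient_features VALUES (?, ?, ?)", rows)

    # Self-join on the feature column; a.patient_id < b.patient_id keeps
    # each unordered pair once, giving n * (n - 1) / 2 result rows.
    query = """
        SELECT a.patient_id,
               b.patient_id,
               SUM(a.value * b.value)                       AS dot,
               SUM(a.value * a.value)                       AS norm_a_sq,
               SUM(b.value * b.value)                       AS norm_b_sq,
               SUM((a.value - b.value) * (a.value - b.value)) AS dist_sq
        FROM patient_features AS a
        JOIN patient_features AS b
          ON a.feature = b.feature
         AND a.patient_id < b.patient_id
        GROUP BY a.patient_id, b.patient_id
    """

    result = {}
    for pa, pb, dot, norm_a_sq, norm_b_sq, dist_sq in conn.execute(query):
        # Square roots are taken in Python because SQLite's sqrt() is only
        # available when the library is built with math functions enabled.
        cosine = dot / math.sqrt(norm_a_sq * norm_b_sq)
        result[(pa, pb)] = (cosine, math.sqrt(dist_sq))
    conn.close()
    return result
```

The number of result rows grows as n(n-1)/2 in the number of patients n, which is the roughly quadratic blowup the abstract refers to. With sparse data the join would skip features that only one patient has, so the norms and distance above would need to be computed differently.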
