Conference: International Conference on Information Technology: New Generations

High-Performance Biomedical Association Mining with MapReduce


Abstract

MapReduce has been applied to data-intensive applications in different domains because of its simplicity, scalability, and fault tolerance. However, its use in biomedical association mining remains very limited. In this paper, we investigate using MapReduce to efficiently mine the associations between biomedical terms extracted from a set of biomedical articles. First, biomedical terms were obtained by matching text against the Unified Medical Language System (UMLS) Metathesaurus, a biomedical vocabulary and standard database. We then developed a MapReduce algorithm for calculating a category of interestingness measures defined on the basis of a 2×2 contingency table. The algorithm consists of two MapReduce jobs and takes a stripes approach to reduce the number of intermediate results. Experiments were conducted on Amazon Elastic MapReduce (EMR) with an input of 3610 articles retrieved from two biomedical journals. Test results indicate that our algorithm has linear scalability.
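The abstract does not spell out the algorithm, so the following is only a minimal in-memory sketch of the general idea: a stripes-style co-occurrence count followed by a pass that builds a 2×2 contingency table per term pair. Everything named here is an assumption made for illustration, not the authors' implementation: the function names, the `MARGIN` sentinel key, the way work is divided between the two jobs, and the choice of lift as the example interestingness measure. Documents are assumed to be already reduced to lists of matched terms.

```python
from collections import Counter, defaultdict

MARGIN = "\x00margin"  # hypothetical sentinel key; assumed never to be a real term


def map_stripes(doc_terms):
    """Mapper sketch: emit one (term, stripe) pair per distinct term in a document.

    The stripe is a Counter of the other terms in the same document, plus a
    marginal entry that counts the document itself.
    """
    terms = set(doc_terms)
    for term in terms:
        stripe = Counter(other for other in terms if other != term)
        stripe[MARGIN] = 1
        yield term, stripe


def job1_count_stripes(docs):
    """Job 1 sketch: merge stripes by key, as a reducer would do element-wise."""
    merged = defaultdict(Counter)
    for doc_terms in docs:
        for term, stripe in map_stripes(doc_terms):
            merged[term].update(stripe)  # Counter.update adds counts
    return merged


def contingency_table(merged, a, b, n_docs):
    """Return (n11, n10, n01, n00) for terms a and b over n_docs documents."""
    n11 = merged[a][b]                # documents containing both a and b
    n10 = merged[a][MARGIN] - n11     # a without b
    n01 = merged[b][MARGIN] - n11     # b without a
    n00 = n_docs - n11 - n10 - n01    # neither term
    return n11, n10, n01, n00


def lift(table):
    """Example measure on a 2x2 table: P(a, b) / (P(a) * P(b))."""
    n11, n10, n01, n00 = table
    n = n11 + n10 + n01 + n00
    return n11 * n / ((n11 + n10) * (n11 + n01))


def job2_score_pairs(merged, n_docs):
    """Job 2 sketch: score every co-occurring pair once (a < b)."""
    scores = {}
    for a in list(merged):
        for b in list(merged[a]):
            if b != MARGIN and a < b:
                scores[(a, b)] = lift(contingency_table(merged, a, b, n_docs))
    return scores


if __name__ == "__main__":
    # Toy input, invented for illustration only.
    docs = [
        ["aspirin", "headache"],
        ["aspirin", "headache", "fever"],
        ["fever"],
        ["aspirin"],
    ]
    merged = job1_count_stripes(docs)
    for pair, score in sorted(job2_score_pairs(merged, len(docs)).items()):
        print(pair, round(score, 3))
```

The point of the stripes layout is that a mapper emits one record per term rather than one record per term pair, which is where the reduction in intermediate results comes from. Any other measure defined on the four cells (for example a chi-square statistic) could be substituted for `lift` without changing the counting step.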
