Conference: International Conference on Information Technology: New Generations

High-Performance Biomedical Association Mining with MapReduce

Abstract

MapReduce has been applied to data-intensive applications in different domains because of its simplicity, scalability, and fault tolerance. However, its use in biomedical association mining is still very limited. In this paper, we investigate using MapReduce to efficiently mine the associations between biomedical terms extracted from a set of biomedical articles. First, biomedical terms were obtained by matching text to the Unified Medical Language System (UMLS) Metathesaurus, a biomedical vocabulary and standard database. We then developed a MapReduce algorithm that can calculate a category of interestingness measures defined on the basis of a 2×2 contingency table. The algorithm consists of two MapReduce jobs and takes a stripes approach to reduce the number of intermediate results. Experiments were conducted on Amazon Elastic MapReduce (EMR) with an input of 3610 articles retrieved from two biomedical journals. Test results indicate that our algorithm has linear scalability.
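The abstract does not spell out the two jobs, so the following is only a minimal in-memory sketch of how a stripes-style co-occurrence count and a 2×2 contingency-table measure could fit together. It is not the authors' implementation. Assumptions made for illustration: the function names (`map_stripes`, `reduce_stripes`, `run_job1`, `run_job2`), the reserved key `"*"` for a term's document count, co-occurrence defined at the article level, and chi-square as the example interestingness measure. A stripe groups all co-occurring terms of one term into a single record, so the mapper emits one record per term rather than one per term pair.

```python
from collections import Counter, defaultdict


def map_stripes(doc_terms):
    """Job 1 mapper (sketch): emit one stripe per distinct term in an article."""
    terms = sorted(set(doc_terms))
    for t in terms:
        # Each co-occurring term gets a count of 1 for this article.
        stripe = Counter({u: 1 for u in terms if u != t})
        # "*" is a reserved key (assumption) holding the term's own document count.
        stripe["*"] = 1
        yield t, stripe


def reduce_stripes(term, stripes):
    """Job 1 reducer (sketch): element-wise sum of all stripes for one term."""
    total = Counter()
    for s in stripes:
        total.update(s)
    return term, total


def run_job1(docs):
    """Simulate map, shuffle, and reduce in memory."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, stripe in map_stripes(doc):
            grouped[key].append(stripe)
    return dict(reduce_stripes(k, v) for k, v in grouped.items())


def contingency(stripes, a, b, n_docs):
    """Build the 2x2 table for terms a and b from the summed stripes."""
    n11 = stripes[a][b]            # articles containing both a and b
    n10 = stripes[a]["*"] - n11    # a without b
    n01 = stripes[b]["*"] - n11    # b without a
    n00 = n_docs - n11 - n10 - n01  # neither term
    return n11, n10, n01, n00


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 table (one possible measure)."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0


def run_job2(stripes, n_docs):
    """Job 2 (sketch): score each unordered term pair once."""
    for a, stripe in stripes.items():
        for b in stripe:
            if b != "*" and a < b:
                yield (a, b), chi_square(*contingency(stripes, a, b, n_docs))


if __name__ == "__main__":
    # Hypothetical toy corpus: each inner list holds the terms found in one article.
    docs = [["aspirin", "headache"], ["aspirin", "headache", "fever"],
            ["fever"], ["ibuprofen"]]
    for pair, score in sorted(run_job2(run_job1(docs), len(docs))):
        print(pair, score)
```

Any measure that depends only on the four cell counts could replace `chi_square` without changing the counting job, which is one plausible reading of "a category of interestingness measures" in the abstract.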
