Journal: Cluster Computing

CRFs based parallel biomedical named entity recognition algorithm employing MapReduce framework

Abstract

With the rapid growth of the biomedical literature, model training time in biomedical named entity recognition increases sharply on large-scale training samples. Improving the efficiency of named entity recognition on biomedical big data has therefore become one of the key problems in biomedical text mining. To improve recognition performance and reduce training time, this paper proposes an optimized two-phase recognition method based on conditional random fields (CRFs). In the first phase, the boundaries of each named entity are detected to identify all real entities. In the second phase, the semantic class of each detected entity is labeled. To speed up training in both phases, we implement model training on a parallel optimization framework based on MapReduce. The training set is divided into several parts, and the iterations of the training algorithm are designed as map tasks that can be executed simultaneously in a cluster; each map function computes the gradient vector component for one part of the training set. Our experiments show that the proposed method achieves high performance with short training time, which has important implications for current biological big data processing.
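The parallel training step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the tag set `LABELS`, the emission and transition features, the toy data `TRAIN`, and all function names are assumptions made for illustration. The expected feature counts are computed by brute-force enumeration of tag sequences, which only works for very short sentences; a real CRF trainer would use forward-backward. A thread pool stands in for the cluster, and the summation of the partial gradients is assumed to happen in a reduce step, which the abstract does not spell out.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor
from functools import partial, reduce
from itertools import product
from math import exp

LABELS = ("B", "I", "O")  # assumed boundary tag set; the abstract does not name one

# Hypothetical toy training set of (tokens, gold tags) pairs.
TRAIN = [
    (["IL-2", "gene"], ["B", "I"]),
    (["activates", "p53"], ["O", "B"]),
    (["the", "cell"], ["O", "O"]),
]

def features(words, tags):
    """Count emission and transition features for one tagged sentence."""
    feats = Counter()
    prev = "<s>"
    for word, tag in zip(words, tags):
        feats[("emit", word, tag)] += 1
        feats[("trans", prev, tag)] += 1
        prev = tag
    return feats

def score(feats, weights):
    """Linear score of a tagging: dot product of feature counts and weights."""
    return sum(weights.get(f, 0.0) * c for f, c in feats.items())

def sentence_gradient(words, gold, weights):
    """Log-likelihood gradient: observed minus expected feature counts.

    Expected counts are found by enumerating every tag sequence (toy sizes only).
    """
    grad = defaultdict(float)
    for f, c in features(words, gold).items():
        grad[f] += c
    candidates = [features(words, tags)
                  for tags in product(LABELS, repeat=len(words))]
    unnorm = [exp(score(f, weights)) for f in candidates]
    z = sum(unnorm)  # partition function
    for feats, u in zip(candidates, unnorm):
        for f, c in feats.items():
            grad[f] -= (u / z) * c
    return grad

def map_partial_gradient(shard, weights):
    """Map task: gradient contribution of one part of the training set."""
    part = defaultdict(float)
    for words, gold in shard:
        for f, g in sentence_gradient(words, gold, weights).items():
            part[f] += g
    return dict(part)

def reduce_gradients(left, right):
    """Reduce step (assumed): add two partial gradient vectors."""
    total = dict(left)
    for f, g in right.items():
        total[f] = total.get(f, 0.0) + g
    return total

def parallel_gradient(train, weights, n_shards):
    """Split the data, run the map tasks concurrently, then sum the results."""
    shards = [train[i::n_shards] for i in range(n_shards)]
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        parts = list(pool.map(partial(map_partial_gradient, weights=weights),
                              shards))
    return reduce(reduce_gradients, parts, {})
```

Because the log-likelihood is a sum over training sentences, the gradient from `parallel_gradient(TRAIN, weights, 3)` matches the single-shard result up to floating-point rounding. In an iterative optimizer, the current weights would be sent to every map task at each iteration and the summed gradient used for the next update.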