Conference: International Conference on Natural Language Processing and Chinese Computing

Incorporating Lexicon for Named Entity Recognition of Traditional Chinese Medicine Books

Abstract

Little research has been done on Named Entity Recognition (NER) for Traditional Chinese Medicine (TCM) books, and most existing work uses statistical models such as Conditional Random Fields (CRFs). These methods, however, do not fully exploit lexicon information or large-scale unlabeled corpus data. To improve NER performance on TCM books, we propose a method based on the BiLSTM-CRF model that incorporates lexicon information into the representation layer to enrich its semantic information. We compare our approach with several previous character-based and word-based methods. Experiments on the "Shanghan Lun" dataset show that our method outperforms previous models. In addition, because no large corpus was previously available in this field, we collected 376 TCM books to construct a large-scale corpus and used it to obtain pre-trained vectors. We have released the corpus and pre-trained vectors to the public.
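
The abstract does not say how lexicon information enters the representation layer, so the following is only a minimal sketch of one common approach, not necessarily the authors' method: each character receives four binary flags (begin, middle, end, single) that record how it participates in lexicon matches, and those flags are concatenated with the character's pre-trained vector before the sequence is passed to a BiLSTM-CRF tagger. The names `lexicon_features` and `build_inputs`, the `max_len` window, the toy lexicon, and the two-dimensional vectors are all hypothetical.

```python
from typing import Dict, List, Set


def lexicon_features(sentence: str, lexicon: Set[str], max_len: int = 4) -> List[List[int]]:
    """Return one [B, M, E, S] flag vector per character.

    B/M/E mark a character that begins, lies inside, or ends a multi-character
    lexicon entry; S marks a character that is itself a lexicon entry.
    Assumed scheme for illustration only.
    """
    feats = [[0, 0, 0, 0] for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Try every substring of up to max_len characters starting here.
        for end in range(start + 1, min(n, start + max_len) + 1):
            if sentence[start:end] not in lexicon:
                continue
            if end - start == 1:
                feats[start][3] = 1              # S: single-character entry
            else:
                feats[start][0] = 1              # B: first character of an entry
                feats[end - 1][2] = 1            # E: last character of an entry
                for mid in range(start + 1, end - 1):
                    feats[mid][1] = 1            # M: interior character


    return feats


def build_inputs(sentence: str,
                 char_vectors: Dict[str, List[float]],
                 lexicon: Set[str],
                 dim: int) -> List[List[float]]:
    """Concatenate each character vector with its lexicon flags.

    Characters missing from char_vectors fall back to a zero vector.
    The result is the per-character input a BiLSTM-CRF would consume.
    """
    unk = [0.0] * dim
    feats = lexicon_features(sentence, lexicon)
    return [list(char_vectors.get(ch, unk)) + [float(x) for x in f]
            for ch, f in zip(sentence, feats)]


# Toy data: a three-entry lexicon and made-up 2-dimensional character vectors.
toy_lexicon = {"桂枝", "桂枝汤", "汤"}
toy_vectors = {"桂": [0.1, 0.2], "枝": [0.3, 0.4], "汤": [0.5, 0.6]}

flags = lexicon_features("桂枝汤主之", toy_lexicon)
inputs = build_inputs("桂枝汤主之", toy_vectors, toy_lexicon, dim=2)
```

A character can carry several flags at once when lexicon entries overlap, which lets the tagger see competing segmentations instead of committing to a single one up front.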
