Source: Journal of the American Medical Informatics Association

Traditional Chinese medicine clinical records classification with BERT and domain specific corpora


Abstract

Traditional Chinese Medicine (TCM) has developed over several thousand years and plays a significant role in health care for Chinese people. This paper studies the problem of classifying TCM clinical records into 5 main TCM disease categories. We explored a number of state-of-the-art deep learning models and found that the recent Bidirectional Encoder Representations from Transformers (BERT) achieves better results than other deep learning models and other state-of-the-art methods. We further used an unlabeled clinical corpus to fine-tune the BERT language model before training the text classifier. The method takes only the Chinese characters of the clinical text as input, without preprocessing or feature engineering. We evaluated deep learning models and traditional text classifiers on a benchmark data set. Our method achieves state-of-the-art results: accuracy of 89.39% ± 0.35%, macro F1 score of 88.64% ± 0.40%, and micro F1 score of 89.39% ± 0.35%. We also visualized the attention weights in our method, which can reveal indicative characters in clinical text.
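The reported accuracy and micro F1 score are identical. This is what one would expect if each record is assigned exactly one of the 5 categories, which is an assumption here and not something the abstract states outright. The following minimal sketch, using made-up toy labels and a hypothetical helper `f1_scores` (not the authors' evaluation code), shows how macro and micro F1 are computed and why micro F1 then coincides with accuracy:

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (accuracy, macro_f1, micro_f1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted but is wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro F1: unweighted mean of the per-class F1 scores.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro F1: F1 over counts pooled across all classes.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    accuracy = sum(tp.values()) / len(y_true)
    return accuracy, macro, micro


# Toy labels for illustration only; these are not data from the paper.
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 3, 4]
y_pred = [0, 0, 1, 1, 1, 2, 2, 0, 3, 4]
accuracy, macro_f1, micro_f1 = f1_scores(y_true, y_pred)
```

Every misclassified record adds one false positive (for the predicted class) and one false negative (for the true class), so the pooled precision and recall both equal the fraction of correct predictions. Macro F1 weights all classes equally, so it differs from accuracy whenever per-class performance is uneven.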
