Journal of Biomedical Semantics

An automatic approach for constructing a knowledge base of symptoms in Chinese

Abstract

Background
While a large number of well-known knowledge bases (KBs) in the life sciences have been published as Linked Open Data, few KBs exist in Chinese. Yet Chinese KBs are necessary for automatically processing and analyzing electronic medical records (EMRs) written in Chinese. Of these, a symptom KB in Chinese is the most urgently needed, since symptoms are the starting point of clinical diagnosis.

Results
We publish a public KB of symptoms in Chinese that covers symptoms, departments, diseases, medicines, and examinations, as well as the relations between symptoms and these related entities. To the best of our knowledge, no other KB focusing on symptoms in Chinese exists, and ours is an important supplement to existing medical resources. The KB is constructed by fusing data automatically extracted from eight mainstream healthcare websites and three Chinese encyclopedia sites, supplemented by symptoms extracted from a large number of EMRs.

Methods
First, we manually design a data schema with reference to the Unified Medical Language System (UMLS). Second, we extract entities from the eight healthcare websites. These entities are used as seeds to train a multi-class classifier, which classifies entities from the encyclopedia sites; we also train a Conditional Random Field (CRF) model to extract symptoms from EMRs. Third, we fuse the data, resolving large-scale duplication across sources through entity type alignment, entity mapping, and attribute mapping. Finally, we link our KB to UMLS to investigate similarities and differences between symptoms in Chinese and English.

Conclusions
The resulting KB contains more than 26,000 distinct symptoms in Chinese, including 3968 symptoms in traditional Chinese medicine and 1029 synonym pairs for symptoms. It also includes concepts such as diseases and medicines, as well as the relations between symptoms and these entities. We also link the KB to UMLS and analyze the differences between symptoms in the two KBs. We have released the KB as Linked Open Data, along with a demo, at https://datahub.io/dataset/symptoms-in-chinese .
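CRF-based extraction from Chinese text is commonly set up at the character level, since Chinese is not whitespace-delimited. The minimal sketch below shows one way to prepare such training data: annotated symptom spans are converted into BIO labels, and simple window features are computed for each character. The label names (`B-SYM`, `I-SYM`, `O`), the feature set, and the helper names `to_bio`, `char_features`, and `sentence_to_xy` are hypothetical; the abstract does not describe the paper's tagging scheme or features.

```python
from typing import Dict, List, Tuple

# Hypothetical BIO scheme: B-SYM starts a symptom mention, I-SYM continues it, O is outside.
def to_bio(text: str, spans: List[Tuple[int, int]]) -> List[str]:
    """Return one label per character, given symptom spans as [start, end) offsets."""
    labels = ["O"] * len(text)
    for start, end in spans:
        labels[start] = "B-SYM"
        for i in range(start + 1, end):
            labels[i] = "I-SYM"
    return labels


def char_features(text: str, i: int) -> Dict[str, object]:
    """Window features for character i: the character itself and its neighbours."""
    return {
        "char": text[i],
        "is_digit": text[i].isdigit(),
        "prev": text[i - 1] if i > 0 else "<BOS>",
        "next": text[i + 1] if i + 1 < len(text) else "<EOS>",
        "bigram": text[i - 1 : i + 1] if i > 0 else "<BOS>" + text[i],
    }


def sentence_to_xy(text: str, spans: List[Tuple[int, int]]):
    """Build one (features, labels) training pair for a sentence."""
    X = [char_features(text, i) for i in range(len(text))]
    y = to_bio(text, spans)
    return X, y


# "患者头痛伴发热" = "the patient has a headache with fever"; 头痛 and 发热 are symptom spans.
X, y = sentence_to_xy("患者头痛伴发热", [(2, 4), (5, 7)])
print(y)  # ['O', 'O', 'B-SYM', 'I-SYM', 'O', 'B-SYM', 'I-SYM']
```

Lists of per-character feature dicts with parallel label lists are the input shape that CRF toolkits such as sklearn-crfsuite accept for training; the features used in the paper may well differ.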
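The fusion step can be illustrated with a small sketch. Records from different sources are first mapped to a shared entity type (type alignment). They are then grouped by type and normalized name (entity mapping), and their source-specific attribute names are mapped to shared ones (attribute mapping) before values are merged. The mapping tables, the whitespace-only name normalization, and the record layout are hypothetical; a real system would need synonym handling and fuzzier matching than exact string equality.

```python
from typing import Dict, List, Tuple

# Hypothetical alignment tables; the paper's actual mappings are not given in the abstract.
TYPE_ALIGNMENT = {"症状": "Symptom", "symptom": "Symptom", "疾病": "Disease", "disease": "Disease"}
ATTRIBUTE_MAPPING = {"科室": "department", "dept": "department",
                     "相关疾病": "related_disease", "diseases": "related_disease"}


def normalize(name: str) -> str:
    """Naive name normalization: drop all whitespace."""
    return "".join(name.split())


def fuse(records: List[dict]) -> Dict[Tuple[str, str], dict]:
    """Merge records that share an aligned type and a normalized name."""
    fused: Dict[Tuple[str, str], dict] = {}
    for rec in records:
        etype = TYPE_ALIGNMENT.get(rec["type"], rec["type"])  # entity type alignment
        key = (etype, normalize(rec["name"]))                # entity mapping
        if key not in fused:
            fused[key] = {"type": etype, "name": key[1], "sources": set(), "attrs": {}}
        entity = fused[key]
        entity["sources"].add(rec["source"])
        for attr, value in rec.get("attrs", {}).items():
            shared = ATTRIBUTE_MAPPING.get(attr, attr)       # attribute mapping
            values = value if isinstance(value, list) else [value]
            entity["attrs"].setdefault(shared, set()).update(values)
    return fused


# 头痛 = headache, 神经内科 = neurology department, 偏头痛 = migraine.
records = [
    {"source": "site_a", "type": "症状", "name": "头痛", "attrs": {"科室": "神经内科"}},
    {"source": "site_b", "type": "symptom", "name": " 头 痛 ",
     "attrs": {"dept": "神经内科", "diseases": ["偏头痛"]}},
]
kb = fuse(records)
print(len(kb))  # 1
```

Keeping the set of contributing sources on each fused entity makes it possible to trace every merged value back to the sites it came from.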
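Publishing the KB as Linked Open Data means serializing its entities and relations as RDF. The minimal, standard-library-only sketch below emits N-Triples for one symptom, its related diseases and departments, and an optional link to a UMLS concept. The namespace `http://example.org/symptom-kb/`, the predicate names `relatedDisease` and `department`, the IRI pattern for UMLS concepts, and the use of `skos:closeMatch` for the cross-lingual link are all assumptions, not the published KB's vocabulary.

```python
from typing import List, Optional
from urllib.parse import quote

BASE = "http://example.org/symptom-kb/"  # hypothetical namespace
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"
SKOS_CLOSE_MATCH = "<http://www.w3.org/2004/02/skos/core#closeMatch>"


def iri(kind: str, name: str) -> str:
    """Mint an IRI under the hypothetical namespace, percent-encoding the name."""
    return f"<{BASE}{kind}/{quote(name, safe='')}>"


def literal(text: str, lang: str = "zh") -> str:
    """Escape backslash, double quote, LF and CR, as N-Triples string literals require."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def symptom_triples(name: str, diseases: List[str], departments: List[str],
                    umls_cui: Optional[str] = None) -> List[str]:
    """Return N-Triples lines describing one symptom and its relations."""
    s = iri("symptom", name)
    lines = [f"{s} {RDFS_LABEL} {literal(name)} ."]
    lines += [f"{s} <{BASE}relatedDisease> {iri('disease', d)} ." for d in diseases]
    lines += [f"{s} <{BASE}department> {iri('department', d)} ." for d in departments]
    if umls_cui:
        # Hypothetical IRI pattern for a UMLS concept identifier (CUI).
        lines.append(f"{s} {SKOS_CLOSE_MATCH} <{BASE}umls/{umls_cui}> .")
    return lines


print("\n".join(symptom_triples("头痛", ["偏头痛"], ["神经内科"])))
```

RDF IRIs may contain non-ASCII characters, so percent-encoding is a conservative choice rather than a requirement. `skos:closeMatch` is used here instead of `owl:sameAs` because a Chinese symptom and its nearest UMLS concept need not be exactly equivalent.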
