Journal: PLoS One

Building a semantically annotated corpus for chronic disease complications using two document types



Abstract

Narrative information in electronic health records (EHRs) contains a wealth of information related to patient health conditions. In addition, people use Twitter to express their experiences regarding personal health issues, such as medical complaints, symptoms, treatments, lifestyle, and other factors. Both genres of text include different types of health-related information concerning disease complications and risk factors. Knowing detailed information about controlling disease risk factors has a great impact on modifying these risks and subsequently preventing disease complications. Text-mining tools provide efficient solutions to extract and integrate vital information related to disease complications hidden in the large volume of the narrative text. However, the development of text-mining tools depends on the availability of an annotated corpus. In response, we have developed the PrevComp corpus, which is annotated with information relevant to the identification of disease complications, underlying risk factors, and prevention measures, in the context of the interaction between hypertension and diabetes. The corpus is unique and novel in terms of the very specific topic in the biomedical domain and as an integration of information from both EHRs and tweets collected from Twitter. The annotation scheme was designed with guidance by a domain expert, and two further domain experts performed the annotation, resulting in a high-quality annotation, with agreement rate F-scores as high as 0.60 and 0.75 for EHRs and tweets, respectively.
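The agreement figures above are reported as F-scores between two annotators. As a rough illustration only, the sketch below computes an exact-match F-score over annotated spans, treating one annotator as the reference. The span representation, the label names, and the function `agreement_f_score` are assumptions made for this example; the abstract does not state the matching criteria or the entity types the authors actually used.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start_offset, end_offset, label).
Span = Tuple[int, int, str]


def agreement_f_score(annotator_a: Set[Span], annotator_b: Set[Span]) -> float:
    """Exact-match F-score between two annotators, with A as the reference."""
    if not annotator_a and not annotator_b:
        return 1.0  # nothing annotated by either side counts as full agreement
    matched = len(annotator_a & annotator_b)
    precision = matched / len(annotator_b) if annotator_b else 0.0
    recall = matched / len(annotator_a) if annotator_a else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented toy annotations; the labels are illustrative, not the corpus schema.
a = {(0, 12, "RiskFactor"), (20, 28, "Complication"), (35, 44, "Prevention")}
b = {(0, 12, "RiskFactor"), (20, 28, "Complication"), (50, 58, "Prevention")}
score = agreement_f_score(a, b)
```

With exact matching, the F-score is symmetric in the two annotators, so the choice of reference does not change the result. A relaxed (overlap-based) matching criterion would generally give higher values.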
