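Automatic Annotation of Temporal Information in Chinese Medical Record Text (中文病历文本中时间信息自动标注)
Chinese Journal of Biomedical Engineering (《中国生物医学工程学报》)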




Automatically extracting standardized temporal information and its associated clinical events from medical record text is important for applications such as clinical decision support and medical information mining. Although many methods for extracting clinical events have been proposed, automatic annotation of temporal information has not yet reached a level suitable for practical use. The main obstacle is that temporal expressions in Chinese medical record text are diverse, interdependent, and often ambiguous. This study first identifies basic temporal expressions with a regular-expression-based method. It then analyzes the categories of reference time found in Chinese medical record text and the rules governing their selection, and uses these to calculate each identified expression and annotate it in the international TIMEX2 standard format. The method was validated on a corpus of 147 real medical records containing 1,207 temporal expressions. The F-measure was 92.82% for temporal identification and 90.80% for temporal annotation, laying a solid foundation for subsequent use of the temporal information.
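The two-stage idea in the abstract (regular-expression identification, then normalization against a reference time into TIMEX2) can be illustrated with a minimal Python sketch. The patterns, the `annotate` function, and the explicitly supplied reference date are all hypothetical. The abstract does not give the study's actual rules or its reference-time selection procedure, so this covers only full dates and simple "N days/weeks ago" expressions, resolved to day granularity.

```python
import re
from datetime import date, timedelta

# Hypothetical patterns for illustration only; not the rules used in the study.
# "abs": full dates such as 2012年3月5日; "rel": offsets such as 3天前 / 2周前.
PATTERN = re.compile(
    r"(?P<abs>(?P<y>\d{4})年(?P<m>\d{1,2})月(?P<d>\d{1,2})日)"
    r"|(?P<rel>(?P<n>\d+)(?P<unit>天|周)前)"
)


def annotate(text: str, reference: date) -> str:
    """Wrap recognized expressions in TIMEX2 tags with a normalized VAL.

    `reference` is assumed to be supplied by the caller (for example, an
    admission date); choosing it automatically is outside this sketch.
    """

    def repl(match: re.Match) -> str:
        if match.group("abs"):
            # Absolute date: build the value directly from the matched parts.
            value = date(
                int(match.group("y")),
                int(match.group("m")),
                int(match.group("d")),
            )
        else:
            # Relative expression: subtract the offset from the reference time.
            days_per_unit = 7 if match.group("unit") == "周" else 1
            value = reference - timedelta(days=int(match.group("n")) * days_per_unit)
        return f'<TIMEX2 VAL="{value.isoformat()}">{match.group(0)}</TIMEX2>'

    return PATTERN.sub(repl, text)


if __name__ == "__main__":
    note = "患者于2012年3月5日入院,3天前出现发热。"
    print(annotate(note, date(2012, 3, 5)))
```

Passing the reference date as a parameter keeps recognition and calculation separate, so a reference-time selection rule could be added later without changing the patterns.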


