Journal: 计算机与现代化 (Computer and Modernization)

A Structured Processing Method for Pathological Microscopic Examination Text Data

Abstract

Current approaches to structuring medical text data mostly rely on general-purpose word segmentation tools or medical knowledge bases. However, general-purpose segmentation tools recognize specialized terminology poorly, and the standardization of Chinese medical terminology remains immature. To address these problems, this paper proposes a method that uses statistical information to structure pathological microscopic examination text data. Starting from clustered text, the method segments the text using breakpoint words and coincident strings, uses the statistics of the resulting word strings to obtain keywords and word category information, and then expands the vocabulary to produce a final lexicon that serves as the dictionary. The text is then segmented with a dictionary-based bidirectional maximum matching algorithm, and structured data are obtained by adding negation detection rules. Experiments show that the medical lexicon obtained by this method reaches an accuracy of 80%, and that the method can produce structured data without relying on segmentation tools.
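The last two steps named in the abstract, dictionary-based bidirectional maximum matching followed by negation detection, can be illustrated with a minimal Python sketch. The abstract does not give the paper's actual lexicon, tie-breaking rule, or negation rules, so everything specific here is an assumption made for illustration: the sample lexicon, the cue words in `NEGATION_CUES`, the "fewer tokens, then fewer single-character tokens" heuristic, and the rule that a negation cue applies until the next punctuation mark.

```python
from typing import List, Set, Tuple

# Hypothetical negation cues and clause delimiters (not taken from the paper).
NEGATION_CUES = {"未见", "无"}
PUNCTUATION = set(",。;,.;")


def forward_max_match(text: str, vocab: Set[str], max_len: int) -> List[str]:
    """Scan left to right, taking the longest vocabulary entry at each step."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when nothing matches.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text: str, vocab: Set[str], max_len: int) -> List[str]:
    """Scan right to left, taking the longest vocabulary entry at each step."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


def bidirectional_max_match(text: str, vocab: Set[str]) -> List[str]:
    """Run both scans and keep the result with the lower heuristic cost."""
    max_len = max((len(w) for w in vocab), default=1)
    fwd = forward_max_match(text, vocab, max_len)
    bwd = backward_max_match(text, vocab, max_len)

    def cost(tokens: List[str]) -> Tuple[int, int]:
        # Assumed heuristic: fewer tokens first, then fewer 1-character tokens.
        return (len(tokens), sum(len(t) == 1 for t in tokens))

    # Ties go to the backward result.
    return fwd if cost(fwd) < cost(bwd) else bwd


def extract_findings(text: str, lexicon: Set[str]) -> List[Tuple[str, bool]]:
    """Return (term, is_present) pairs; a cue negates terms up to the next punctuation."""
    tokens = bidirectional_max_match(text, lexicon | NEGATION_CUES)
    findings, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False  # negation scope ends at the clause boundary
        elif tok in NEGATION_CUES:
            negated = True
        elif tok in lexicon:
            findings.append((tok, not negated))
    return findings


# Hypothetical lexicon standing in for the one the method would build.
sample_lexicon = {"鳞状上皮", "增生", "癌细胞"}
print(extract_findings("鳞状上皮增生,未见癌细胞", sample_lexicon))
```

With the sample lexicon above, the call prints `[('鳞状上皮', True), ('增生', True), ('癌细胞', False)]`: the two terms before the comma are recorded as present, and the term after the cue 未见 is recorded as absent.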
