Journal of Biomedical Informatics

Common data model for natural language processing based on two existing standard information models: CDA+GrAF



Abstract

An increasing need for collaboration and resource sharing in the Natural Language Processing (NLP) research and development community motivates efforts to create and share a common data model and a common terminology for all information annotated and extracted from clinical text. We have combined two existing standards, the HL7 Clinical Document Architecture (CDA) and the ISO Graph Annotation Format (GrAF; in development), to develop such a data model, entitled "CDA+GrAF". We experimented with several methods of combining these standards and eventually selected one that wraps separate CDA and GrAF parts in a common standoff annotation XML document (i.e., one stored separately from the annotated text). Two use cases were used to create examples of such standoff annotation documents: clinical document sections, and the 2010 i2b2/VA NLP Challenge (problems, tests, and treatments, with their assertions and relations). These examples were successfully validated against the XML schemata provided with both standards. We developed a tool to automatically translate annotation documents from the 2010 i2b2/VA NLP Challenge format to GrAF, and used it to automatically generate 50 annotation documents, all of which were successfully validated. Finally, we adapted the XSL stylesheet provided with HL7 CDA to allow viewing annotation XML documents in a web browser, and we plan to adapt existing tools for translating annotation documents between CDA+GrAF and the UIMA and GATE frameworks. This common data model may ease directly comparing NLP tools and applications, combining their output, transforming and "translating" annotations between different NLP applications, and eventually the "plug-and-play" of different modules in NLP applications.
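To make the standoff idea concrete, here is a minimal Python sketch of the kind of translation the abstract describes: 2010 i2b2/VA Challenge concept annotations are converted to character-offset regions and wrapped, together with a CDA part, in one XML document. It is an illustration under stated assumptions, not the authors' tool. It assumes the Challenge's concept-line format `c="text" line:token line:token||t="type"` (1-based line numbers, 0-based whitespace-token indices, inclusive end). The wrapper element names `annotationDocument`, `cdaPart` and `grafPart` are hypothetical, and the `graph`/`region`/`node`/`link`/`a` elements only loosely follow GrAF. Nothing is validated against either standard's schema, and the sample note is made up.

```python
import re
import xml.etree.ElementTree as ET

# ASSUMPTION: i2b2/VA 2010-style concept line, e.g.
#   c="chest pain" 2:2 2:3||t="problem"
# Each "line:token" pair is a 1-based line number and a 0-based
# whitespace-token index; the end position is inclusive.
CONCEPT_RE = re.compile(
    r'c="(?P<text>.*?)" (?P<l1>\d+):(?P<t1>\d+) (?P<l2>\d+):(?P<t2>\d+)'
    r'\|\|t="(?P<type>[^"]+)"'
)
# Qualified name that ElementTree serializes as the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def token_spans(text):
    """Map (line_no, token_idx) to (start, end) character offsets in the full text."""
    spans = {}
    offset = 0
    for line_no, line in enumerate(text.split("\n"), start=1):
        for tok_idx, m in enumerate(re.finditer(r"\S+", line)):
            spans[(line_no, tok_idx)] = (offset + m.start(), offset + m.end())
        offset += len(line) + 1  # account for the "\n" removed by split()
    return spans


def concepts_to_standoff(text, concept_lines):
    """Convert concept lines to (start, end, type, covered_text) tuples.

    The source text is only read, never modified: annotations point into it
    by character offset, which is what makes them standoff.
    """
    spans = token_spans(text)
    result = []
    for raw in concept_lines:
        m = CONCEPT_RE.fullmatch(raw.strip())
        if m is None:
            raise ValueError(f"unrecognized concept line: {raw!r}")
        start = spans[(int(m["l1"]), int(m["t1"]))][0]
        end = spans[(int(m["l2"]), int(m["t2"]))][1]
        result.append((start, end, m["type"], text[start:end]))
    return result


def build_wrapper(doc_id, annotations):
    """Wrap a CDA part and a GrAF-like part in one XML element tree.

    HYPOTHETICAL element names: annotationDocument, cdaPart, grafPart.
    graph/region/node/link/a only loosely follow GrAF; nothing here is
    validated against the CDA or GrAF schemata.
    """
    root = ET.Element("annotationDocument", {"source": doc_id})
    # Placeholder: a real CDA part would carry document-level metadata and
    # sections, per the HL7 CDA schema.
    ET.SubElement(root, "cdaPart")
    graph = ET.SubElement(ET.SubElement(root, "grafPart"), "graph")
    for i, (start, end, label, _covered) in enumerate(annotations):
        # region: a character span in the (external) source text
        ET.SubElement(graph, "region", {XML_ID: f"r{i}", "anchors": f"{start} {end}"})
        # node: a graph node linked to that region
        node = ET.SubElement(graph, "node", {XML_ID: f"n{i}"})
        ET.SubElement(node, "link", {"targets": f"r{i}"})
        # a: the annotation (concept type) attached to the node
        ET.SubElement(graph, "a", {"label": label, "ref": f"n{i}"})
    return root


if __name__ == "__main__":
    note = "Admission diagnosis :\npatient reports chest pain and was given aspirin ."
    lines = ['c="chest pain" 2:2 2:3||t="problem"',
             'c="aspirin" 2:7 2:7||t="treatment"']
    anns = concepts_to_standoff(note, lines)
    print(anns)  # [(38, 48, 'problem', 'chest pain'), (63, 70, 'treatment', 'aspirin')]
    print(ET.tostring(build_wrapper("note-1", anns), encoding="unicode"))
```

Because the XML stores only character offsets and never copies the note text, the same annotation document can be layered over the unchanged source text, and other tools can add their own parts alongside it.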
