Integrated Communications, Navigation and Surveillance Conference

DEEP LEARNING FOR EXTRACTING WORD-LEVEL MEANING FROM SAFETY REPORT NARRATIVES

Abstract

Much aviation safety reporting data is structured, e.g., digital flight data or radar data. However, safety report narratives, which take the form of unstructured text, are indispensable for safety reporting. Structured data alone cannot capture all of the details of an incident, whereas narratives can and do represent a myriad of details in a form that is natural for analysts to work with. Large-scale analysis of narratives nevertheless comes with many challenges: 1) it is difficult to employ enough human experts to digest the continuous flow of new incident reports; 2) authors of incident reports use many different terms to refer to the same semantic concept, which makes it harder to determine whether a specific concept occurs in a text; 3) authors often make spelling mistakes; and 4) authors use a wide variety of abbreviations, some of which are nonstandard. These challenges can be mitigated by the intelligent use of Natural Language Processing (NLP) and Deep Learning techniques to automate parts of narrative processing. Specifically, we show how to use ensembles of word2vec models to automatically find semantically similar terms within safety report corpora, and how to combine human expertise with these ensemble models to identify sets of similar terms with greater recall than either method alone. We also show an unsupervised method for comparing several word2vec models trained on the same data in order to estimate reasonable ranges of vector sizes with which to induce individual word2vec models. This method is based on measuring inter-model agreement on common word2vec similar terms.
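
The abstract does not say how the ensemble combines its members or how inter-model agreement is scored, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes each trained word2vec model has been exported to a plain dict of term to vector, that the ensemble keeps a term when at least `min_votes` models list it among their top-k neighbours, and that agreement is the mean Jaccard overlap of two models' top-k lists. The toy vectors, the function names, and both scoring rules are hypothetical.

```python
import math
from collections import Counter

# Assumption: each "model" is a dict of term -> embedding vector, standing in
# for a separately trained word2vec model. The vectors below are invented toy
# values; models may use different vector sizes (2-d and 3-d here).
model_small = {"runway": [1.0, 0.0], "rwy": [0.9, 0.1], "rnwy": [0.8, 0.2],
               "fuel": [0.0, 1.0], "engine": [0.1, 0.9]}
model_large = {"runway": [1.0, 0.0, 0.1], "rwy": [0.9, 0.1, 0.1],
               "rnwy": [0.7, 0.1, 0.3], "fuel": [0.0, 1.0, 0.0],
               "engine": [0.1, 0.8, 0.2]}
model_noisy = {"runway": [1.0, 0.0], "rwy": [0.9, 0.1], "rnwy": [0.1, 0.9],
               "fuel": [0.0, 1.0], "engine": [0.2, 0.8]}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(model, term, k):
    """Return the k terms most cosine-similar to `term` within one model."""
    if term not in model:
        return []
    scored = [(cosine(model[term], vec), other)
              for other, vec in model.items() if other != term]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


def ensemble_similar(models, term, k=2, min_votes=2):
    """Terms found in the top-k list of at least `min_votes` models."""
    votes = Counter()
    for model in models:
        votes.update(top_k(model, term, k))
    return sorted(t for t, n in votes.items() if n >= min_votes)


def agreement(model_a, model_b, terms, k=2):
    """Mean Jaccard overlap of two models' top-k lists over probe `terms`."""
    scores = []
    for term in terms:
        a = set(top_k(model_a, term, k))
        b = set(top_k(model_b, term, k))
        if a or b:
            scores.append(len(a & b) / len(a | b))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    models = [model_small, model_large, model_noisy]
    print(ensemble_similar(models, "runway"))
    print(agreement(model_small, model_large, ["runway", "fuel"]))
    print(agreement(model_small, model_noisy, ["runway", "fuel"]))
```

Under these assumptions, an analyst's seed list could be merged with the output of `ensemble_similar` by a simple set union, and vector sizes whose models score high pairwise `agreement` would be the candidates for a reasonable range.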