Conference: International Conference on Sustainable Design, Engineering and Construction

Ontology-based sequence labelling for automated information extraction for supporting bridge data analytics

Abstract

The massive amount of data/information buried in textual bridge inspection reports opens opportunities to leverage big data analytics for advanced, information-rich bridge deterioration prediction. However, utilizing textual data for bridge deterioration prediction is challenging because of its inherently unstructured nature. To this end, this paper proposes an ontology-based information extraction (IE) framework that automatically recognizes and extracts key data/information from unstructured textual reports and represents the extracted data/information in a structured form that is ready for data analytics. The proposed IE framework is composed of two primary components: (1) ontology-based sequence labelling for term identification, and (2) ontology-based dependency grammar for relationship association. This paper focuses on presenting the proposed sequence labelling methodology. The methodology utilizes ontology-based begin, inside, and outside (BIO) encoding for phrase-level segmentation and a Conditional Random Field (CRF) for ontology-based labelling at both the token and phrase levels. The experimental results showed that the proposed methodology achieved a precision of 97% and a recall of 91%.
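The BIO encoding mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the example sentence, the concept labels (`Deficiency`, `BridgeElement`), and the helper names `bio_encode` and `bio_decode` are all assumptions made for illustration. The sketch only shows how phrase-level spans map to per-token tags and back; the CRF that would predict such tags from token features is not shown.

```python
from typing import List, Tuple

# (start token index, end token index exclusive, concept label)
Span = Tuple[int, int, str]


def bio_encode(tokens: List[str], spans: List[Span]) -> List[str]:
    """Turn phrase-level spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)          # default: outside any phrase
    for start, end, label in spans:
        tags[start] = f"B-{label}"      # first token of the phrase
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"      # remaining tokens of the phrase
    return tags


def bio_decode(tags: List[str]) -> List[Span]:
    """Recover phrase-level spans from a BIO tag sequence."""
    spans: List[Span] = []
    start, label = None, None
    # A trailing "O" sentinel closes a phrase that ends at the last token.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


# Hypothetical inspection-report sentence and hypothetical ontology concepts.
tokens = ["Moderate", "transverse", "cracking", "on", "the", "bridge", "deck"]
spans = [(1, 3, "Deficiency"), (5, 7, "BridgeElement")]

tags = bio_encode(tokens, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

In a setup like the one the abstract describes, tag sequences of this shape would serve as the training targets for a sequence model such as a CRF, and `bio_decode` would turn predicted tags back into labelled phrases.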
