Conference: ASCE International Workshop on Computing in Civil Engineering

Gazetteers for Information Extraction Applications in Construction Safety Management

Abstract

Gazetteers, also known as entity dictionaries, can support many information extraction (IE) applications such as named entity recognition (NER). However, gazetteers are not always available, because developing them requires both domain knowledge and human effort. Existing gazetteers are also mostly general in nature; they are limited to common entity types such as locations, organizations, and person names. These common entity types cannot accurately reflect the semantics of specific domain knowledge, and their applicability is therefore limited. A useful gazetteer, on the other hand, must indicate the important types of entities (i.e., classes of concepts) and must contain a sufficient number of entities. This creates the need for domain-specific gazetteers, especially for specialized IE tasks. In this paper, the authors take construction safety management as the target domain and propose a semi-automated approach to develop a construction safety gazetteer, aiming to eventually support IE applications for construction safety management. The proposed approach consists of three steps: (1) applying natural language processing (NLP) techniques to extract important phrases (not limited to single terms, but including bigrams and trigrams) from text resources, (2) defining important types of entities as entity classes for the construction safety domain, and (3) assigning the extracted phrases to the predefined entity classes. The authors also discuss how the proposed methodology could be affordable for domain experts, and the possible scenarios for IE applications that support construction safety management.
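The three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the NLP techniques, the entity classes, or the assignment rule. The sample reports, the stopword list, the frequency threshold, the class names `HAZARD` and `EQUIPMENT`, and the head-word matching rule are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real pipeline would likely use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "was", "were",
             "from", "with", "by", "while", "must", "be", "near"}

# Hypothetical sample text resources (invented for illustration).
REPORTS = [
    "The worker fell from the mobile scaffold while wearing no safety harness.",
    "A safety harness must be worn on the mobile scaffold.",
    "The tower crane was inspected after the trench collapse.",
    "Trench collapse near the tower crane injured one worker.",
]


def extract_phrases(texts, max_n=3, min_count=2):
    """Step 1: collect unigrams, bigrams and trigrams seen at least min_count times.

    Frequency counting stands in for whatever phrase-scoring technique is
    actually used. N-grams that start or end with a stopword are skipped.
    """
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return {phrase for phrase, count in counts.items() if count >= min_count}


# Step 2: entity classes with cue words (assumed; in practice defined by domain experts).
ENTITY_CLASSES = {
    "HAZARD": {"fall", "collapse", "shock"},
    "EQUIPMENT": {"scaffold", "ladder", "crane", "harness"},
}


def assign_phrases(phrases, classes):
    """Step 3: assign a phrase to the first class whose cue set contains its last word.

    Phrases that match no class are returned separately so that a domain
    expert can review them, which is the manual half of "semi-automated".
    """
    gazetteer = {name: set() for name in classes}
    unassigned = set()
    for phrase in phrases:
        head = phrase.split()[-1]
        for name, cues in classes.items():
            if head in cues:
                gazetteer[name].add(phrase)
                break
        else:
            unassigned.add(phrase)
    return gazetteer, unassigned


phrases = extract_phrases(REPORTS)
gazetteer, unassigned = assign_phrases(phrases, ENTITY_CLASSES)

for name in sorted(gazetteer):
    print(name, sorted(gazetteer[name]))
print("needs review:", sorted(unassigned))
```

In this sketch the expert's effort is concentrated in two places: writing the cue words for each class and reviewing the unassigned phrases. That is one plausible reading of how such an approach could stay affordable for domain experts.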