Journal of Construction Engineering and Management

Comparing Natural Language Processing Methods to Cluster Construction Schedules

Abstract

The names of construction activities are the only unstructured data attribute in construction schedules, and they often guide construction execution. Activity names are devised for communication between stakeholders, so repetitive activities are often named with inconsistent terminology and with contextual information omitted. This presents a challenge for machine learning systems that learn patterns from construction schedules. This paper compared the performance of state-of-the-art text clustering methods in identifying repetitive activities. This was achieved by creating a ground truth data set based on the standard construction work classification, and then comparing the precision, recall, and F1 score of latent semantic analysis (LSA), latent Dirichlet allocation (LDA), word2vec, and fastText algorithms in grouping activity names from 27 construction schedules. Results indicated that the F1 score of LSA outperformed LDA (0.84% versus 0.88%), whereas the results of language-model-based clustering depended on the quality of the word embeddings and the paired clustering method. This study provides insight into how to preprocess activity names of construction schedules for further artificial intelligence (AI)-based quantitative analysis. The methodologies described in this study will help researchers who work on natural language-related research in construction (e.g., safety and contract management) to better capture the features of words, rather than only counting word frequencies. © 2021 American Society of Civil Engineers.
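The abstract does not say how precision, recall, and F1 were computed for the clusters. One common option, shown here only as a sketch and not as the authors' protocol, is pair counting: two activities placed in the same predicted cluster count as a true positive if the ground truth also groups them. The activity names, labels, and the function name `pairwise_scores` below are hypothetical illustrations.

```python
from itertools import combinations


def pairwise_scores(truth, predicted):
    """Pair-counting precision, recall, and F1 for a clustering.

    truth and predicted are equal-length sequences of labels, one per
    activity name. This is an assumed scoring scheme, not necessarily
    the one used in the paper.
    """
    if len(truth) != len(predicted):
        raise ValueError("label sequences must have the same length")

    tp = fp = fn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = predicted[i] == predicted[j]
        if same_pred and same_truth:
            tp += 1  # pair grouped together, correctly
        elif same_pred:
            fp += 1  # pair grouped together, but should be apart
        elif same_truth:
            fn += 1  # pair split, but should be together

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical activity names with inconsistent terminology.
activities = [
    "Pour concrete L1",
    "Concrete pour level 2",
    "Install rebar L1",
    "Rebar fixing level 2",
    "Erect formwork L1",
]
# Hypothetical ground truth work classes and hypothetical cluster ids.
truth = ["concrete", "concrete", "rebar", "rebar", "formwork"]
predicted = [0, 0, 1, 2, 1]

precision, recall, f1 = pairwise_scores(truth, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Pair counting avoids having to match predicted cluster ids to ground truth classes, which is convenient when different methods (LSA, LDA, or embedding-based clustering) produce different numbers of clusters.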
