Journal: Automation in Construction

Automatic clustering of construction project documents based on textual similarity



Abstract

Text classifiers, as supervised learning methods, require a comprehensive training set that covers all classes in order to classify new instances. This limits their use for organizing construction project documents, because there is no guarantee that sufficient samples are available for every possible document category. To overcome this all-inclusive training requirement, an unsupervised learning method was used to automatically cluster documents based on their textual similarity. Repeated evaluations using different randomizations of the dataset revealed a region of threshold/dimensionality values that consistently yielded high precision and average recall. Accordingly, a hybrid approach was proposed that first uses the unsupervised method to form core clusters and then, in a subsequent refinement step, trains a text classifier on those core clusters to classify the outlier documents. Evaluation of the hybrid approach showed a significant improvement in recall, resulting in an overall increase in F-measure scores.
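The abstract does not specify the text representation, similarity measure, clustering algorithm, or classifier, so the following is only a minimal Python sketch of the general two-stage idea under stated assumptions: documents become TF-IDF vectors compared by cosine similarity; single-link clustering joins any two documents whose similarity meets a threshold; clusters with at least `min_core_size` members are treated as core clusters; and a nearest-centroid classifier (a swapped-in stand-in for the paper's text classifier) assigns the remaining outliers. The names `hybrid_cluster`, `threshold`, and `min_core_size` are hypothetical, and the dimensionality-reduction step whose setting the authors tune is omitted here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    # Sparse TF-IDF vectors (term -> weight), L2-normalised so that the
    # dot product of two vectors equals their cosine similarity.
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in token_lists for t in set(toks))
    vectors = []
    for toks in token_lists:
        tf = Counter(toks)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    # Iterate over the smaller vector; inputs are already unit length.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def threshold_clusters(vectors, threshold):
    # Single-link clustering: link every pair with similarity >= threshold
    # and return the connected components, tracked with union-find.
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def centroid(vectors, members):
    # Unit-length summed direction of a cluster ("training" a nearest-centroid model).
    total = {}
    for i in members:
        for t, w in vectors[i].items():
            total[t] = total.get(t, 0.0) + w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def hybrid_cluster(docs, threshold=0.3, min_core_size=2):
    # Stage 1 (unsupervised): clusters with >= min_core_size members are core
    # clusters; documents in smaller clusters are treated as outliers.
    vectors = tfidf_vectors(docs)
    clusters = threshold_clusters(vectors, threshold)
    core = [c for c in clusters if len(c) >= min_core_size]
    labels = [None] * len(docs)
    for k, members in enumerate(core):
        for i in members:
            labels[i] = k
    if not core:
        return labels  # nothing to train on; all documents stay unassigned
    # Stage 2 (refinement): assign each outlier to the most similar core centroid.
    centroids = [centroid(vectors, members) for members in core]
    outliers = [i for i, label in enumerate(labels) if label is None]
    for i in outliers:
        labels[i] = max(range(len(core)), key=lambda k: cosine(vectors[i], centroids[k]))
    return labels


def f_measure(precision, recall):
    # Harmonic mean of precision and recall (F1).
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    docs = [
        "concrete slab pour schedule",
        "concrete slab pour inspection",
        "steel beam connection detail",
        "steel beam connection revision",
        "concrete curing",
    ]
    print(hybrid_cluster(docs))  # [0, 0, 1, 1, 0]
```

In this toy run, the last document falls below the threshold in stage 1 and would be left out of any core cluster; the refinement stage places it with the other concrete documents. Raising `threshold` tends to give smaller, purer core clusters and leave more outliers for the classifier to absorb, which loosely mirrors the precision/recall trade-off the abstract describes.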
