Automatic clustering of construction project documents based on textual similarity

Mohammed Al Qady; Amr Kandil


Abstract

Text classifiers, as supervised learning methods, require a comprehensive training set that covers all classes in order to classify new instances. This limits the use of text classifiers for organizing construction project documents since it is not guaranteed that sufficient samples are available for all possible document categories. To overcome the restriction imposed by the all-inclusive requirement, an unsupervised learning method was used to automatically cluster documents together based on textual similarities. Repeated evaluations using different randomizations of the dataset revealed a region of threshold/dimensionality values of consistently high precision values and average recall values. Accordingly, a hybrid approach was proposed which initially uses an unsupervised method to develop core clusters and then trains a text classifier on the core clusters to classify outlier documents in a consequent refinement step. Evaluation of the hybrid approach demonstrated a significant improvement in recall values, resulting in an overall increase in F-measure scores.
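The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: raw term-frequency vectors, cosine similarity, a nearest-centroid rule standing in for the unspecified text classifier, and the hypothetical parameter `min_core_size` for deciding which clusters count as "core". The dimensionality reduction mentioned in the abstract is left out entirely.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector (assumed weighting, for illustration)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Sum of member vectors; cosine is scale-invariant, so no averaging is needed."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def single_pass_cluster(docs, threshold):
    """Assign each document, in order, to the most similar existing cluster
    if the similarity reaches the threshold; otherwise start a new cluster."""
    vecs = [vectorize(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, v in enumerate(vecs):
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(v, centroid([vecs[j] for j in c]))
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters, vecs


def hybrid_cluster(docs, threshold, min_core_size=2):
    """Stage 1: single-pass clustering. Stage 2: treat small clusters as
    outliers and reassign each one to the nearest core-cluster centroid
    (a stand-in for training a text classifier on the core clusters)."""
    clusters, vecs = single_pass_cluster(docs, threshold)
    core = [c for c in clusters if len(c) >= min_core_size]
    outliers = [i for c in clusters if len(c) < min_core_size for i in c]
    if not core:
        return clusters  # nothing to train on; keep the stage-1 result
    refined = [list(c) for c in core]
    centroids = [centroid([vecs[j] for j in c]) for c in core]
    for i in outliers:
        sims = [cosine(vecs[i], m) for m in centroids]
        refined[sims.index(max(sims))].append(i)
    return refined
```

In this sketch a strict threshold keeps the stage-1 clusters pure at the cost of leaving more documents as singletons, which mirrors the high-precision, average-recall region the abstract describes. The refinement step then pulls those singletons back into core clusters, which is where a recall gain would come from.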

Bibliographic details

Source: Automation in Construction, 2014, Issue 6, pp. 36-49 (14 pages)
Authors: Mohammed Al Qady; Amr Kandil
Author affiliations:
School of Civil Engineering, Purdue University, West Lafayette, IN 47907-2051, United States, 2775 Windwood Dr. #178 Ann Arbor, MI 48105
School of Civil Engineering, Purdue University, West Lafayette, IN 47907-2051, United States
Original format: PDF
Language: English
Keywords: Document management; Single pass clustering; Supervised/unsupervised learning methods
