Construction projects produce large numbers of project documents. Contract documents are produced together to finalize the contract. Project documents, by contrast, are produced gradually throughout the life of the project, so processing them is a continuous, incremental task performed almost daily. Most project documents are text documents containing unstructured information. From a document management perspective, this creates several problems: it makes information retrieval more difficult, causes interoperability issues between different systems, and hinders information reuse. Another important characteristic of construction project documents is that they are semantically interrelated. Information about a given project event is recorded, disputed, revised and reiterated in successive documents, which creates links between those documents. The result is a knowledge discourse that can be represented only by aggregating the information in all the relevant documents, not by the information in any single document. Reconstructing that discourse in full requires the information seeker to apply cognitive skills. The performance of an automatic text classifier based on latent semantic analysis (LSA) in identifying such relations between project documents is investigated. The results of the evaluation may have important applications in electronic document management, information retrieval and, more generally, knowledge sharing and reuse.
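A minimal sketch of how LSA can surface relations between documents may help readers unfamiliar with the technique. It is not the classifier evaluated here, whose details this abstract does not give. Several choices are assumptions made for illustration: the four toy documents are invented, the TF-IDF weighting variant (`log(n/df) + 1`) is used, and a fixed cosine-similarity threshold decides "related" in place of a trained classifier. The helper names `term_document_matrix`, `lsa_document_vectors` and `related_pairs` are also hypothetical. The sketch builds a weighted term-document matrix, truncates its singular value decomposition to `k` latent dimensions, and compares documents in that reduced space.

```python
import re
import numpy as np

def tokenize(text):
    # Lowercase and keep alphabetic tokens only (assumed preprocessing).
    return re.findall(r"[a-z]+", text.lower())

def term_document_matrix(docs):
    # Raw term counts: rows are terms, columns are documents.
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in tokenize(d):
            X[index[t], j] += 1.0
    return X, vocab

def lsa_document_vectors(X, k):
    # Weight counts with an assumed TF-IDF variant, then keep the top-k
    # singular triplets: each document becomes a row of V_k * S_k.
    n_terms, n_docs = X.shape
    assert 1 <= k <= min(n_terms, n_docs)
    df = np.count_nonzero(X, axis=1)
    idf = np.log(n_docs / df) + 1.0
    W = X * idf[:, None]
    _, s, Vt = np.linalg.svd(W, full_matrices=False)
    return Vt[:k].T * s[:k]

def cosine_matrix(D):
    # Pairwise cosine similarity between rows of D.
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    Dn = D / norms
    return Dn @ Dn.T

def related_pairs(docs, k=2, threshold=0.8):
    # Hypothetical decision rule: flag document pairs whose latent-space
    # similarity exceeds a fixed threshold.
    X, _ = term_document_matrix(docs)
    sim = cosine_matrix(lsa_document_vectors(X, k))
    n = len(docs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
             if sim[i, j] > threshold]
    return pairs, sim

# Invented example documents (not project data).
docs = [
    "Request for information on delayed concrete delivery to site",
    "Response to request: concrete delivery delayed by supplier",
    "Change order for revised steel beam design",
    "Approval of change order for steel beam redesign",
]
pairs, sim = related_pairs(docs, k=2, threshold=0.8)
print(pairs)
```

With `k=2`, the request and its response fall close together in the latent space, and so do the change order and its approval. Documents from the two threads end up nearly orthogonal. In practice `k` and the decision rule would have to be tuned on real project correspondence.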