Annual conference of the Canadian Society for Civil Engineering 2011

Automatic Document Classification Using a Successively Evolving Dataset

Abstract

A construction project produces a large number of project documents. Contract documents are produced together to finalize the contract, but project documents are produced gradually throughout the life of the project. Processing project documents is therefore a continuous, incremental task performed almost daily. Most project documents are text documents containing unstructured information. From a document management perspective, this causes several problems: it makes information retrieval more difficult, creates interoperability issues between systems, and hinders information reuse. Construction project documents are also semantically interrelated. Information about a given project event is recorded, disputed, revised and reiterated in a series of successive documents, which creates links between those documents. The result is a knowledge discourse that can be represented only by aggregating the information in all the relevant documents, not by the contents of any single document. To deduce this discourse comprehensively, the information seeker must apply cognitive skills. This paper investigates how well an automatic text classifier based on latent semantic analysis (LSA) identifies such relations between project documents. The evaluation results may have important applications in electronic document management, information retrieval and, more generally, knowledge sharing and reuse.
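The abstract does not describe how the classifier is built. Purely as an illustration, the sketch below shows one common way an LSA-based classifier can work. A term-document matrix is factored with a truncated SVD. Documents are projected into the k-dimensional latent space with U_k^T and compared by cosine similarity. Documents that arrive later are "folded in" without recomputing the SVD, which is one possible way to handle a successively evolving dataset. Several choices here are assumptions and not the authors' method: the class name `LSAClassifier`, raw term-count weighting, nearest-neighbour labelling, k = 2, and the sample documents and labels.

```python
import numpy as np
from collections import Counter


def tokenize(text):
    # Lowercased words with surrounding punctuation stripped. A real system
    # would probably also drop stop words and apply stemming.
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w]


class LSAClassifier:
    """Hypothetical nearest-neighbour classifier in an LSA latent space."""

    def __init__(self, docs, labels, k=2):
        # The vocabulary is fixed by the initial (training) documents.
        terms = sorted({t for d in docs for t in tokenize(d)})
        self.vocab = {t: i for i, t in enumerate(terms)}
        # Term-document matrix A (terms x documents) of raw term counts.
        A = np.column_stack([self._term_vector(d) for d in docs])
        # Truncated SVD A ~ U_k S_k V_k^T: keep the k leading left singular vectors.
        U, _, _ = np.linalg.svd(A, full_matrices=False)
        self.Uk = U[:, :k]
        self.vecs = [self._project(d) for d in docs]
        self.labels = list(labels)

    def _term_vector(self, text):
        v = np.zeros(len(self.vocab))
        for term, count in Counter(tokenize(text)).items():
            if term in self.vocab:  # terms unseen at SVD time are ignored
                v[self.vocab[term]] = count
        return v

    def _project(self, text):
        # Map a document into the k-dimensional latent space: d_hat = U_k^T d.
        return self.Uk.T @ self._term_vector(text)

    @staticmethod
    def _cosine(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom else 0.0

    def classify(self, text):
        """Return (label, cosine score) of the most similar stored document."""
        q = self._project(text)
        if not np.any(q):
            return None, 0.0  # no known terms, so nothing to compare against
        scores = [self._cosine(q, v) for v in self.vecs]
        best = int(np.argmax(scores))
        return self.labels[best], scores[best]

    def add(self, text, label):
        # Folding-in: the new document joins the index without recomputing the
        # SVD, so later documents can be related to it.
        self.vecs.append(self._project(text))
        self.labels.append(label)


# Illustrative documents and labels (made up for this sketch).
docs = [
    "concrete pour delayed by rain on level 3 slab",
    "rain delay claim: level 3 slab concrete pour",
    "steel beam shop drawings submitted for approval",
    "revised steel beam shop drawings approved",
]
labels = ["slab-delay", "slab-delay", "steel-drawings", "steel-drawings"]
clf = LSAClassifier(docs, labels, k=2)
print(clf.classify("contractor disputes the slab pour delay")[0])  # prints: slab-delay
```

Folding-in keeps each update cheap, but it has two drawbacks. Terms that first appear after the initial SVD are ignored, and the latent space gradually goes stale as documents accumulate. A practical system would therefore likely rebuild the SVD periodically. It would also likely weight terms with a scheme such as tf-idf or log-entropy instead of raw counts.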
