Journal: Artificial Intelligence for Engineering Design, Analysis and Manufacturing

Discourse analysis based segregation of relevant document segments for knowledge acquisition

Abstract

Documents are a useful source of expert knowledge in organizations. They can be used at an early stage of a product's life cycle to foresee issues, and their solutions, that might arise at later stages. In this research, the early and later stages are design and assembly, respectively. Even when these documents are available online, users find it difficult to access the knowledge they contain. It is therefore desirable to extract that knowledge automatically and store it in a form that a computer can access and manipulate. This paper describes an approach to the first step in this acquisition process: automatically identifying the segments of documents that are relevant to aircraft assembly, so that they can be processed further to acquire expert knowledge. Identifying relevant segments is necessary to avoid processing unrelated information, which is costly and may detract from domain relevance. The approach to extracting relevant segments has two steps. The first step identifies sentences that form a coherent segment of text, within which the topic does not shift. The second step classifies each segment according to whether it falls within the topics of interest for knowledge acquisition, here aircraft assembly. Together, these steps filter out unrelated segments, which therefore need not be processed in subsequent knowledge acquisition. The steps are implemented by analyzing the content of the documents. Methods of discourse analysis, in particular discourse representation theory, are used to obtain a list of discourse entities. Differences in discourse entities between sentences are used to mark the boundaries between segments. The list of discourse entities in a segment is then compared against a domain ontology to classify the segment. The implementation of these steps and the results of validating them on sample texts are described.
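
The two steps in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes that the discourse entities of each sentence have already been extracted (the abstract obtains them with discourse representation theory, which is not reproduced here) and are given as plain sets of strings. The Jaccard overlap measure, the `min_overlap` and `min_coverage` thresholds, the function names, and the toy ontology terms are all hypothetical choices made for illustration.

```python
from typing import List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two entity sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def segment(sentence_entities: List[Set[str]],
            min_overlap: float = 0.2) -> List[List[int]]:
    """Step 1 (assumed rule): group consecutive sentence indices.

    A new segment starts whenever a sentence shares too few entities
    with the sentence immediately before it.
    """
    segments: List[List[int]] = []
    for i, entities in enumerate(sentence_entities):
        if i > 0 and jaccard(sentence_entities[i - 1], entities) >= min_overlap:
            segments[-1].append(i)
        else:
            segments.append([i])
    return segments


def is_relevant(segment_entities: Set[str], ontology_terms: Set[str],
                min_coverage: float = 0.5) -> bool:
    """Step 2 (assumed rule): a segment is relevant when enough of its
    entities appear among the domain ontology's terms."""
    if not segment_entities:
        return False
    coverage = len(segment_entities & ontology_terms) / len(segment_entities)
    return coverage >= min_coverage


def relevant_segments(sentence_entities: List[Set[str]],
                      ontology_terms: Set[str]) -> List[List[int]]:
    """Run both steps and keep only the segments judged relevant."""
    kept = []
    for seg in segment(sentence_entities):
        entities = set().union(*(sentence_entities[i] for i in seg))
        if is_relevant(entities, ontology_terms):
            kept.append(seg)
    return kept


# Toy input: entity sets for four sentences (hypothetical data).
sentences = [
    {"wing", "rib", "fastener"},
    {"rib", "fastener", "jig"},
    {"budget", "meeting"},
    {"meeting", "schedule"},
]
# Hypothetical stand-in for the terms of an aircraft assembly ontology.
ontology = {"wing", "rib", "fastener", "jig", "fuselage"}

print(segment(sentences))                      # [[0, 1], [2, 3]]
print(relevant_segments(sentences, ontology))  # [[0, 1]]
```

In this toy input, the first two sentences share the entities "rib" and "fastener" and form one segment, while the last two share "meeting" and form another. Only the first segment overlaps the ontology terms, so it alone would be passed on for knowledge acquisition.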
