A Document Interoperation Framework on the Semantic Web (DIFSEW).

Abstract

Enormous numbers of electronic documents are generated in various domains and contexts. Although these documents are interpretable by human readers, almost all of them lack the explicit semantics that would allow software applications to interpret their data correctly. It is therefore important to create methods that automatically extract information from electronic documents and impart semantics to them. Such semantics enable meaningful search, querying, transformation, and interoperation of the information within documents. This is especially important for large information archives, which are usually the main parts of large enterprise information systems. The semantic enrichment of these archives discussed in the present dissertation adopts Semantic Web techniques, such as ontologies, rules, and their reasoning engines, as well as Information Extraction methods, which involve position-based and ontology-based techniques. This allows large enterprise information systems to be re-engineered into knowledge-based systems in which data from documents is automatically processed in a meaningful way.
Although the Semantic Web and Information Extraction fields are now relatively well developed, there is a need for an integrated framework that can embody appropriate methods for processing large document stores. The goals of the present dissertation are: (1) to create an integrated semantic framework for document processing, (2) to research and develop the components of the framework, and (3) to investigate existing and possible applications of the framework.
The main objectives of the research behind the dissertation are to investigate: (1) methods that automatically extract information from electronic documents and impart semantics to them, (2) methods for preprocessing information before information extraction is performed, (3) methods for processing business rules with semantics so that processing logic can be externalized, (4) methods for working with multiple domains seamlessly, and (5) an integrated framework that can embody appropriate methods for processing large document stores.
The main research outcome presented in the dissertation is a semantic framework that integrates domain ontologies, rules, a reasoning engine, Information Extraction methods, and application logic for building knowledge-based software systems. The domain ontologies specify the conceptualization of the domain to which the documents belong; they can be built manually, extracted from documents, or re-used. The rules specify the business logic used in an enterprise, which can be represented by decision tables, production rules, or First-Order Logic. The Information Extraction methods integrated into the framework extract semantics from documents presented in various formats. The reasoning engine can be any existing engine that can process ontologies and rules represented in an appropriate format. The application logic is responsible for querying the reasoning engine and presenting the results to the user.
Although the framework is integrated, its parts are externalized and independent, so information extraction from documents, domain ontologies, document processing (business) logic, and semantic reasoning can each be created and maintained separately by appropriate specialists in the field. The framework includes semantic processing of externalized data-processing logic rules and, to some extent, externalization of the application logic.
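To make the separation of components concrete, the following is a minimal sketch in Python, not the dissertation's implementation. It assumes a toy syllabus domain; the ontology, the regular-expression extraction rules, the single production rule, the course codes, and all function names are hypothetical and chosen only for illustration. A naive forward-chaining loop stands in for a real reasoning engine.

```python
import re

# Hypothetical domain ontology: subclass relations only (child -> parent).
ONTOLOGY = {"GraduateCourse": "Course", "UndergraduateCourse": "Course"}

# Hypothetical extraction rules, kept outside the application logic:
# property name -> regular expression with one capture group.
EXTRACTION_RULES = {
    "code": r"Course:\s*(\w+)",
    "level": r"Level:\s*(\w+)",
    "prerequisite": r"Prerequisite:\s*(\w+)",
}

# Hypothetical business rule in production-rule form.
# Strings starting with "?" are variables.
BUSINESS_RULES = [
    {"if": [("?c", "prerequisite", "?p"), ("?s", "passed", "?p")],
     "then": ("?s", "mayEnrolIn", "?c")},
]


def extract_facts(text):
    """Apply the extraction rules to one document and return triples."""
    found = {}
    for prop, pattern in EXTRACTION_RULES.items():
        m = re.search(pattern, text)
        if m:
            found[prop] = m.group(1)
    course = found.get("code")
    if course is None:
        return set()
    facts = set()
    if "level" in found:
        facts.add((course, "type", found["level"] + "Course"))
    if "prerequisite" in found:
        facts.add((course, "prerequisite", found["prerequisite"]))
    return facts


def unify(pattern, fact, bindings):
    """Match one triple pattern against one fact, extending the bindings."""
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def match_all(conditions, facts, bindings):
    """Yield every set of bindings that satisfies all conditions."""
    if not conditions:
        yield bindings
        return
    for fact in facts:
        b = unify(conditions[0], fact, bindings)
        if b is not None:
            yield from match_all(conditions[1:], facts, b)


def reason(facts):
    """Naive forward chaining: subclass inference plus the business rules."""
    facts = set(facts)
    while True:
        derived = set()
        for s, p, o in facts:
            if p == "type" and o in ONTOLOGY:
                derived.add((s, "type", ONTOLOGY[o]))
        for rule in BUSINESS_RULES:
            for b in match_all(rule["if"], facts, {}):
                derived.add(tuple(b.get(t, t) for t in rule["then"]))
        if derived <= facts:      # nothing new: fixed point reached
            return facts
        facts |= derived


# Application logic: extract, reason, then query the resulting facts.
syllabus = "Course: CS200\nLevel: Undergraduate\nPrerequisite: CS100"
kb = reason(extract_facts(syllabus) | {("alice", "passed", "CS100")})
print(("alice", "mayEnrolIn", "CS200") in kb)
```

Because the ontology, extraction rules, and business rules are plain data, each could in principle be edited by a different specialist without touching the reasoning or query code, which is the kind of externalization the abstract describes.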
Creating external information extraction rules is a cumbersome and time-consuming task for the knowledge engineer. To overcome this problem, the framework also includes a rule learning (induction) system that semi-automates the generation of information extraction rules from source documents with the help of manual annotations. The present ontology- and rule-based framework can be applied to: (1) re-engineering very large enterprise information systems by adapting Semantic Web computing techniques and (2) creating new knowledge-based software systems.
The dissertation is article-based. It presents a variety of concepts, published as individual articles, that address the problems stated above and more. Some of the concepts addressed by the dissertation are: (a) a framework for knowledge-based systems that addresses the concerns relevant to the problems discussed; (b) information pre-processing using a meta-ontology before information extraction is performed to populate the domain ontology; (c) identification and resolution of conflicts during ontological integration, using rules, for working with information from different domains; (d) RuleML-based learning object interoperability on the Semantic Web, for representing ontologies using RuleML; (e) representation of user-friendly business rules in a Semantic Web-based format; (f) information extraction from syllabi for academic e-advising; (g) semantic annotation of semi-structured documents.
The dissertation uses all the concepts listed above and explains them as a framework consisting of modular features. More detailed information on each of the listed concepts can be found in the respective articles presented in the chapters.
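The rule-induction idea mentioned in the abstract can be illustrated with a deliberately simple sketch. The abstract does not describe the learning algorithm, so the approach below is an assumption made for illustration only: it takes the text to the left of each manually annotated value and uses the longest shared suffix of those left contexts as the anchor of a regular-expression extraction rule. The function name `induce_rule` and the sample texts are hypothetical.

```python
import os
import re


def induce_rule(annotated):
    """Induce one extraction pattern from (text, annotated_value) pairs.

    The left context of each annotated value (the text before it on the
    same line) is collected, and the longest suffix shared by all left
    contexts becomes the anchor of the learned pattern.
    """
    contexts = []
    for text, value in annotated:
        start = text.index(value)                  # position of the annotation
        line_start = text.rfind("\n", 0, start) + 1
        contexts.append(text[line_start:start])
    # Longest common suffix = longest common prefix of the reversed strings.
    reversed_prefix = os.path.commonprefix([c[::-1] for c in contexts])
    anchor = reversed_prefix[::-1]
    if not anchor.strip():
        raise ValueError("no shared left context; more annotations needed")
    return re.compile(re.escape(anchor) + r"(\S+)")


# Two manually annotated examples (made up for this sketch).
examples = [
    ("Title: Intro\nPrerequisite: CS100", "CS100"),
    ("Course notes. Prerequisite: MATH101\nRoom 12", "MATH101"),
]
rule = induce_rule(examples)

# Apply the induced rule to an unseen document.
match = rule.search("Level: 2\nPrerequisite: PHYS110")
print(match.group(1))
```

A practical rule learner would also need right contexts, negative examples, and a way to generalize over several competing patterns; the sketch only shows how annotations can replace hand-written rules in the simplest case.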

Bibliographic record

  • Author: Ranganathan, Girish R.
  • Author affiliation: University of New Brunswick (Canada)
  • Degree-granting institution: University of New Brunswick (Canada)
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 305 p.
  • Total pages: 305
  • Original format: PDF
  • Language of text: eng
