Knowledge-based methods for automatic extraction of domain-specific ontologies.

Abstract

Semantic web technology aims at developing methodologies for representing large amounts of knowledge in web-accessible form. The semantics of this knowledge should be easy for computer programs to interpret and understand, so that knowledge can be shared and used across the Web. Domain-specific ontologies form the basis for knowledge representation in the semantic web. Research on automated development of ontologies from texts has become increasingly important because manual construction of ontologies is labor-intensive and costly, while large amounts of text for individual domains are already available in electronic form. However, automatic extraction of domain-specific ontologies is challenging due to the unstructured nature of texts and the inherent semantic ambiguities of natural language. Moreover, the large size of the texts to be processed renders full-fledged natural language processing methods infeasible.

In this dissertation, we develop a set of knowledge-based techniques for automatic extraction of ontological components (concepts, taxonomic relations, and non-taxonomic relations) from domain texts. The proposed methods combine information retrieval metrics, a lexical knowledge base (such as WordNet), machine learning techniques, heuristics, and statistical approaches to meet the challenge of the task. The methods are automatic and domain-independent.

For extraction of concepts, the proposed WNSCA+{PE, POP} method uses the lexical knowledge base WordNet to improve precision and recall over traditional information retrieval metrics. Three approaches are developed for taxonomy extraction: a WordNet-based approach, the compound term heuristic, and a supervised learning approach. We also develop a weighted word-sense disambiguation method for use with the WordNet-based approach. An unsupervised approach using log-likelihood ratios is proposed for extracting non-taxonomic relations. Furthermore, a supervised approach is investigated to learn the semantic constraints for identifying relations from prepositional phrases. The proposed methods are validated by experiments with the Electronic Voting and the Tender Offers, Mergers, and Acquisitions domain corpora. Experimental results and comparisons with some existing approaches clearly indicate the superiority of our methods.

In summary, a good combination of information retrieval, lexical knowledge base, statistics, and machine learning methods in this study has led to techniques that are efficient and effective for extracting ontological components automatically.
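
The abstract names log-likelihood ratios as the statistic behind the unsupervised extraction of non-taxonomic relations, but does not say how the counts are collected. As a rough illustration only, the sketch below computes the standard log-likelihood ratio (G²) for a 2x2 contingency table of co-occurrence counts. The function name, the table layout, and the example counts are assumptions made for this illustration; they are not taken from the dissertation.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of counts.

    Assumed (hypothetical) layout for a candidate pair (A, B):
      k11: contexts containing both A and B
      k12: contexts containing A but not B
      k21: contexts containing B but not A
      k22: contexts containing neither
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / n
                g2 += o * math.log(o / expected)
    return 2.0 * g2


# Hypothetical counts: higher scores suggest a stronger association.
candidates = {
    ("voter", "cast"): (30, 20, 15, 935),
    ("voter", "merge"): (1, 49, 44, 906),
}
ranked = sorted(candidates, key=lambda p: log_likelihood_ratio(*candidates[p]),
                reverse=True)
```

A score of zero means the observed counts match what independence would predict. How candidate pairs are generated and how a cutoff is chosen are separate design decisions that this sketch does not address.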

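Bibliographic record

  • Author

    Punuru, Janardhana R.

  • Author affiliation

    Louisiana State University and Agricultural & Mechanical College

  • Degree-granting institution: Louisiana State University and Agricultural & Mechanical College
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2007
  • Pages: 106 p.
  • Total pages: 106
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Automation technology, computer technology
  • Date indexed: 2022-08-17 11:40:31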
