
Language- and domain-independent knowledge maps: A statistical phrase indexing approach.



Abstract

The global economy increases the need for multilingual systems, and each domain has a large repository of knowledge, particularly explicit knowledge that is usually captured in text. Textual information is now produced faster than a person can process it, so an automated approach to alleviating information overload is needed. Unlike structured data in databases, unstructured text cannot be readily understood and processed by computers. This dissertation aims to create a language- and domain-independent approach to automatically generating hierarchical knowledge maps that enable users to browse and understand the concepts hidden in the underlying knowledge sources.

A system development research methodology was adopted to build and evaluate prototype systems for studying the research questions. To process textual knowledge, a statistical phrase indexing algorithm was proposed and applied to the Chinese language. Next, the algorithm was extended to process multiple languages and domains. Lastly, the results of the algorithm were applied in a case study that used the dissertation's proposed automated framework to generate hierarchical knowledge maps from a Chinese news collection.

This dissertation makes two main contributions. First, it demonstrates that an automated approach is effective in creating knowledge maps that let users browse the underlying knowledge. The approach combines a statistical phrase extraction algorithm for representing textual knowledge with neural networks for clustering related concepts and for visualization. Second, it provides a set of language- and domain-independent tools for extracting phrases from textual knowledge in order to support text mining applications.

Bibliographic record

  • Author

    Ong, Thian-Huat

  • Author affiliation

    The University of Arizona

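  • Degree-granting institution: The University of Arizona
  • Subject: Business Administration, Management; Computer Science
  • Degree: Ph.D.
  • Year: 2004
  • Pagination: 132 p.
  • Total pages: 132
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Trade economics; Automation technology, computer technology
  • Keywords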

  • Date indexed: 2022-08-17 11:43:35
