
Text association mining with cross-sentence inference, structure-based document model and multi-relational text mining.



Abstract

With the exponential growth of published documents, text mining has become a vital tool for automatically extracting information and discovering hidden knowledge. We begin this dissertation with an overview of text mining that covers key definitions, pre-processing, feature selection, text representation, and types of text mining. We then describe a fundamental text mining approach that we used to develop a chromosome-21 database. Next, we present three novel text mining techniques: (i) text association mining with cross-sentence inference, (ii) a structure-based document model, and (iii) multi-relational text mining. These techniques emphasize novel hypothesis generation, document representation, and multi-relational discovery, respectively. In text association mining with cross-sentence inference, statistical co-occurrences of terms and syntactic analysis of sentence structure are first used to find associations among key terms in documents; potential novel hypotheses are then derived from the discovered associations. The structure-based document model, by contrast, introduces two novel representations for text documents that take into account not only term frequencies and patterns of term occurrence but also the document's structural information. In our experiments, the structure-based document models outperformed existing non-structure-based ones. Finally, multi-relational text mining enhances a literature-based discovery method with multi-relational data mining and Inductive Logic Programming. It aims to discover relational knowledge, in the form of frequent relational patterns and relational association rules, from disjoint sets of literature. These relational patterns and rules complement the indirect connections found by existing literature-based discovery and can be used for exploratory research.
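The idea of finding term associations by co-occurrence and then chaining them into candidate hypotheses can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: it omits the syntactic sentence analysis, uses a simple pointwise mutual information score as a stand-in for the statistical measure, and all function names, the toy sentences, and the term list are assumptions made for illustration only.

```python
import math
import re
from collections import Counter
from itertools import combinations


def sentence_cooccurrence(docs, terms):
    """Count per-sentence occurrences of each term and each term pair."""
    term_counts, pair_counts, n_sentences = Counter(), Counter(), 0
    for doc in docs:
        for sentence in re.split(r"[.!?]+", doc.lower()):
            words = set(re.findall(r"[a-z0-9\-]+", sentence))
            if not words:
                continue  # skip empty fragments left by the split
            n_sentences += 1
            present = sorted(t for t in terms if t in words)
            term_counts.update(present)
            pair_counts.update(combinations(present, 2))
    return term_counts, pair_counts, n_sentences


def pmi(pair, term_counts, pair_counts, n_sentences):
    """Pointwise mutual information of a pair that co-occurs at least once."""
    a, b = pair
    joint = pair_counts[pair] * n_sentences
    return math.log2(joint / (term_counts[a] * term_counts[b]))


def indirect_links(pair_counts, min_count=1):
    """Propose A-C pairs that never co-occur but share a linking term B."""
    neighbours = {}
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    candidates = {}
    for b, linked in neighbours.items():
        for a, c in combinations(sorted(linked), 2):
            if c not in neighbours[a]:
                candidates.setdefault((a, c), set()).add(b)
    return candidates


# Toy input (hypothetical sentences, not data from the dissertation).
docs = [
    "Magnesium deficiency promotes vasospasm. Magnesium also affects platelets.",
    "Vasospasm is observed in migraine patients.",
]
terms = {"magnesium", "migraine", "vasospasm"}

term_counts, pair_counts, n_sentences = sentence_cooccurrence(docs, terms)
candidates = indirect_links(pair_counts)
for (a, c), via in candidates.items():
    print(f"candidate hypothesis: {a} ~ {c} via {sorted(via)}")
```

In this toy input, "magnesium" and "migraine" never share a sentence, yet both co-occur with "vasospasm", so the pair is proposed as a candidate for further inspection. The `pmi` helper could be used to rank direct associations before chaining them.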

Bibliographic record

  • Author

    Thaicharoen, Supphachai.;

  • Author affiliation

    University of Colorado at Denver.;

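  • Degree-granting institution: University of Colorado at Denver.;
  • Discipline: Computer Science.
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 131 p.
  • Total pages: 131
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Petroleum and natural gas industry;
  • Keywords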

  • Date added to database: 2022-08-17 11:37:57
