Text mining with the exploitation of user's background knowledge: Discovering novel association rules from text.

Abstract

The goal of text mining is to find interesting and non-trivial patterns or knowledge in unstructured documents. Both objective and subjective measures have been proposed in the literature to evaluate the interestingness of discovered patterns. However, objective measures alone are insufficient because they do not consider the knowledge and interests of the users. Subjective measures require explicit input of user expectations, which is difficult or even impossible to obtain in text mining environments.

This study proposes a user-oriented text-mining framework and applies it to the problem of discovering novel association rules from documents. The developed system, uMining, consists of two major components: a background knowledge developer and a novel association rules miner. The background knowledge developer learns a user's background knowledge by extracting keywords from documents already known to the user (background documents) and developing a concept hierarchy to organize popular keywords. The novel association rules miner discovers association rules among noun phrases extracted from relevant documents (target documents) and compares the rules with the background knowledge to predict each rule's novelty to that particular user (user-oriented novelty).

The user-oriented novelty measure is defined as the semantic distance between the antecedent and the consequent of a rule in the background knowledge. It consists of two components: occurrence distance and connection distance. The former considers the co-occurrences of two keywords in the background documents: the more they co-occur, the shorter the distance. The latter considers the common connections of two keywords with others in the concept hierarchy. It is defined as the length of the shortest path connecting the two keywords in the concept hierarchy: the longer the path, the larger the distance.

The user-oriented novelty measure is evaluated from two perspectives: novelty prediction accuracy and usefulness indication power. The results show that the user-oriented novelty measure outperforms the WordNet novelty measure and the compared objective measures in terms of predicting novel rules and identifying useful rules.
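
The two distance components described in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the occurrence distance is approximated as one minus the Jaccard overlap of the background documents containing each keyword, the concept hierarchy is treated as an undirected graph searched breadth-first, and the names `build_hierarchy`, `occurrence_distance`, and `connection_distance` are hypothetical rather than taken from uMining. How the dissertation combines the two components into a single novelty score is not stated, so the sketch leaves them separate.

```python
from collections import deque


def build_hierarchy(edges):
    """Build an undirected adjacency map from (parent, child) pairs.

    Assumption: the concept hierarchy can be treated as an undirected graph.
    """
    graph = {}
    for parent, child in edges:
        graph.setdefault(parent, set()).add(child)
        graph.setdefault(child, set()).add(parent)
    return graph


def occurrence_distance(a, b, background_docs):
    """Assumed form: 1 - Jaccard overlap of the documents containing a and b.

    More co-occurrence gives a shorter distance, as the abstract describes.
    """
    docs_a = {i for i, doc in enumerate(background_docs) if a in doc}
    docs_b = {i for i, doc in enumerate(background_docs) if b in doc}
    union = docs_a | docs_b
    if not union:
        return 1.0  # neither keyword appears: treat as maximally distant
    return 1.0 - len(docs_a & docs_b) / len(union)


def connection_distance(a, b, hierarchy):
    """Length (in edges) of the shortest path between a and b, via BFS.

    Returns None when either keyword is missing or no path exists.
    """
    if a not in hierarchy or b not in hierarchy:
        return None
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in hierarchy[node]:
            if neighbour == b:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Toy background knowledge (invented purely for illustration).
hierarchy = build_hierarchy([
    ("science", "computing"),
    ("computing", "text mining"),
    ("computing", "databases"),
    ("science", "biology"),
])
background_docs = [
    {"text mining", "databases"},
    {"text mining", "databases", "biology"},
    {"biology"},
]

print(connection_distance("text mining", "databases", hierarchy))
print(connection_distance("text mining", "biology", hierarchy))
print(occurrence_distance("text mining", "databases", background_docs))
print(occurrence_distance("text mining", "biology", background_docs))
```

In this toy data, a rule linking "text mining" to "biology" has both a longer hierarchy path and a larger occurrence distance than a rule linking "text mining" to "databases", so under the abstract's definition it would be predicted as more novel to this user.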

Bibliographic record

  • Author

    Chen, Xin.;

  • Author affiliation

    New Jersey Institute of Technology.;

  • Degree-granting institution New Jersey Institute of Technology.;
  • Subject Computer Science.
  • Degree Ph.D.
  • Year 2006
  • Pages 164 p.
  • Total pages 164
  • Original format PDF
  • Text language eng
  • Chinese Library Classification
  • Keywords

  • Date added 2022-08-17 11:41:02
