Developing semantic digital libraries using data mining techniques.


Abstract

We define semantic digital libraries as digital libraries that can discover hidden, useful information from large amounts of stored data using data mining techniques such as clustering, classification, association rule mining, and visualization. To build a semantic digital library, we first propose an integrated digital library system that provides multiple viewpoints of harvested metadata collections by combining search and data mining technologies. This system provides three value-added services: (1) the cross-archive search service provides a term view of harvested metadata, (2) the concept browsing service provides a subject view of harvested metadata, and (3) the collection summary service provides a collection view of each metadata collection. We also propose a text data mining method that uses a hierarchical self-organizing map algorithm to build concept hierarchies from Dublin Core metadata.

We then present a new classification method, called Associative Naive Bayes (ANB), to associate MEDLINE citations with Gene Ontology (GO) terms. We define the concept of class-support to find frequent itemsets and the concept of class-all-confidence to find interesting itemsets. In the training phase, ANB finds frequent and interesting itemsets and estimates the class prior probabilities and the probabilities of itemsets for all classes. Once these itemsets have been discovered, the classification algorithm classifies new unlabeled examples by incrementally choosing the most interesting itemset. Empirical test results on three MEDLINE datasets show that ANB is superior to both the naive Bayesian classifier and Large Bayes. The results also show that ANB is more scalable than Support Vector Machines.

Finally, we present a text mining method that uses both text categorization and text clustering to build concept hierarchies for MEDLINE citations. The approach we propose is a three-step data mining process for organizing the MEDLINE database: (1) categorization according to Medical Subject Headings (MeSH) terms, MeSH major topics, and the co-occurrence of MeSH descriptors; (2) clustering using the results of MeSH term categorization; and (3) visualization of categories and hierarchical clusters. The automatically generated hierarchies may be used to support users as they browse and to help them identify good starting points for searching.
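The first step of the three-step process described above, together with a similarity measure that could feed the second, can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy citation records, the function names, and the choice of Jaccard overlap as the similarity between categories are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: citation ID -> MeSH descriptors (not real MEDLINE data).
citations = {
    "c1": ["Neoplasms", "Genes"],
    "c2": ["Neoplasms", "Genes", "Mice"],
    "c3": ["Mice"],
}


def categorize_by_term(records):
    """Group citation IDs under each MeSH descriptor (categorization by term)."""
    categories = defaultdict(set)
    for cid, terms in records.items():
        for term in terms:
            categories[term].add(cid)
    return dict(categories)


def cooccurrence_counts(records):
    """Count how often each pair of descriptors is assigned to the same citation."""
    counts = defaultdict(int)
    for terms in records.values():
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return dict(counts)


def jaccard(a, b):
    """Overlap of two categories; an assumed similarity for a later clustering step."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


categories = categorize_by_term(citations)
pairs = cooccurrence_counts(citations)
similarity = jaccard(categories["Neoplasms"], categories["Mice"])
```

Here `categories` maps each descriptor to the set of citations that carry it, `pairs` holds descriptor co-occurrence counts, and `similarity` is one pairwise score that a hierarchical clustering routine could consume.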

Bibliographic record

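  • Author: Kim, Hyunki
  • Author affiliation: University of Florida
  • Degree-granting institution: University of Florida
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 126
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Automation technology, computer technology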
