Machine learning for text mining: Classification, retrieval and recommendation.

Abstract

The information explosion of the World Wide Web has brought continuous, rapid growth in information and data. As the amount of data grows, the need for efficient and effective management of information has also increased dramatically. As a result, using intelligent computerized algorithms to discover new and useful information from existing data has become an active topic in recent computer and information science research.

This thesis addresses the discovery of useful information from the textual content of data, as well as the efficient management and organization of that data. These research issues are usually referred to as text mining, a branch of the broad area of information retrieval research that contains many interesting and challenging problems and applications. The thesis focuses on four issues in text mining: text classification (Chapters 2 and 3), text retrieval (Chapter 4), text recommendation (Chapter 5), and topic discovery (Chapter 6). Specifically, Chapter 2 proposes dimension reduction and collaborative filtering techniques to improve the scalability of text classification; Chapter 3 further addresses the performance of text classification by introducing a new nearest neighbor classification method; Chapter 4 deals with retrieving the correct named entities from the web and textual documents where names are ambiguous; Chapter 5 deals with text recommendation for scientific documents and webpages; Chapter 6 aims at discovering dynamic topic trends and correlations in scientific documents; and Chapter 7 concludes the thesis. We also try to answer some difficult research questions based on our study.

Bibliographic details

  • Author: Song, Yang
  • Author affiliation: The Pennsylvania State University
  • Degree-granting institution: The Pennsylvania State University
  • Subjects: Applied Mathematics; Computer Science
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 190
  • Original format: PDF
  • Language: English
  • Date added to database: 2022-08-17 11:37:58
