Conference: International conference on intelligent computing; CICI 2009

Profile Based Algorithm to Topic Spotting in Reuter21578

Abstract

This research proposes an alternative to machine learning based approaches for categorizing online news articles in Reuter21578. To apply machine learning based approaches to any text mining or information retrieval task, documents must be encoded into numerical vectors, and this encoding causes two problems: huge dimensionality and sparse distribution. Although text mining covers various tasks, such as text categorization, text clustering, and text summarization, the scope of this research is restricted to text categorization. The idea is to avoid the two problems by encoding a document or documents into a table instead of numerical vectors. The goal of this research is therefore to improve the performance of text categorization by avoiding the two problems.
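
The contrast between the two encodings can be illustrated with a minimal Python sketch. The abstract does not specify the table layout, so this assumes a table is simply a mapping from each word that occurs in the document to its frequency. The helper names (`tokenize`, `to_vector`, `to_table`) and the two sample sentences are hypothetical and are not taken from the paper.

```python
from collections import Counter

PUNCTUATION = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def to_vector(text, vocabulary):
    """Numerical-vector encoding: one count per vocabulary word, zeros included."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocabulary]

def to_table(text):
    """Assumed table encoding: word -> frequency, only for words that occur."""
    return dict(Counter(tokenize(text)))

# Hypothetical sample documents, not drawn from Reuter21578.
docs = [
    "Oil prices rise as crude supply falls",
    "Wheat and corn exports rise",
]

# The vector dimension is the size of the vocabulary over the whole collection.
vocabulary = sorted({w for d in docs for w in tokenize(d)})

vector = to_vector(docs[1], vocabulary)  # mostly zeros: sparse distribution
table = to_table(docs[1])                # one entry per word actually present

print(len(vocabulary), vector.count(0), len(table))
```

Even with two short documents, more than half of the vector entries are zero, and the vector length grows with every new word in the collection, while the table grows only with the words in the document itself. How the paper compares tables to assign a category is not described in the abstract and is not sketched here.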
