Conference: Natural Language Processing Pacific Rim Symposium

TopClass: Topic-based Conceptual Text Categorization Using MRD

Abstract

Text categorization for unrestricted text is one of the important issues in the field of information retrieval. The crux of the problem is to discover a model that relates the words in a document to its general subject area. It appears very difficult to acquire, by statistical means, enough word-based knowledge to build a robust system capable of automatically categorizing unrestricted text. The major problems with word-based text categorization models include data sparseness and the lack of a level of abstraction. Word-based text categorization systems are hard to train sufficiently well; furthermore, they are difficult to port to new domains and to run off the shelf. In this paper, we show that a concept-based model for text categorization requires fewer parameters and has a built-in element of generality. Broad lexical conceptual knowledge acquired from machine-readable dictionaries (MRDs) can be used to produce a robust and portable text categorization system. To assess the performance of the proposed method, a series of experiments was conducted in which online news obtained from the Internet was categorized. Experimental results show that MRDs function effectively as a knowledge base for assigning subject areas to news articles and for text categorization in general.
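The abstract does not describe TopClass's actual scoring procedure. As a rough illustration of the general idea only (words looked up in a dictionary, mapped to subject codes, and aggregated into a topic), here is a minimal Python sketch. `TOY_MRD`, its subject codes, and `categorize` are hypothetical, and splitting each word's vote evenly across its senses is an assumption for illustration, not the paper's method.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical toy stand-in for an MRD: each headword maps to the subject
# codes (concepts) its dictionary senses are labelled with.
TOY_MRD = {
    "goal": ["SPORT"],
    "striker": ["SPORT", "LABOUR"],
    "league": ["SPORT"],
    "bank": ["FINANCE", "GEOGRAPHY"],
    "interest": ["FINANCE"],
    "shares": ["FINANCE"],
    "election": ["POLITICS"],
    "vote": ["POLITICS"],
}


def categorize(text: str, mrd: dict[str, list[str]]) -> str | None:
    """Return the subject code that receives the most dictionary evidence.

    Assumption: each known word splits one vote evenly across its subject
    codes, so ambiguous words contribute less to any single topic.
    """
    scores: Counter[str] = Counter()
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        codes = mrd.get(word)
        if not codes:
            continue  # words missing from the dictionary carry no evidence
        for code in codes:
            scores[code] += 1.0 / len(codes)
    if not scores:
        return None  # no dictionary word found, so no topic is assigned
    return scores.most_common(1)[0][0]


print(categorize("The striker scored a late goal to win the league.", TOY_MRD))
```

Because the scores are kept per subject code rather than per word, the number of quantities to estimate grows with the size of the concept inventory instead of the vocabulary, which loosely mirrors the abstract's point that a concept-based model needs fewer parameters.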