TopClass: Topic-based Conceptual Text Categorization Using MRD


Abstract

Text categorization for unrestricted text is one of the important issues in the field of information retrieval. The crux of the problem is to discover a model that relates the words in a document to its general subject area. It seems very difficult to acquire, by statistical means, enough word-based knowledge to build a robust system capable of automatically categorizing unrestricted text. The major problems with word-based text categorization models include data sparseness and the lack of a level of abstraction. Word-based text categorization systems are hard to train sufficiently well; furthermore, they are difficult to port to new domains and to run off the shelf. In this paper, we show that a concept-based model for text categorization requires fewer parameters and has a built-in element of generality. Broad lexical conceptual knowledge acquired from machine-readable dictionaries (MRDs) can be used to produce a robust and portable text categorization system. A series of experiments was conducted to categorize on-line news obtained from the Internet in order to assess the performance of the proposed method. Experimental results show that MRDs function effectively as a knowledge base for assigning subject areas to news articles and for text categorization in general.
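
The idea of relating words to subject areas through dictionary knowledge can be illustrated with a minimal sketch. It is not the TopClass method itself, whose details the abstract does not give. It assumes a hypothetical toy lexicon (`TOY_MRD_TOPICS`) that stands in for the topic labels a machine-readable dictionary might attach to words, and a hypothetical `categorize` function that lets each known word vote for its topics.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration only):
# word -> topic labels, standing in for MRD-derived conceptual knowledge.
TOY_MRD_TOPICS = {
    "bank": ["finance", "geography"],
    "loan": ["finance"],
    "interest": ["finance"],
    "river": ["geography"],
    "goal": ["sports"],
    "match": ["sports"],
    "striker": ["sports"],
}


def categorize(text, lexicon=TOY_MRD_TOPICS):
    """Return (best_topic, scores), letting each known word vote for its topics."""
    scores = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        topics = lexicon.get(word, [])
        for topic in topics:
            # An ambiguous word splits its single vote evenly across its topics.
            scores[topic] += 1.0 / len(topics)
    # Words missing from the lexicon contribute nothing; no votes means no topic.
    best = max(scores, key=scores.get) if scores else None
    return best, dict(scores)


if __name__ == "__main__":
    print(categorize("The bank raised the interest on every loan."))
```

Because the scores are kept per topic rather than per word, the number of parameters grows with the size of the topic inventory, not the vocabulary. This is one simple way to read the abstract's claim that a concept-based model needs fewer parameters than a word-based one.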


Bibliographic record

Source: Natural Language Processing Pacific Rim Symposium, 1999, 7 pages
Authors: Sue J. Ker; Jason S. Chang; Helen L. Tu; S. H. Yang; Roger Jyh-Shing Jang; Pei-Hsiung Chin
Original format: PDF
CLC classification: Machine translation

