Cross-Comparison for Two-Dimensional Text Categorization

Abstract

The organization of large text collections is the main goal of automated text categorization. In particular, the final aim is to classify documents into a certain number of pre-defined categories efficiently and as accurately as possible. On-line and run-time services, such as personalization and information filtering services, have increased the importance of effective and efficient document categorization techniques. In recent years, a wide range of supervised learning algorithms has been applied to this problem. Recently, a new approach that exploits a two-dimensional summarization of the data for text classification was presented. This method does not go through a word selection phase; instead, it uses the whole dictionary to present data in an intuitive way on two-dimensional graphs. Although successful in terms of classification effectiveness and efficiency (as recently shown in [3]), this method leaves some key issues unsolved: the design of the training algorithm seems to be ad hoc for the Reuters-21578 collection; the evaluation has been done only on the 10 most frequent classes of the Reuters-21578 dataset; the evaluation lacks measures of significance in most parts; and the method lacks a mathematical justification. We focus on the first three aspects, leaving the fourth as future work.
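The abstract does not say how the two coordinates of a document are computed, so the Python sketch below only illustrates the general idea of summarizing a document as a point on a two-dimensional graph without a word selection phase. The scoring rule (summing per-category document frequencies of every term), the helper names `term_presence`, `to_point`, and `classify`, the toy data, and the default separating line `y = x` are all assumptions made for illustration, not the definitions used in the paper.

```python
from collections import Counter


def term_presence(docs):
    """Fraction of the documents in `docs` that contain each term."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc))  # count each term once per document
    n = len(docs)
    return {term: c / n for term, c in counts.items()}


def to_point(doc, presence_in, presence_out):
    """Map a tokenized document to (x, y) using every term it contains.

    x sums the term scores for the category, y the scores for all other
    categories. This scoring rule is a hypothetical stand-in.
    """
    terms = set(doc)
    x = sum(presence_in.get(t, 0.0) for t in terms)
    y = sum(presence_out.get(t, 0.0) for t in terms)
    return x, y


def classify(doc, presence_in, presence_out, slope=1.0, intercept=0.0):
    """Accept the document if its point lies below y = slope * x + intercept."""
    x, y = to_point(doc, presence_in, presence_out)
    return y < slope * x + intercept


# Toy training data (invented for this sketch).
in_category = [["wheat", "grain", "export"], ["grain", "harvest", "price"]]
out_category = [["bank", "rate", "price"], ["oil", "export", "barrel"]]

p_in = term_presence(in_category)
p_out = term_presence(out_category)

new_doc = ["grain", "price", "harvest"]
point = to_point(new_doc, p_in, p_out)
label = classify(new_doc, p_in, p_out)
print(point, label)
```

In a sketch like this, training reduces to choosing `slope` and `intercept` for each category, which is the step the abstract describes as ad hoc for the Reuters-21578 collection.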

Bibliographic record

Source: International Conference on String Processing and Information Retrieval, 2004, 2 pages
Author: Giorgio Maria Di Nunzio
Original format: PDF
CLC classification: Data backup and recovery

Similar documents

1. Contextual Text Categorization: An Improved Stemming Algorithm to Increase the Quality of Categorization in Arabic Text [J]. Gadri Said, Moussaoui Abdelouahab. The International Arab Journal of Information Technology, 2017, No. 6.
2. Text Document Categorization using Enhanced Sentence Vector Space Model and Bi-Gram Text Representation Model Based on Novel Fusion Techniques [J]. Abdisa Demissie Amensisa. New Media and Mass Communication, 2020, No. 4.
3. A Novel Text Representation Model to Categorize Text Documents using Convolution Neural Network [J]. M. B. Revanasiddappa, B. S. Harish. International Journal of Intelligent Systems and Applications, 2019, No. 5.
4. Cross-Comparison for Two-Dimensional Text Categorization [C]. Giorgio Maria Di Nunzio. International Conference on String Processing and Information Retrieval (SPIRE 2004), 5-8 October 2004, Padova, Italy.
5. The implementation of dynamic document organization using the integration of text clustering and text categorization [D]. Jo, Taeho. 2006.
6. Categorization of Two-Dimensional and Three-Dimensional Stimuli by 18-Month-Old Infants [O]. Martha E. Arterberry, Marc H. Bornstein, Julia B. Blumenstyk.
7. Two-dimensional Clustering for Text Categorization [O]. Hiroya Takamura, Yuji Matsumoto. 2002.
