Discriminative category matching: efficient text classification for huge document collections

Abstract

With the rapid growth of textual information available on the Internet, a good model for classifying and managing documents automatically is important. As more documents are archived, new terms, new concepts, and concept drift appear frequently. The classification model must therefore be updated frequently rather than used unchanged for a long period. The challenges are to: a) obtain a highly accurate classification model; b) keep computational time low for both model training and operation; and c) occupy little storage space. None of the existing classification approaches meets all of these requirements. In this paper, we propose a novel text classification approach, called discriminative category matching, which can meet all of them. Extensive experiments are conducted on two benchmarks and a large real-life collection. The encouraging results indicate that the approach is highly feasible.

Bibliographic record

Source: 2002, pp. 187-194 (8 pages)
Authors: Fung, G.P.C.; Yu, J.X.; Lu, Hongjun
Original format: PDF
Chinese Library Classification: Radio electronics, telecommunications technology
Keywords: pattern matching; text analysis; data mining; Internet; computational complexity; discriminative category matching; efficient text classification; huge document collections; document classification; document management; concept-drift; compu

Similar literature

1. Temporal contexts: Effective text classification in evolving document collections [J]. Leonardo Rocha, Fernando Mourao, Hilton Mota. Information Systems, 2013, No. 3.
2. Discriminative features for text document classification [J]. K. Torkkola. Pattern Analysis and Applications, 2004, No. 4.
3. A Fuzzy Matching based Image Classification System for Printed and Handwritten Text Documents [J]. Journal of information technology research, 2020, No. 2.
4. Discriminative category matching: efficient text classification for huge document collections [C]. Gabriel Pui Cheong Fung, Jeffrey Xu Yu, Hongjun Lu. IEEE International Conference on Data Mining, 2002.
5. Efficient representation and matching of texts and images in scanned book collections [D]. Yalniz, Ismet Zeki. 2014.
6. CDKAM: a taxonomic classification tool using discriminative k-mers and approximate matching strategies [O]. Van-Kien Bui, Chaochun Wei. 2020.
7. Discriminative Features for Text Document Classification [O]. Kari Torkkola. 2002.
