Classifying High-Speed Text Streams

Abstract

Recently, a new class of data-intensive applications has become widely recognized, in which data is best modeled as transient, open-ended streams rather than as persistent tables on disk. This has led to a surge of research interest in data streams. However, most reported work concentrates on structured data, such as bit sequences, and seldom addresses unstructured data, such as text documents. In this paper, we propose an efficient approach for classifying high-speed text streams. The approach is based on sketches, so it can classify a stream by scanning it only once while consuming a small, bounded amount of memory for both model maintenance and operation. Extensive experiments are conducted on benchmarks and a real-life collection of news articles. The encouraging results indicate that the proposed approach is highly feasible.
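
The abstract does not spell out the sketch construction, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a count-min sketch per class combined with a naive-Bayes-style score; the class names `CountMinSketch` and `SketchTextClassifier`, the `width` and `depth` parameters, and the toy stream are all hypothetical. It shows how each document can be seen exactly once while memory per class stays fixed at `width * depth` counters, however large the vocabulary grows.

```python
import hashlib
import math
from collections import defaultdict


class CountMinSketch:
    """Approximate token counter using a fixed width * depth table of integers."""

    def __init__(self, width=512, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, token, row):
        # Deterministic hash per row; the row number is used as the salt.
        digest = hashlib.blake2b(
            token.encode("utf-8"), digest_size=8, salt=bytes([row])
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, token):
        for r in range(self.depth):
            self.rows[r][self._index(token, r)] += 1

    def estimate(self, token):
        # Collisions can only inflate a counter, so the minimum is the best estimate.
        return min(self.rows[r][self._index(token, r)] for r in range(self.depth))


class SketchTextClassifier:
    """One-pass classifier (assumed design): one sketch per class, smoothed log-likelihood score."""

    def __init__(self, width=512, depth=4):
        self.width = width
        self.depth = depth
        self.sketches = {}
        self.token_totals = defaultdict(int)
        self.doc_counts = defaultdict(int)

    def learn(self, tokens, label):
        # Each labeled document is folded into its class sketch and then discarded.
        if label not in self.sketches:
            self.sketches[label] = CountMinSketch(self.width, self.depth)
        sketch = self.sketches[label]
        for token in tokens:
            sketch.add(token)
        self.token_totals[label] += len(tokens)
        self.doc_counts[label] += 1

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, sketch in self.sketches.items():
            score = math.log(self.doc_counts[label] / total_docs)
            # Add-one smoothing; the sketch width stands in for the vocabulary size.
            denom = self.token_totals[label] + self.width
            for token in tokens:
                score += math.log((sketch.estimate(token) + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy stream of (tokens, label) pairs, processed in a single pass.
clf = SketchTextClassifier()
stream = [
    ("stocks fall as markets react to rate hike".split(), "finance"),
    ("team wins final match after late goal".split(), "sports"),
    ("central bank raises interest rate again".split(), "finance"),
    ("striker scores twice in cup match".split(), "sports"),
]
for tokens, label in stream:
    clf.learn(tokens, label)

print(clf.predict("bank rate decision moves markets".split()))
```

In this arrangement, a larger `width` lowers the chance of hash collisions and a larger `depth` tightens the count estimates, so both trade memory for accuracy.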


Bibliographic record

Source: International Conference on Advances in Web-Age Information Management, 2003, 13 pages
Authors: Gabriel Pui Cheong Fung; Jeffrey Xu Yu; Hongjun Lu
Original format: PDF
Chinese Library Classification: TP311.13-532
