Classifying High-Speed Text Streams

Abstract

Recently, a new class of data-intensive applications has become widely recognized, in which data is best modeled as transient, open-ended streams rather than as persistent tables on disk. This has led to a surge of research interest in data streams. However, most reported work concentrates on structured data, such as bit sequences, and seldom addresses unstructured data, such as textual documents. In this paper, we propose an efficient approach for classifying high-speed text streams. The approach is based on sketches, so it can classify the streams efficiently by scanning them only once while consuming a small, bounded amount of memory in both model maintenance and operation. Extensive experiments are conducted using benchmarks and a real-life news article collection. The encouraging results indicate that the proposed approach is highly feasible.
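
The abstract does not say which sketch structure the authors use, so the following is only a minimal illustration of the general idea of one-pass, bounded-memory text classification, not the paper's method. It assumes a Count-Min sketch per class combined with a multinomial naive Bayes score. The names `CountMinSketch` and `SketchTextClassifier`, the `width` and `depth` parameters, and the use of `width` as a stand-in for vocabulary size in the smoothing term are all hypothetical choices made for this sketch.

```python
import hashlib
import math
from collections import defaultdict


class CountMinSketch:
    """Approximate token counter using width * depth integers,
    independent of how many distinct tokens the stream contains."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, token):
        # One salted hash per row; the row number serves as the salt.
        digest = hashlib.blake2b(
            token.encode("utf-8"), digest_size=8, salt=bytes([row])
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, token, count=1):
        for row in range(self.depth):
            self.rows[row][self._index(row, token)] += count

    def estimate(self, token):
        # Collisions can only inflate a counter, so the minimum over
        # the rows is the tightest available upper bound.
        return min(
            self.rows[row][self._index(row, token)]
            for row in range(self.depth)
        )


class SketchTextClassifier:
    """Illustrative naive Bayes classifier backed by per-class sketches
    (an assumed design, not taken from the paper)."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.sketches = {}
        self.doc_counts = defaultdict(int)
        self.token_totals = defaultdict(int)

    def update(self, tokens, label):
        # Each labeled document is read once and then discarded.
        if label not in self.sketches:
            self.sketches[label] = CountMinSketch(self.width, self.depth)
        self.doc_counts[label] += 1
        for token in tokens:
            self.sketches[label].add(token)
            self.token_totals[label] += 1

    def predict(self, tokens):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, sketch in self.sketches.items():
            score = math.log(self.doc_counts[label] / total_docs)
            # Laplace smoothing; width approximates the vocabulary size.
            denom = self.token_totals[label] + self.width
            for token in tokens:
                score += math.log((sketch.estimate(token) + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = SketchTextClassifier()
stream = [
    (["match", "goal", "team"], "sports"),
    (["stock", "market", "price"], "finance"),
    (["team", "win", "goal"], "sports"),
    (["bank", "price", "rate"], "finance"),
]
for doc_tokens, doc_label in stream:
    clf.update(doc_tokens, doc_label)
prediction = clf.predict(["goal", "team"])
```

In this sketch the memory per class is fixed at `width * depth` counters no matter how many documents arrive, which mirrors the single-scan, bounded-memory property described above. The price is that token counts may be overestimated when hashes collide.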