Efficient classification of multi-labeled text streams by clashing

Ricardo Nanculef; Ilias Flaounas; Nello Cristianini

Abstract

We present a method for classifying multi-labeled text documents, explicitly designed for data stream applications that must process a virtually infinite sequence of data using constant memory and constant processing time. The method consists of an online procedure that efficiently maps text into a low-dimensional feature space, and a partition of this space into a set of regions for which the system extracts and maintains statistics used to predict multi-label text annotations. Documents are fed into the system as a sequence of words, mapped to a region of the partition, and annotated using the statistics computed from the labeled instances colliding in the same region. This approach is referred to as clashing. We illustrate the method on real-world text data, comparing the results with those obtained using other text classifiers. In addition, we provide an analysis of the effect of the dimensionality of the representation space on the predictive performance of the system. Our results show that the online embedding indeed approximates the geometry of the full corpus-wise TF and TF-IDF space. The model obtains F-measures competitive with those of the most accurate methods while using significantly fewer computational resources. In addition, the method achieves a higher macro-averaged F-measure than methods with similar running time. Furthermore, the system is able to learn faster than the other methods from partially labeled streams.
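The pipeline described in the abstract (hash words into a small vector, map the vector to a region, keep label statistics per region, predict from the statistics of colliding labeled documents) can be illustrated with a rough sketch. This is not the authors' algorithm: the class name `ClashingSketch`, the signed CRC32 hashing, the sign-pattern partition, and the frequency threshold are all assumptions chosen for brevity, and the paper's actual embedding and partition may differ.

```python
import zlib
from collections import Counter, defaultdict


class ClashingSketch:
    """Illustrative only: hypothetical names, not the paper's implementation."""

    def __init__(self, dim=16, threshold=0.5):
        self.dim = dim                # size of the hashed feature space (assumed)
        self.threshold = threshold    # minimum label frequency to predict (assumed)
        self.label_counts = defaultdict(Counter)  # region -> label -> count
        self.region_totals = Counter()            # region -> labeled documents seen

    def _embed(self, words):
        # Signed feature hashing: one pass over the words, O(dim) memory.
        vec = [0] * self.dim
        for word in words:
            h = zlib.crc32(word.encode("utf-8"))  # deterministic across runs
            index = h % self.dim
            sign = 1 if (h >> 31) & 1 else -1
            vec[index] += sign
        return vec

    def _region(self, words):
        # Assumed partition: the sign pattern of the hashed vector.
        # At most 2**dim regions exist, so memory stays bounded.
        return tuple(1 if x > 0 else 0 for x in self._embed(words))

    def update(self, words, labels):
        # Record the labels of a labeled document in its region.
        region = self._region(words)
        self.region_totals[region] += 1
        self.label_counts[region].update(set(labels))

    def predict(self, words):
        # Return labels seen in at least `threshold` of the colliding documents.
        region = self._region(words)
        total = self.region_totals[region]
        if total == 0:
            return set()
        counts = self.label_counts[region]
        return {label for label, c in counts.items() if c / total >= self.threshold}


model = ClashingSketch(dim=16, threshold=0.6)
doc = ["stock", "market", "rally"]
model.update(doc, {"finance", "markets"})
model.update(doc, {"finance"})
predicted = model.predict(doc)
```

In this sketch each document costs one pass over its words plus an O(dim) region lookup, which mirrors the constant-time, constant-memory goal stated in the abstract. Unlabeled documents in a partially labeled stream would simply skip the `update` call.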

Bibliographic details

Source
Expert Systems with Applications | 2014, Issue 11 | pp. 5431-5450 | 20 pages
Authors
Ricardo Nanculef; Ilias Flaounas; Nello Cristianini
Author affiliations

Department of Informatics, Universidad Tecnica Federico Santa Maria, Avenida Espana 1680, Valparaiso, Chile; Intelligent Systems Laboratory, University of Bristol, MVB, Woodland Rd, Bristol, BS8 1UB, UK;

Intelligent Systems Laboratory, University of Bristol, MVB, Woodland Rd, Bristol, BS8 1UB, UK;

Intelligent Systems Laboratory, University of Bristol, MVB, Woodland Rd, Bristol, BS8 1UB, UK;

Original format: PDF
Language: English
Keywords
Text classification; Data streams; Multi-label classification; Feature hashing; Massive data mining;
