Effectively classifying short texts by structured sparse representation with dictionary filtering

Gao Longwen; Zhou Shuigeng; Guan Jihong

Abstract

Short text classification (STC) has attracted increasing interest with the rapid growth of Web and social media data in short text form. It is more challenging than traditional text classification (TC) because short texts have sparse features, which makes state-of-the-art TC approaches perform poorly when applied to short texts directly. Existing STC approaches address the sparsity problem mainly by enriching text content with external corpora or additional information. Although this improves performance, the gains depend heavily on the amount and quality of that external or additional information. Worse, such information is not always available, and acquiring it is costly. In this paper, we introduce a structured sparse representation classifier to classify short texts effectively, and develop an approach called convex hull vertices selection that reduces data correlation and redundancy in the dictionary (the set of training texts), which substantially boosts STC efficiency and performance. To the best of our knowledge, this is the first work that exploits structured sparsity for STC. Experiments on five datasets show that the proposed approach outperforms state-of-the-art TC methods in classification effectiveness, and the traditional sparse representation (SR) classifier in both classification effectiveness and efficiency. Furthermore, we carry out an experiment on classifying short texts expanded with additional content, which indirectly shows that our approach performs better than existing STC methods that exploit external text sources. (C) 2015 Elsevier Inc. All rights reserved.
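The abstract does not spell out the classifier's formulation, so the following is only a minimal sketch of one common way to realize a group-sparse representation classifier, not the authors' method. It assumes a group-lasso objective, 0.5 * ||y - Dx||^2 + lam * sum over classes of ||x_g||_2, solved by proximal gradient descent, where the dictionary atoms are the training text vectors grouped by class label. The function name `group_sparse_classify`, the parameters `lam` and `steps`, and the toy vectors are all hypothetical. The convex hull vertices selection step is not shown.

```python
import math

def group_sparse_classify(train, labels, y, lam=0.1, steps=500):
    """Assign y to the class whose atoms reconstruct it with the smallest residual.

    train  : list of feature vectors (dictionary atoms, e.g. term counts)
    labels : class label of each atom (atoms of one class form one group)
    y      : feature vector of the text to classify
    lam    : group-sparsity weight (hypothetical default)
    """
    n, d = len(train), len(y)
    # Step size 1/L; the squared Frobenius norm upper-bounds the Lipschitz constant.
    lipschitz = sum(v * v for atom in train for v in atom) or 1.0
    step = 1.0 / lipschitz

    groups = {}
    for i, c in enumerate(labels):
        groups.setdefault(c, []).append(i)

    def reconstruct(coef, idx):
        # Linear combination of the atoms in idx, weighted by coef.
        return [sum(coef[i] * train[i][k] for i in idx) for k in range(d)]

    x = [0.0] * n
    for _ in range(steps):
        r = [a - b for a, b in zip(reconstruct(x, range(n)), y)]  # D x - y
        grad = [sum(train[i][k] * r[k] for k in range(d)) for i in range(n)]
        z = [x[i] - step * grad[i] for i in range(n)]
        # Group soft-thresholding: shrink each class block toward zero as a unit.
        for idx in groups.values():
            norm = math.sqrt(sum(z[i] ** 2 for i in idx))
            scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            for i in idx:
                x[i] = scale * z[i]

    residuals = {}
    for c, idx in groups.items():
        rec = reconstruct(x, idx)
        residuals[c] = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, rec)))
    return min(residuals, key=residuals.get), residuals

# Toy dictionary: four training "texts" over a four-term vocabulary.
train = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
labels = ["sports", "sports", "tech", "tech"]
label, residuals = group_sparse_classify(train, labels, [1, 1, 0, 0])
```

In this toy run the "tech" atoms share no terms with the query, so their coefficients stay at zero and the query is assigned to "sports". Penalizing whole class blocks rather than individual coefficients is what encourages the representation to draw on atoms from only a few classes.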

Bibliographic details

Source
Information Sciences: An International Journal | 2015, issue not specified | 13 pages
Authors
Gao Longwen; Zhou Shuigeng; Guan Jihong
Original format: PDF
Text language: English
Chinese Library Classification: Automatic information theory
Keywords
Short text classification; Sparse representation; Group sparsity; Dictionary filtering
