Term weighting scheme for short-text classification: Twitter corpuses

Alsmadi Issa; Hoon Gan Keng

首页> 外文期刊>Neural computing & applications >Term weighting scheme for short-text classification: Twitter corpuses

【24h】

Term weighting scheme for short-text classification: Twitter corpuses

机译：Term weighting scheme for short-text classification: Twitter corpuses

获取原文

获取原文并翻译 | 示例

获取外文期刊封面封底 >>

开具论文收录证明 >>

文献代查 >>

页面导航

摘要
著录项
相关主题

摘要

Term weighting is a well-known preprocessing step in text classification that assigns appropriate weights to each term in all documents to enhance the performance of text classification. Most methods proposed in the literature use traditional approaches that emphasize term frequency. These methods perform reasonably with traditional documents. However, these approaches are unsuitable for social network data with limited length and where sparsity and noise are characteristics of short text. A simple supervised term weighting approach, i.e., SW, which considers the special nature of short texts based on term strength and term distribution, is introduced in these study, and its effect in a high-dimensional vector space over term weighting schemes, which represent baseline term weighting in traditional text classification, are assessed. Two datasets are employed with support vector machine, decision tree, k-nearest neighbor, and logistic regression algorithms. The first dataset, Sanders dataset, is a benchmark dataset that includes approximately 5000 tweets in four categories. The second self-collected dataset contains roughly 1500 tweets distributed in six classes collected using Twitter API. The evaluation applied tenfold cross-validation on the labeled data to compare the proposed approach with state-of-the-art methods. The experimental results indicate that supervised approaches perform varied performance, predominantly better than the unsupervised approaches. However, the proposed approach SW has better performance than other ones in terms of accuracy. SW can deal with the limitations of short texts and mitigate the limitations of traditional approaches in the literature, thus improving performance to 80.83 and 90.64 (F-measure) on Sanders dataset and a self-collected dataset, respectively.

著录项

来源
《Neural computing & applications》 |2019年第8期|3819-3831|共13页
作者
Alsmadi Issa; Hoon Gan Keng;
展开▼
作者单位

Univ Sains Malaysia, Sch Comp Sci, Gelugor 11800, Pulau Pinang, Malaysia;

展开▼
收录信息
原文格式 PDF
正文语种英语
中图分类人工神经网络计算机;人工智能理论;
关键词
Short text; Classification; Term weighting; Social networks; Twitter; Machine learning;

Term weighting scheme for short-text classification: Twitter corpuses

摘要

著录项

相关主题

期刊订阅