Conference: Asia-Pacific World Congress on Computer Science and Engineering

Performance Analysis of Supervised Machine Learning Approaches for Bengali Text Categorization

Abstract

In this digital era, an enormous amount of data is generated every day, and most of it is unstructured text. An automated text classifier sorts texts into predefined categories. With machine learning, we can learn the features of pre-categorized documents and predict the category of a new document. Bengali is one of the most widely spoken languages in the world, so automated text categorization for Bengali has become essential. Text categorization mostly uses data mining algorithms together with NLP tools, feature extraction and selection methods, and vector space modeling. In this paper, we measure the performance of Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), Stochastic Gradient Descent (SGD), and Logistic Regression (LR) methods on an open-source Bengali newspaper article corpus containing 84,906 articles in 10 categories. We examine the impact of training dataset size on classification accuracy for each algorithm, document the execution time needed to train each method, and discuss issues and challenges in Bengali text categorization. This paper can serve as a reference for future researchers in Bengali text categorization.
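The abstract does not give the paper's features, tooling, or results, so the following is only a minimal sketch of the kind of experiment it describes: train a Multinomial Naive Bayes classifier on bag-of-words counts, then record accuracy and training time for growing training-set sizes. Everything here is an assumption made for illustration: the from-scratch `MultinomialNB` class, the `accuracy_by_train_size` helper, the whitespace tokenizer, and the tiny invented two-category corpus (`TRAIN`, `TEST`) are hypothetical and are not taken from the paper or its 84,906-article dataset.

```python
import math
import time
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: plain whitespace tokenization. Real Bengali text would
    # likely need normalization, stop-word removal, and possibly stemming.
    return text.split()


class MultinomialNB:
    """Multinomial Naive Bayes over word counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / self.total_docs)
            total = sum(self.word_counts[label].values())
            denom = total + self.alpha * len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy_by_train_size(train, test, sizes):
    """Return (size, accuracy, training seconds) for each training-set size."""
    rows = []
    for size in sizes:
        subset = train[:size]
        start = time.perf_counter()
        model = MultinomialNB().fit([d for d, _ in subset], [l for _, l in subset])
        elapsed = time.perf_counter() - start
        correct = sum(model.predict(d) == l for d, l in test)
        rows.append((size, correct / len(test), elapsed))
    return rows


# Invented toy corpus (not from the paper), interleaved by category so that
# every prefix used below contains both classes.
TRAIN = [
    ("ক্রিকেট ম্যাচ দল খেলা", "sports"),
    ("ব্যাংক টাকা বাজার", "economy"),
    ("ফুটবল দল ম্যাচ", "sports"),
    ("বাজার টাকা ব্যাংক ঋণ", "economy"),
]
TEST = [("ক্রিকেট দল", "sports"), ("ব্যাংক ঋণ", "economy")]

if __name__ == "__main__":
    for size, acc, secs in accuracy_by_train_size(TRAIN, TEST, [2, 4]):
        print(f"train size={size} accuracy={acc:.2f} train time={secs:.6f}s")
```

The same loop could wrap any of the other classifiers named in the abstract (SVM, SGD, LR), as long as each exposes comparable `fit` and `predict` methods. On a real corpus, the training subsets should be drawn by stratified sampling rather than by taking a prefix.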