Automatic Bengali Document Categorization Based on Word Embedding and Statistical Learning Approaches

Abstract

The automated categorization of text documents into predefined categories has attracted growing interest in recent years, driven by the large number of documents available in digital form and the resulting need to organize them. Automatic document categorization is the process of assigning one or more categories (classes) to a document, which makes documents easier to manage and sort. This paper proposes a Bengali document categorization technique that combines the Word2Vec word embedding model with a multi-class support vector machine (SVM) trained by the stochastic gradient descent (SGD) statistical learning algorithm. Word2Vec extracts the semantic features of a document, and SGD reduces the complexity of training the multi-class SVM that classifies unlabeled documents. Experiments with 10,000 training documents and 4,651 test documents show an accuracy of 93.33%.
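
The abstract does not say how word vectors are combined into a document representation or how the SGD-trained SVM is configured, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. A tiny hypothetical embedding table stands in for a trained Word2Vec model, each document is represented by the average of its word vectors (an assumed aggregation), and one-vs-rest linear SVMs are trained with plain SGD on the L2-regularized hinge loss. The names `EMBEDDINGS`, `doc_vector`, `train_ovr_svm`, and `predict`, the toy corpus, and the hyperparameters `epochs`, `lr`, and `lam` are all hypothetical.

```python
import random

# Hypothetical 3-dimensional word vectors standing in for a trained Word2Vec
# model; real vectors would be learned from a large Bengali corpus.
EMBEDDINGS = {
    "খেলা": [0.9, 0.1, 0.0],      # "game"
    "গোল": [0.8, 0.2, 0.1],        # "goal"
    "বাজার": [0.1, 0.9, 0.0],      # "market"
    "টাকা": [0.0, 0.8, 0.2],       # "taka" (money)
    "নির্বাচন": [0.1, 0.1, 0.9],   # "election"
    "ভোট": [0.0, 0.2, 0.8],        # "vote"
}
DIM = 3


def doc_vector(tokens):
    """Average the vectors of in-vocabulary tokens (assumed aggregation)."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_ovr_svm(X, y, classes, epochs=50, lr=0.1, lam=0.01, seed=0):
    """Train one binary linear SVM per class with SGD on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    models = {}
    for c in classes:
        w, b = [0.0] * len(X[0]), 0.0
        order = list(range(len(X)))
        for _ in range(epochs):
            rng.shuffle(order)
            for i in order:
                target = 1.0 if y[i] == c else -1.0
                margin = target * (dot(w, X[i]) + b)
                # Gradient of (lam / 2) * ||w||^2 shrinks the weights every step.
                w = [wi * (1.0 - lr * lam) for wi in w]
                # Hinge-loss subgradient is non-zero only when the margin is below 1.
                if margin < 1.0:
                    w = [wi + lr * target * xi for wi, xi in zip(w, X[i])]
                    b += lr * target
        models[c] = (w, b)
    return models


def predict(models, x):
    """Return the class whose binary SVM assigns x the highest score."""
    return max(models, key=lambda c: dot(models[c][0], x) + models[c][1])


# Toy labeled corpus (hypothetical): token lists with category labels.
train_docs = [
    (["খেলা", "গোল"], "sports"),
    (["গোল", "খেলা", "খেলা"], "sports"),
    (["বাজার", "টাকা"], "economy"),
    (["টাকা", "বাজার", "টাকা"], "economy"),
    (["নির্বাচন", "ভোট"], "politics"),
    (["ভোট", "ভোট", "নির্বাচন"], "politics"),
]
X = [doc_vector(tokens) for tokens, _ in train_docs]
y = [label for _, label in train_docs]
models = train_ovr_svm(X, y, classes=sorted(set(y)))
print(predict(models, doc_vector(["গোল", "খেলা"])))  # prints: sports
```

One-vs-rest is one common way to extend a binary SVM to several categories. On real Word2Vec features, a library implementation such as scikit-learn's `SGDClassifier` with `loss="hinge"` performs this kind of SGD training of linear SVMs, including one-vs-rest handling of multiple classes.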

Bibliographic Record

Source: 2018 International Conference on Computer, Communication, Chemical, Material and Electronic Engineering, 2018, pp. 1-6 (6 pages)
Conference location: Rajshahi, Bangladesh
Authors: Md. Rajib Hossain; Mohammed Moshiul Hoque
Affiliations: Dept. of Computer Science & Engineering, Chittagong University of Engineering & Technology, Chittagong, Bangladesh (both authors)
Original format: PDF
Text language: English
Keywords: Training; Feature extraction; Support vector machines; Semantics; Text categorization; Classification algorithms; Testing

Similar Documents

1. Comparing a Rule-Based Versus Statistical System for Automatic Categorization of MEDLINE Documents According to Biomedical Specialty [J]. Susanne M. Humphrey, Aurélie Névéol, Allen Browne, Journal of the American Society for Information Science and Technology. 2009, no. 13

2. Word Embedding based Textual Semantic Similarity Measure in Bengali [J]. MD. Asif Iqbal, Omar Sharif, Mohammed Moshiul Hoque, Procedia Computer Science. 2021

3. A Multi Layer Perceptron Along with Memory Efficient Feature Extraction Approach for Bengali Document Categorization [J]. Quazi Ishtiaque Mahmud, Noymul Islam Chowdhury, Md Masum, Journal of computer sciences. 2020, no. 3

4. Automatic Bengali Document Categorization Based on Word Embedding and Statistical Learning Approaches [C]. Md. Rajib Hossain, Mohammed Moshiul Hoque, International Conference on Computer, Communication, Chemical, Material and Electronic Engineering. 2018

5. Language models and automatic topic categorization for information retrieval in handwritten documents [D]. Faisal Farooq. 2008

6. Comparing a Rule Based vs. Statistical System for Automatic Categorization of MEDLINE® Documents According to Biomedical Specialty [O]. Susanne M. Humphrey, Aurélie Névéol, Julien Gobeil

7. Word based off-line handwritten Arabic classification and recognition. Design of automatic recognition system for large vocabulary offline handwritten Arabic words using machine learning approaches [O]. AlKhateeb Jawad Hasan Yasin. 2010
