Conference: 2018 International Conference on Computer, Communication, Chemical, Material and Electronic Engineering

Automatic Bengali Document Categorization Based on Word Embedding and Statistical Learning Approaches

Abstract

The automated categorization of text documents into predetermined categories has attracted growing interest in the last few years, owing to the wide availability of documents in digital form and the ensuing need to organize them. Automatic document categorization is the process of assigning one or more categories or classes to a document, making it easier to manipulate and sort. This paper proposes a Bengali document categorization technique based on the Word2Vec word embedding model and the stochastic gradient descent (SGD) statistical learning algorithm with a multi-class SVM. Word2Vec extracts the semantic features of a document, and SGD improves the classification complexity of the multi-class SVM that classifies the unlabeled data. Experiments with 10,000 training documents and 4,651 test documents show an accuracy of 93.33%.
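The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how word vectors are combined into a document vector or which multi-class scheme is used, so the sketch assumes mean pooling and a one-vs-rest linear SVM trained by SGD on the hinge loss. The tiny `EMBEDDINGS` table is a hypothetical stand-in for vectors that a trained Word2Vec model would supply, and the English tokens, labels, and hyperparameters are illustrative only.

```python
import random

# Hypothetical 2-dimensional embedding table; a real system would look up
# vectors from a Word2Vec model trained on a Bengali corpus.
EMBEDDINGS = {
    "goal": [1.0, 0.1], "match": [0.9, 0.0], "team": [0.8, 0.2],
    "vote": [0.0, 1.0], "party": [0.1, 0.9], "election": [0.2, 0.8],
}
DIM = 2


def doc_vector(tokens, embeddings, dim):
    """Mean of the vectors of known tokens (assumed pooling scheme)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def train_ovr_svm(X, y, classes, epochs=200, lr=0.1, lam=0.01, seed=0):
    """One-vs-rest linear SVM: SGD on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    dim = len(X[0])
    # One weight vector per class; the last entry is the bias term.
    weights = {c: [0.0] * (dim + 1) for c in classes}
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            for c in classes:
                w = weights[c]
                target = 1.0 if y[i] == c else -1.0
                score = sum(w[j] * X[i][j] for j in range(dim)) + w[dim]
                # Gradient step for the regularizer (bias is not shrunk).
                for j in range(dim):
                    w[j] -= lr * lam * w[j]
                # Gradient step for the hinge loss when the margin is violated.
                if target * score < 1.0:
                    for j in range(dim):
                        w[j] += lr * target * X[i][j]
                    w[dim] += lr * target
    return weights


def predict(x, weights):
    """Return the class whose linear score is highest."""
    dim = len(x)

    def score(c):
        return sum(weights[c][j] * x[j] for j in range(dim)) + weights[c][dim]

    return max(weights, key=score)


# Toy labeled documents (already tokenized).
train_docs = [
    (["goal", "match"], "sports"),
    (["team", "goal"], "sports"),
    (["vote", "party"], "politics"),
    (["election", "vote"], "politics"),
]
X = [doc_vector(tokens, EMBEDDINGS, DIM) for tokens, _ in train_docs]
y = [label for _, label in train_docs]
weights = train_ovr_svm(X, y, classes=["sports", "politics"])

print(predict(doc_vector(["team", "match"], EMBEDDINGS, DIM), weights))
print(predict(doc_vector(["party", "election"], EMBEDDINGS, DIM), weights))
```

At realistic scale, the embedding table would come from a trained Word2Vec model and the hand-written loop would normally be replaced by a library SGD classifier with hinge loss. The structure stays the same: embed, pool, then fit a linear classifier.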
