Conference: International Conference of Computer and Information Technology

Layered Representation of Bengali Texts in Reduced Dimension Using Deep Feedforward Neural Network for Categorization

Abstract

Automatic text categorization is a primary step in information retrieval, where the most relevant documents must be found in an enormous collection. It is also useful across a wide range of web applications, from portal sites and news indexing to spam filtering and genre tagging. A significant amount of research has been carried out in this field, most of it dominated by Support Vector Machine (SVM) models. Although these models have been very successful, they require careful feature engineering to achieve optimum results. In this paper, we propose a model for Bengali text categorization that does not require feature engineering and is able to capture nonlinearity in the data. We first find a lower-dimensional representation of each document's tf-idf vector using denoising autoencoders, and then feed this transformed vector into a deep feedforward network to find its most plausible category. We also show empirically that our model achieves 94.05% accuracy on 12 categories, surpassing the best existing models for Bengali text categorization.
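
The pipeline described in the abstract (tf-idf vectors, a denoising autoencoder for dimensionality reduction, then a feedforward classifier) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy English corpus, masking noise as the corruption, tanh activations, squared-error reconstruction, the layer sizes and learning rates, and the use of a single hidden layer in the classifier where the paper uses a deeper network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus invented for illustration: label 0 = sports, 1 = politics.
docs = [
    "cricket match score team win",
    "football team goal match",
    "election vote party minister",
    "parliament minister vote law",
]
labels = np.array([0, 0, 1, 1])

def tfidf(texts):
    """Dense, L2-normalised tf-idf matrix with smoothed idf."""
    vocab = sorted({w for t in texts for w in t.split()})
    col = {w: j for j, w in enumerate(vocab)}
    tf = np.zeros((len(texts), len(vocab)))
    for i, t in enumerate(texts):
        for w in t.split():
            tf[i, col[w]] += 1.0
    df = (tf > 0).sum(axis=0)
    idf = np.log((1 + len(texts)) / (1 + df)) + 1.0
    x = tf * idf
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def train_dae(x, n_hidden, noise=0.3, lr=0.5, epochs=300):
    """One-hidden-layer denoising autoencoder; returns the encoder."""
    n, d = x.shape
    w_enc = rng.normal(0, 0.1, (d, n_hidden))
    b_enc = np.zeros(n_hidden)
    w_dec = rng.normal(0, 0.1, (n_hidden, d))
    b_dec = np.zeros(d)
    for _ in range(epochs):
        # Corrupt the input by zeroing a random subset of entries.
        x_noisy = x * (rng.random(x.shape) > noise)
        h = np.tanh(x_noisy @ w_enc + b_enc)
        # Reconstruction error is measured against the CLEAN input.
        err = (h @ w_dec + b_dec - x) / n
        dh = (err @ w_dec.T) * (1.0 - h ** 2)
        w_dec -= lr * h.T @ err
        b_dec -= lr * err.sum(axis=0)
        w_enc -= lr * x_noisy.T @ dh
        b_enc -= lr * dh.sum(axis=0)
    return lambda v: np.tanh(v @ w_enc + b_enc)

def train_mlp(z, y, n_classes, n_hidden=8, lr=0.5, epochs=300):
    """Feedforward softmax classifier (one hidden layer for brevity)."""
    n = z.shape[0]
    w1 = rng.normal(0, 0.5, (z.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0, 0.5, (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]

    def forward(v):
        h = np.tanh(v @ w1 + b1)
        logits = h @ w2 + b2
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return h, e / e.sum(axis=1, keepdims=True)

    for _ in range(epochs):
        h, p = forward(z)
        g = (p - onehot) / n            # gradient of mean cross-entropy
        dh = (g @ w2.T) * (1.0 - h ** 2)
        w2 -= lr * h.T @ g
        b2 -= lr * g.sum(axis=0)
        w1 -= lr * z.T @ dh
        b1 -= lr * dh.sum(axis=0)
    return lambda v: forward(v)[1]

x = tfidf(docs)                          # documents as tf-idf vectors
encode = train_dae(x, n_hidden=4)        # learn the reduced representation
codes = encode(x)                        # lower-dimensional document codes
predict_proba = train_mlp(codes, labels, n_classes=2)
probs = predict_proba(codes)
pred = probs.argmax(axis=1)              # most plausible category per document
print(codes.shape, pred)
```

The classifier only ever sees the encoder output `codes`, never the raw tf-idf matrix, which mirrors the two-stage design in the abstract. On a real corpus the autoencoder would typically be trained on far more documents, and evaluation would use held-out data rather than the training set.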