International Conference on Web Research

Using ParsBert on Augmented Data for Persian News Classification


Abstract

Text classification is a fundamental task in Natural Language Processing (NLP). Although text classification in English has been studied extensively, few studies address Persian text classification. Previous work on Persian text classification often uses classic machine learning methods such as Naive Bayes, Support Vector Machines, and Decision Trees. While these methods are fast and straightforward, they require feature engineering, and their performance depends heavily on the selected features. In this paper, we first augment the input words with their stem forms and then use a pre-trained language model for the Persian language (ParsBERT) to classify the text. Augmenting the input words with their stem forms enables the proposed classifier to generalize well to unseen data. We compare the performance of our proposed model with that of traditional machine learning algorithms. The results show that the proposed model achieves an accuracy of 0.91 and outperforms the traditional machine learning algorithms by at least 0.4 absolute on both accuracy and F1 score.
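
The stem-augmentation step described in the abstract could look roughly like the following Python sketch. The abstract does not say how stems are combined with the original words, so placing each stem directly after its word is an assumption made here for illustration. The names `toy_stem`, `TOY_SUFFIXES`, and `augment_with_stems` are hypothetical, and the suffix list is deliberately tiny; a real pipeline would presumably rely on a proper Persian stemmer.

```python
from typing import Callable, List

# Hypothetical, deliberately tiny suffix list for illustration only.
# Longer suffixes come first so they are matched before their shorter tails.
TOY_SUFFIXES = ("های", "ها", "ترین", "تر")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least two characters."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def augment_with_stems(text: str, stem: Callable[[str], str] = toy_stem) -> str:
    """Return the text with each word followed by its stem when the stem differs.

    Assumption: the stem is inserted right after the original word; the
    source abstract does not specify the exact augmentation layout.
    """
    augmented: List[str] = []
    for word in text.split():
        augmented.append(word)
        root = stem(word)
        if root != word:
            augmented.append(root)
    return " ".join(augmented)
```

In a setup like the one the abstract outlines, the augmented string would then be passed to the pre-trained model's tokenizer and classification head in place of the raw text. Any callable that maps a word to its stem can be supplied through the `stem` parameter.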
