Using ParsBert on Augmented Data for Persian News Classification

Abstract

Text classification is a fundamental task in Natural Language Processing (NLP). Although text classification in English has been studied extensively, the number of studies on Persian text classification is limited. Previous work on Persian text classification often uses classic machine learning methods such as Naive Bayes, Support Vector Machines, and Decision Trees. While these methods are fast and straightforward, they require feature engineering, and their performance depends heavily on the selected features. In this paper, we first augment the input words with their stem forms and then use a pre-trained language model for the Persian language (ParsBERT) to classify the text. Augmenting the input words with their stem forms enables the proposed classifier to generalize well to new, unseen data. We compare the performance of our proposed model with that of traditional machine learning algorithms. The results show that the proposed model achieves an accuracy of 0.91 and outperforms the traditional machine learning algorithms by at least +0.4 absolute on both accuracy and F1 score.
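
The abstract does not say how the stems are combined with the input, so the following is only a minimal sketch of one plausible reading: each word is kept and its stem is appended right after it whenever the two differ. The names `toy_stem`, `TOY_SUFFIXES`, and `augment_with_stems` are hypothetical, and the tiny suffix list is a stand-in for a real Persian stemmer, not the authors' method. The augmented string would then be tokenized and passed to the ParsBERT classifier, a step omitted here.

```python
from typing import Callable, List

# Hypothetical, deliberately tiny suffix list (plural and comparative endings).
# A real pipeline would use a proper Persian stemmer instead.
TOY_SUFFIXES = ("های", "ها", "ترین", "تر")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least two characters."""
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def augment_with_stems(text: str, stem: Callable[[str], str] = toy_stem) -> str:
    """Return the text with each word followed by its stem when the stem differs.

    Assumption: stems are appended next to the original words; the paper's
    abstract does not specify the exact augmentation format.
    """
    out: List[str] = []
    for word in text.split():
        out.append(word)
        stemmed = stem(word)
        if stemmed != word:
            out.append(stemmed)
    return " ".join(out)


if __name__ == "__main__":
    # "کتابها" (books) gains its stem "کتاب" (book); "خوب" (good) is unchanged.
    print(augment_with_stems("کتابها خوب"))
```

Passing the stemmer as a parameter keeps the augmentation step independent of any particular stemming library, so the toy function can be swapped for a real one without touching the rest of the pipeline.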

Publication details

Source
International Conference on Web Research, 2021, pp. 78-81 (4 pages)
Authors
Mohammadreza Varasteh; Arefeh Kazemi
Original format
PDF
Keywords
Support vector machines; Machine learning algorithms; Text categorization; Machine learning; Data models; Natural language processing; Decision trees
Date indexed
2022-08-26 13:56:36
