Designing Punjabi Poetry Classifiers Using Machine Learning and Different Textual Features

Kaur Jasleen; Saini Jatinderkumar

Abstract

Analyzing poetic text is challenging from a computational linguistics perspective, and computational analysis of literary arts, especially poetry, is a difficult classification task. For a library recommendation system, poems can be classified by criteria such as poet, time period, sentiment, and subject matter. In this work, a content-based Punjabi poetry classifier was developed using the Weka toolset. Four categories were manually populated with 2034 poems: Nature and Festival (NAFE), Linguistic and Patriotic (LIPA), Relation and Romantic (RORE), and Philosophy and Spiritual (PHSP), containing 505, 399, 529, and 601 poems, respectively. The poems were passed through several pre-processing sub-phases: tokenization, noise removal, stop word removal, and special symbol removal. The 31938 extracted tokens were weighted using the Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) weighting schemes. Based on poetry elements, three types of textual features (lexical, syntactic, and semantic) were tested for building classifiers with different machine learning algorithms: Naive Bayes (NB), Support Vector Machine (SVM), Hyper Pipes, and K-nearest neighbour. The results show that semantic features performed better than lexical and syntactic features. The best-performing algorithm was SVM, and the highest accuracy (76.02%) was achieved by incorporating semantic information associated with words.
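The pre-processing and weighting steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' Weka pipeline: the stop-word list, the toy English poems, and the function names are hypothetical, and TF-IDF is assumed to follow the common textbook form tf * log(N / df), which may differ from the configuration used in the paper.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list and toy English text, for illustration only;
# the paper works on Punjabi poems with its own resources.
STOP_WORDS = {"the", "of", "and", "in", "a"}

def preprocess(text):
    """Lowercase, replace punctuation and digits with spaces,
    split on whitespace, and drop stop words."""
    cleaned = re.sub(r"[^\w\s]|\d", " ", text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]

def term_frequency(tokens):
    """Raw count of each term in a single document (TF)."""
    return Counter(tokens)

def tf_idf(docs_tokens):
    """Weight each term by tf * log(N / df), where N is the number of
    documents and df is the number of documents containing the term."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs_tokens:
        tf = term_frequency(tokens)
        weights.append(
            {term: count * math.log(n_docs / doc_freq[term])
             for term, count in tf.items()}
        )
    return weights

poems = [
    "The river sings in the spring festival.",
    "Love of a mother, love of a friend.",
    "The soul and the river of time.",
]
tokens = [preprocess(p) for p in poems]
weights = tf_idf(tokens)
```

In this toy corpus, "river" occurs in two of the three documents and so receives a lower TF-IDF weight than a term confined to one document, which is the effect that separates TF-IDF from plain TF. The regular expression is tuned to the English toy input; Punjabi text would need tokenization and noise-removal rules suited to its script.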

Bibliographic details

Source
The International Arab Journal of Information Technology, 2020, Issue 1, pp. 38-44 (7 pages)
Authors
Kaur Jasleen; Saini Jatinderkumar
Author affiliations
PP Savani Univ Dept Comp Engn Dhamdod Gujarat India;
Symbiosis Inst Comp Studies & Res Pune Maharashtra India
Original format: PDF
Text language: English
Keywords
Classification; Naive Bayes; Hyper Pipes; k-nearest neighbour; Punjabi; poetry; support vector machine; word net
