Journal: ACM Transactions on Asian and Low-Resource Language Information Processing

Children's Story Classification in Indian Languages Using Linguistic and Keyword-based Features


Abstract

The primary objective of this work is to classify Hindi and Telugu stories into three genres: fable, folk-tale, and legend. We propose a framework for story classification (SC) using keyword and part-of-speech (POS) features. To improve the performance of the SC system, feature reduction techniques and combinations of various POS tags are explored. Further, we investigate the performance of SC by dividing the story into parts depending on its semantic structure. Stories are (i) manually divided into parts based on their semantics as introduction, main, and climax; and (ii) automatically divided into equal parts based on the number of sentences in a story as initial, middle, and end. We also examine a sentence increment model, which aims to determine the optimum number of sentences required to identify the story genre by incrementally selecting sentences in a story. Experiments are conducted on Hindi and Telugu story corpora consisting of 300 and 150 short stories, respectively. The performance of the SC system is evaluated using different combinations of keyword and POS-based features with three well-established machine learning classifiers: (i) Naive Bayes (NB), (ii) k-Nearest Neighbour (KNN), and (iii) Support Vector Machine (SVM). Classifier performance is evaluated using 10-fold cross-validation, and effectiveness is measured using precision, recall, and F-measure. The classification results show that adding linguistic information boosts the performance of story classification. With respect to story structure, the main and initial parts of the story show comparatively better performance. The results from the sentence increment model indicate that the first nine and seven sentences in Hindi and Telugu stories, respectively, are sufficient for better classification of stories. In most of the studies, SVM models outperformed the other models in classification accuracy.
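The two automatic procedures described in the abstract, the equal three-way split by sentence count and the sentence increment model, might be sketched roughly as follows. This is only an illustration and not the authors' implementation: the function names `split_equal_parts` and `sentence_increments` are hypothetical, the handling of leftover sentences when the count is not divisible by three is an assumption, and the story is placeholder data.

```python
from typing import Dict, Iterator, List, Tuple


def split_equal_parts(sentences: List[str]) -> Dict[str, List[str]]:
    """Split a story into initial, middle, and end parts by sentence count.

    Assumption: when the count is not divisible by three, the leftover
    sentences go to the earlier parts. The abstract does not say how
    remainders are handled.
    """
    base, extra = divmod(len(sentences), 3)
    sizes = [base + (1 if i < extra else 0) for i in range(3)]
    parts: Dict[str, List[str]] = {}
    start = 0
    for name, size in zip(("initial", "middle", "end"), sizes):
        parts[name] = sentences[start:start + size]
        start += size
    return parts


def sentence_increments(
    sentences: List[str], max_n: int
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (n, first n sentences) for n = 1 .. min(max_n, len(sentences))."""
    for n in range(1, min(max_n, len(sentences)) + 1):
        yield n, sentences[:n]


# Placeholder 10-sentence story (hypothetical data, not from the corpora).
story = [f"s{i}" for i in range(1, 11)]
parts = split_equal_parts(story)
prefixes = list(sentence_increments(story, max_n=9))
```

In an experiment along the lines of the abstract, each part or each prefix would be passed through feature extraction (keywords and POS tags) and then to a classifier, with the genre prediction scored per part or per value of n.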
