2017 20th International Conference of Computer and Information Technology

A machine learning approach for stylometric analysis of Bangla literature


Abstract

The term Stylogenetics refers to the clustering-based analysis of authors' literary corpora. While writing, a writer subconsciously falls back on certain recurring habits. We focused on these habits and tried to detect the affinity and divergence between the writing of different authors. Our proposal relies on a particular set of features to distinguish the individuality of authors who write about similar issues and establish their own viewpoints on them. We assembled Bengali blogs written by twenty Bangladeshi authors in two different fields, political and educational, and analyzed the corpus. With our methodology, we evaluated features such as the frequency of negative words in particular positions, the frequency of use of the longest words and sentences, suffix count, use of particular punctuation, frequency of common recognizable words, classification of parts of speech, and frequency of numeric words, among others. We first trained the system using these features and then distinguished the authors of random data sets using two machine learning approaches, Support Vector Machines (SVM) and a Naive Bayes classifier. This proposal provides higher accuracy than previously established works, as all the texts collected here are by different writers writing in similar fields.
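The feature-then-classify pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give exact feature definitions, word lists, or classifier settings, so everything below is an assumption made for illustration: the names `extract_features`, `train_gaussian_nb`, and `predict` are hypothetical, the `NEGATIVE_WORDS` list is a placeholder, only a handful of the listed features are covered, and a Gaussian variant of Naive Bayes is chosen because the features are numeric. The SVM step is not shown.

```python
import math
import re
from collections import defaultdict

# Placeholder list of Bangla negation words (an assumption, not the paper's list).
NEGATIVE_WORDS = {"না", "নেই"}
# Sentence terminators: Bangla danda plus "!" and "?".
SENTENCE_END = re.compile(r"[।!?]")
# A token is any run of characters that is neither whitespace nor punctuation.
# A negated class is used because \w would split Bangla words at vowel signs.
TOKEN = re.compile(r"[^\s।!?,;:]+")

def extract_features(text, negative_words=NEGATIVE_WORDS):
    """Map one document to a small numeric stylometric feature vector."""
    tokens = TOKEN.findall(text)
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    n_words = max(len(tokens), 1)        # avoid division by zero on empty text
    n_sentences = max(len(sentences), 1)
    return [
        sum(t in negative_words for t in tokens) / n_words,  # negative-word frequency
        max((len(t) for t in tokens), default=0),            # longest word (code points)
        len(tokens) / n_sentences,                           # mean sentence length in words
        text.count(",") / n_words,                           # comma rate
        sum(t.isdigit() for t in tokens) / n_words,          # numeric-token frequency
    ]

def train_gaussian_nb(X, y):
    """Fit a per-class log prior plus per-feature mean and variance."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-6
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model

def predict(model, row):
    """Return the label with the highest log posterior for one feature vector."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
    return max(model, key=log_posterior)
```

In use, each blog post would be passed through `extract_features`, the resulting vectors and author labels given to `train_gaussian_nb`, and unseen posts attributed with `predict`. With real data the features would normally be standardized first, since they sit on very different scales.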
