Journal: ACM Transactions on Asian Language Information Processing

A Supervised Learning Approach for Authorship Attribution of Bengali Literary Texts

Abstract

Authorship Attribution is a long-standing problem in Natural Language Processing. Several statistical and computational methods have been used to find a solution to this problem. In this article, we have proposed methods to deal with the authorship attribution problem in Bengali. More specifically, we proposed a supervised framework consisting of lexical and shallow features and investigated the possibility of using topic-modeling-inspired features, to classify documents according to their authors. We have created a corpus from nearly all the literary works of three eminent Bengali authors, consisting of 3,000 disjoint samples. Our models showed better performance than the state-of-the-art, with more than 98% test accuracy for the shallow features and 100% test accuracy for the topic-based features. Further experiments with GloVe vectors [Pennington et al. 2014] showed comparable results, but flexible patterns based on content words and high-frequency words [Schwartz et al. 2013] failed to perform as well as expected.
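The idea of classifying documents by author from shallow lexical features can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the four features (average word length, type-token ratio, average sentence length, punctuation rate), the nearest-centroid classifier, and the names `shallow_features`, `train_centroids`, and `predict` are all hypothetical. The features are left unscaled for brevity; a real system would normalise them and use a much richer feature set.

```python
import math

# Sentence-ending marks; "।" is the danda, the Bengali full stop.
SENTENCE_ENDERS = ".!?।"
PUNCTUATION = set(".,;:!?।'\"()-")


def shallow_features(text):
    """Return a small vector of shallow lexical features for one text sample.

    The feature choice is illustrative only, not the article's feature set.
    """
    words = text.split()
    n_words = max(len(words), 1)
    # Normalise every sentence ender to "." and count the non-empty segments.
    normalised = text
    for mark in SENTENCE_ENDERS:
        normalised = normalised.replace(mark, ".")
    n_sentences = max(sum(1 for s in normalised.split(".") if s.strip()), 1)
    return [
        sum(len(w) for w in words) / n_words,                       # average word length
        len(set(words)) / n_words,                                  # type-token ratio
        n_words / n_sentences,                                      # average sentence length
        sum(ch in PUNCTUATION for ch in text) / max(len(text), 1),  # punctuation rate
    ]


def train_centroids(samples):
    """Average the feature vectors of each author's labelled training samples."""
    sums, counts = {}, {}
    for text, author in samples:
        feats = shallow_features(text)
        if author not in sums:
            sums[author] = [0.0] * len(feats)
            counts[author] = 0
        sums[author] = [s + f for s, f in zip(sums[author], feats)]
        counts[author] += 1
    return {a: [s / counts[a] for s in sums[a]] for a in sums}


def predict(centroids, text):
    """Assign the text to the author whose centroid is nearest (Euclidean)."""
    feats = shallow_features(text)
    return min(centroids, key=lambda a: math.dist(feats, centroids[a]))
```

To use the sketch, pass `train_centroids` a list of `(text, author)` pairs, then call `predict` on a held-out sample. Test accuracy is the fraction of held-out samples whose predicted author matches the true one.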
