Published in: Procedia Computer Science

Machine Learning Models of Text Categorization by Author Gender Using Topic-independent Features


Abstract

In this article, we address the problem of automatically classifying texts by their author's gender. We used RusPersonality, a preexisting corpus of Russian-language texts labeled with information about their authors (gender, age, psychological test results, and so on). We performed a comparative study of machine learning techniques for gender attribution in Russian-language texts after deliberately removing gender bias in topic and genre. The resulting models for classifying Russian texts by author gender achieve accuracy close to, and in some cases higher than, the state of the art (up to 0.86 ± 0.03 in accuracy and 86% in F1-score).
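To make the idea of topic-independent features concrete, here is a minimal sketch in Python. The abstract does not list the paper's actual feature set or evaluation protocol, so everything below is an assumption made for illustration: the function-word list, the three style features, the helper names `topic_independent_features` and `summarize_accuracy`, and the reading of "0.86 ± 0.03" as a mean and standard deviation over cross-validation folds.

```python
import re
from statistics import mean, stdev

# Hypothetical, deliberately tiny list of Russian function words.
# The paper's real feature set is not given in the abstract.
FUNCTION_WORDS = {"и", "в", "не", "на", "я", "что", "с", "а", "но", "как"}


def topic_independent_features(text):
    """Return simple style features that do not rely on topic vocabulary."""
    tokens = re.findall(r"\w+", text.lower())
    n_tokens = len(tokens) or 1  # avoid division by zero on empty text
    n_chars = len(text) or 1
    return {
        # Share of tokens that are function words
        "function_word_ratio": sum(t in FUNCTION_WORDS for t in tokens) / n_tokens,
        # Mean token length in characters
        "avg_word_length": sum(len(t) for t in tokens) / n_tokens,
        # Share of characters that are common punctuation marks
        "punctuation_ratio": sum(ch in ".,;:!?" for ch in text) / n_chars,
    }


def summarize_accuracy(fold_accuracies):
    """Return (mean, sample standard deviation) of per-fold accuracies."""
    return mean(fold_accuracies), stdev(fold_accuracies)
```

Features of this kind could be passed to any standard classifier. `summarize_accuracy` then reports the spread across folds in the same "mean ± deviation" form as the figure quoted in the abstract.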
