
What Grammar Tells About Gender and Age of Authors



Abstract

The automatic classification of data has become a major research topic in recent years, and the analysis of text in particular has gained interest due to the availability of huge amounts of online documents. In this paper, a novel style feature based on grammar syntax analysis is presented that can be used to automatically profile authors, i.e., to predict the gender and age of the originator. Using full grammar trees of the sentences of a document, substructures of the trees are extracted by utilizing pq-grams. The most frequently used patterns are then stored in a profile, which serves as the set of input features for common machine learning algorithms. An extensive evaluation using a state-of-the-art test set containing thousands of English web blogs investigates the optimal parameter and classifier configuration. Finally, promising results indicate that the proposed feature can be used as a significant characteristic to automatically predict the gender and age of authors.
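The extraction step described in the abstract can be illustrated with a minimal sketch. It assumes each sentence has already been parsed into a grammar tree, encoded here as a hypothetical `(label, children)` pair, and it follows the general pq-gram definition (a stem of p ancestor labels plus a base of q consecutive child labels, padded with a null label). The function names, the default values of `p`, `q`, and `top_k`, and the relative-frequency normalization are assumptions for illustration; the abstract does not state the authors' choices.

```python
from collections import Counter

NULL = "*"  # padding label for missing ancestors and children


def pq_grams(tree, p=2, q=3):
    """Return all pq-grams of a tree as tuples of p + q labels.

    A tree is a (label, children) pair; children is a list of trees.
    Assumes p >= 1 and q >= 1.
    """
    grams = []

    def visit(node, ancestors):
        label, children = node
        # Stem: the current node plus its p-1 nearest ancestors.
        stem = (ancestors + (label,))[-p:]
        # Base: a sliding window over q consecutive children.
        base = (NULL,) * q
        if not children:
            grams.append(stem + base)
            return
        for child in children:
            base = base[1:] + (child[0],)
            grams.append(stem + base)
            visit(child, stem)
        # Slide the window past the last child with null padding.
        for _ in range(q - 1):
            base = base[1:] + (NULL,)
            grams.append(stem + base)

    visit(tree, (NULL,) * (p - 1))
    return grams


def profile(trees, p=2, q=3, top_k=100):
    """Map the top_k most frequent pq-grams to their relative frequency."""
    counts = Counter()
    for tree in trees:
        counts.update(pq_grams(tree, p, q))
    total = sum(counts.values())
    return {gram: n / total for gram, n in counts.most_common(top_k)}


# Toy grammar tree for a two-constituent sentence (illustrative only).
sentence = ("S", [("NP", []), ("VP", [])])
features = profile([sentence], p=2, q=2, top_k=10)
```

The resulting dictionary could be turned into a fixed-length feature vector, one dimension per pq-gram, and passed to an off-the-shelf classifier. Which classifier and which parameter values work best is the question the paper's evaluation addresses.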
