International Conference of the CLEF Association

Simply the Best: Minimalist System Trumps Complex Models in Author Profiling

Abstract

A simple linear SVM with word and character n-gram features and minimal parameter tuning can identify the gender and the language variety (for English, Spanish, Arabic and Portuguese) of Twitter users with very high accuracy. All our attempts at improving performance by including more data, using smarter features, and employing more complex architectures plainly fail. In addition, we experiment with joint and multitask modelling, but find that they are clearly outperformed by single-task models. Eventually, our simplest model was submitted to the PAN 2017 shared task on author profiling, obtaining an average accuracy of 0.86 on the test set, with performance on sub-tasks ranging from 0.68 to 0.98. These were the best results achieved at the competition overall. To allow lay people to easily use and see the value of machine learning for author profiling, we also built a web application on top of our models.
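The kind of system the abstract describes, a linear SVM over word and character n-gram features, might be sketched roughly as follows with scikit-learn. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not name a library, the n-gram ranges and tf-idf weighting are assumed, and `build_profiler` and the toy data are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC


def build_profiler():
    """Linear SVM over combined word and character n-gram tf-idf features."""
    features = FeatureUnion([
        # Assumption: word unigrams and bigrams (range not given in the abstract).
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
        # Assumption: character 3- to 5-grams (range not given in the abstract).
        ("char", TfidfVectorizer(analyzer="char", ngram_range=(3, 5))),
    ])
    # Default LinearSVC hyperparameters, in the spirit of "minimal parameter tuning".
    return Pipeline([("features", features), ("svm", LinearSVC())])


# Hypothetical toy data: one text per user, labelled with a language variety.
train_texts = [
    "what colour is your favourite jumper",
    "my neighbour loves the theatre programme",
    "what color is your favorite sweater",
    "my neighbor loves the theater program",
]
train_labels = ["british", "british", "american", "american"]

model = build_profiler()
model.fit(train_texts, train_labels)

# Predict the variety of two unseen texts.
predictions = model.predict(["the colour of the theatre", "the color of the theater"])
print(predictions)
```

Gender and language variety are separate label sets, so a single-task setup of the kind the abstract favours would train one such pipeline per task and per language.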