EPIA Conference on Artificial Intelligence

Exploring Textual Features for Multi-label Classification of Portuguese Film Synopses

Abstract

The multi-label classification of film genres using features extracted from their synopses has recently gained some attention from the scientific community; however, the number of studies is still limited, and such studies are even scarcer for languages other than English. In this work, we present the P-TMDb dataset, which contains 13,394 Portuguese film synopses, and explore film genre classification by experimenting with nine different groups of textual features and four multi-label algorithms. As our dataset is unbalanced, we also conducted experiments with an oversampled version of it. The best result for the original dataset was achieved by a TF-IDF-based classifier, with an average F1 score of 0.478, while the best result for the oversampled dataset was achieved by a combination of several feature groups, with an average F1 score of 0.611.
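The abstract reports an "average F1 score" without saying how the average is taken. As a minimal sketch, assuming macro-averaging over genres (per-genre F1 scores averaged with equal weight), the metric could be computed as follows for multi-label predictions. The function names and the toy genre sets are hypothetical and are not taken from the paper or the P-TMDb dataset.

```python
from typing import Dict, List, Set


def per_label_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> Dict[str, float]:
    """Compute F1 separately for every label seen in the truth or the predictions."""
    labels = set().union(*y_true, *y_pred)
    scores = {}
    for label in sorted(labels):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Average the per-label F1 scores with equal weight (assumed averaging scheme)."""
    scores = per_label_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores) if scores else 0.0


# Hypothetical toy data: each film has a set of true genres and predicted genres.
toy_true = [{"Drama", "Romance"}, {"Comedy"}, {"Drama"}]
toy_pred = [{"Drama"}, {"Comedy", "Drama"}, {"Drama"}]
toy_score = macro_f1(toy_true, toy_pred)
```

Under macro-averaging, a rare genre that is never predicted (here "Romance") pulls the average down as much as a frequent one would, which is one reason an unbalanced dataset can score differently from an oversampled version of it.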
