International Conference on Statistical Language and Speech Processing

Movie Genre Detection Using Topological Data Analysis

Abstract

We show that applying discourse features derived through topological data analysis (TDA), namely homological persistence, improves classification results on the task of movie genre detection, including the identification of overlapping movie genres. On the IMDB dataset we improve on prior results: we increase the Jaccard score by 4.7% over recent results by Hoang. We also significantly improve the F-score (by over 15%) and slightly improve the hit rate (by 0.5%, ibid.). We see our contribution as threefold: (a) for the general audience of computational linguists, we want to raise awareness of topology as a possible source of semantic features; (b) for researchers applying machine learning to NLP tasks, we propose using topological features when the number of training examples is small; and (c) for those already aware of computational topology, we see this work as contributing to the discussion about the value of topology for NLP, in view of the mixed results reported by others.
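To give readers new to TDA a concrete sense of what "homological persistence" computes, here is a minimal Python sketch of the simplest case: 0-dimensional persistence (connected components) of a Vietoris-Rips filtration over a point cloud. It assumes the text has already been mapped to points in Euclidean space (for example, one hypothetical embedding vector per sentence of a plot summary). The abstract does not say how the paper builds its point clouds, which homology dimensions it uses, or how barcodes become classifier features, so `h0_persistence` and `persistence_features` are illustrative names only, not the authors' pipeline.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Return the finite death times of H0 bars in a Vietoris-Rips filtration.

    Every point is born as its own component at scale 0. Edges are added in
    order of length; each edge that joins two different components kills one
    of them. These death times equal the minimum-spanning-tree edge lengths.
    One component never dies (the infinite bar) and is not returned.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges sorted by Euclidean length (Kruskal order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge merges two components: one H0 bar dies here
            parent[ri] = rj
            deaths.append(length)
    return deaths


def persistence_features(deaths):
    """Hypothetical fixed-size summary of an H0 barcode for a classifier."""
    return {
        "count": len(deaths),
        "total": sum(deaths),
        "max": max(deaths, default=0.0),
    }


if __name__ == "__main__":
    sentence_points = [(0, 0), (1, 0), (5, 0)]  # toy stand-in for embeddings
    bars = h0_persistence(sentence_points)
    print(bars, persistence_features(bars))
```

For the three toy points above, the two finite bars die at 1.0 and 4.0; the long-lived bar reflects one point lying far from the others. The union-find (Kruskal) formulation is used because, for H0 of a Rips filtration, death times coincide with minimum-spanning-tree edge lengths, which avoids building the full simplicial complex. Higher-dimensional features (loops, voids) would need a dedicated TDA library.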