International Conference on Text, Speech and Dialogue

Score Normalization Methods Applied to Topic Identification


Abstract

Multi-label classification plays a key role in modern categorization systems. Its goal is to find the set of labels belonging to each data item. In multi-label document classification, unlike multi-class classification, where only the best topic is chosen, the classifier must decide whether a document does or does not belong to each topic from a predefined topic set. We use a generative classifier to tackle this task, but the problem with this approach is that a threshold for positive classification must be set. This threshold can vary from document to document depending on the document's content (words used, length of the document, ...). In this paper we use Unconstrained Cohort Normalization, primarily proposed for the speaker identification/verification task, to robustly find the threshold defining the boundary between the correct and the incorrect topics of a document. In our earlier experiments we proposed a method for finding this threshold inspired by another normalization technique, World Model score normalization. A comparison of these normalization methods has shown that better results can be achieved with Unconstrained Cohort Normalization.
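
To make the thresholding idea concrete, here is a minimal Python sketch of cohort-based score normalization as it is commonly described for speaker verification: each topic's log-likelihood is reduced by the mean log-likelihood of its best-scoring competing topics, and a single threshold is then applied to the normalized scores. The abstract does not give the paper's exact formula, so the function names (`ucn_scores`, `select_topics`), the cohort size, the threshold value, and the example scores are all illustrative assumptions and not the authors' implementation.

```python
def ucn_scores(scores, cohort_size):
    """Normalize per-topic log-likelihoods of ONE document.

    scores: dict mapping topic -> log-likelihood of the document under
            that topic's generative model (hypothetical input format).
    cohort_size: number of best-scoring competing topics used as the
            cohort (an assumed tuning parameter).
    """
    if len(scores) < 2 or cohort_size < 1:
        raise ValueError("need at least two topics and cohort_size >= 1")
    normalized = {}
    for topic, score in scores.items():
        # The cohort is chosen per document from all other topics:
        # take the competitors with the highest log-likelihoods.
        competitors = sorted(
            (v for t, v in scores.items() if t != topic), reverse=True
        )
        cohort = competitors[:cohort_size]
        normalized[topic] = score - sum(cohort) / len(cohort)
    return normalized


def select_topics(scores, cohort_size=2, threshold=0.0):
    """Return topics whose normalized score reaches the threshold.

    threshold=0.0 is only a placeholder; in practice it would be tuned
    on held-out data.
    """
    normalized = ucn_scores(scores, cohort_size)
    return sorted(t for t, v in normalized.items() if v >= threshold)


# Made-up log-likelihoods for a single document.
example = {"sport": -100.0, "politics": -104.0,
           "economy": -130.0, "culture": -135.0}
print(select_topics(example, cohort_size=2, threshold=0.0))
```

Because each score is measured relative to the document's own strongest competitors, effects that shift all log-likelihoods together, such as document length, largely cancel out, which is why one global threshold on the normalized scores can stand in for a per-document threshold on the raw scores.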
