Venue: Conference on Empirical Methods in Natural Language Processing

An I-vector Based Approach to Compact Multi-Granularity Topic Spaces Representation of Textual Documents



Abstract

Various studies have highlighted that topic-based approaches provide a powerful representation of the spoken content of documents. Nonetheless, these documents may contain more than one main theme, and their automatic transcriptions inevitably contain errors. In this study, we propose an original and promising framework based on a compact representation of a textual document to address issues related to topic space granularity. First, several topic spaces are estimated with Latent Dirichlet Allocation (LDA), each with a different number of classes. Then, this multiple topic space representation is compacted into a single compact vector, called a c-vector, using the i-vector approach originally developed in the context of speaker recognition. Experiments are conducted on the DECODA corpus of conversations. Results show the effectiveness of the proposed multi-view compact representation paradigm: our identification system reaches an accuracy of 85%, a significant gain of 9 percentage points over the baseline (the best single topic space configuration).
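The abstract does not spell out how the c-vector is computed. As a hedged illustration only, the Python sketch below implements the standard i-vector point estimate from speaker recognition, w = (I + Tᵀ Σ⁻¹ N T)⁻¹ Tᵀ Σ⁻¹ F̃, where N and F̃ are the zeroth-order and UBM-mean-centred first-order Baum-Welch statistics of one document under a diagonal-covariance universal background model (UBM), and T is the total variability matrix. It assumes, hypothetically, that each topic space contributes one feature vector ("frame") of a common dimension D for the document; the UBM parameters and T are toy values, training T (normally by EM) is not shown, and the paper's actual c-vector procedure may differ. Only the standard library is used.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def log_gauss_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log-density of x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x: Vector, weights: Vector, means: Matrix, variances: Matrix) -> Vector:
    """UBM component occupation probabilities gamma_c(x), via log-sum-exp for stability."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(lp - top) for lp in logs]
    total = sum(exps)
    return [e / total for e in exps]


def baum_welch_stats(frames: Matrix, weights: Vector, means: Matrix,
                     variances: Matrix) -> Tuple[Vector, Matrix]:
    """Zeroth-order stats N_c and first-order stats F~_c centred on the UBM means."""
    C, D = len(means), len(means[0])
    N = [0.0] * C
    F = [[0.0] * D for _ in range(C)]
    for x in frames:
        gamma = posteriors(x, weights, means, variances)
        for c in range(C):
            N[c] += gamma[c]
            for d in range(D):
                F[c][d] += gamma[c] * (x[d] - means[c][d])
    return N, F


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def extract_ivector(frames: Matrix, weights: Vector, means: Matrix,
                    variances: Matrix, T: Matrix) -> Vector:
    """Posterior mean w = (I + T^T S^-1 N T)^-1 T^T S^-1 F~ for one document.

    T has shape (C*D, R): rows c*D .. c*D+D-1 belong to UBM component c.
    """
    C, D, R = len(means), len(means[0]), len(T[0])
    N, F = baum_welch_stats(frames, weights, means, variances)
    precision = [[1.0 if i == j else 0.0 for j in range(R)] for i in range(R)]  # prior term I
    rhs = [0.0] * R
    for c in range(C):
        for d in range(D):
            row = T[c * D + d]  # row of T for component c, feature dimension d
            inv_var = 1.0 / variances[c][d]
            for i in range(R):
                rhs[i] += row[i] * inv_var * F[c][d]
                for j in range(R):
                    precision[i][j] += N[c] * row[i] * inv_var * row[j]
    return solve(precision, rhs)


if __name__ == "__main__":
    # Hypothetical toy UBM: 2 components in a 2-D feature space.
    weights = [0.5, 0.5]
    means = [[0.0, 0.0], [2.0, 2.0]]
    variances = [[1.0, 1.0], [1.0, 1.0]]
    # Hypothetical total variability matrix, shape (C*D, R) = (4, 2).
    T = [[0.5, 0.1], [0.0, 0.4], [0.3, -0.2], [0.1, 0.6]]
    # One hypothetical feature vector per topic space (3 topic spaces).
    frames = [[0.2, 0.1], [1.8, 2.3], [0.5, -0.1]]
    print(extract_ivector(frames, weights, means, variances, T))
```

With a single UBM component, an identity T and unit variances, a lone frame at [1.0, 0.0] yields w = [0.5, 0.0]: the identity prior term in the precision matrix shrinks the estimate toward zero, which keeps the i-vector estimate well-behaved when a document provides few frames.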

