An I-vector Based Approach to Compact Multi-Granularity Topic Spaces Representation of Textual Documents

Abstract

Various studies have highlighted that topic-based approaches give a powerful representation of the spoken content of documents. Nonetheless, these documents may contain more than one main theme, and their automatic transcription inevitably contains errors. In this study, we propose an original and promising framework, based on a compact representation of a textual document, to address issues related to topic space granularity. First, various topic spaces are estimated with different numbers of classes using Latent Dirichlet Allocation. Then, this multiple topic space representation is compacted into an elementary segment, called a c-vector, an approach originally developed in the context of speaker recognition. Experiments are conducted on the DECODA corpus of conversations. Results show the effectiveness of the proposed multi-view compact representation paradigm. Our identification system reaches an accuracy of 85%, a significant gain of 9 points over the baseline (the best single topic space configuration).
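
The first step described in the abstract (estimating several topic spaces with different numbers of classes) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the toy documents, the function names lda_doc_topics and multi_granularity_features, the hyperparameters, and the use of a small collapsed Gibbs sampler are not taken from the paper. The final concatenation is only a stand-in; it does not implement the c-vector compaction step itself.

```python
import random

def lda_doc_topics(docs, n_topics, n_iter=50, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (hypothetical helper).

    Returns one topic distribution per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # counts per (doc, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # counts per (topic, word)
    topic_total = [0] * n_topics                           # counts per topic
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(z_d)

    # Resample each token's topic from its conditional distribution.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(n_topics)])
    return theta

def multi_granularity_features(docs, topic_counts):
    """Estimate one topic space per granularity and concatenate the views.

    Concatenation is a placeholder for the compaction step, not the c-vector method.
    """
    spaces = [lda_doc_topics(docs, k, seed=k) for k in topic_counts]
    return [sum((space[d] for space in spaces), []) for d in range(len(docs))]

# Invented toy documents (already tokenized).
docs = [
    ["bus", "ticket", "price"],
    ["bus", "schedule", "delay"],
    ["card", "lost", "ticket"],
]
features = multi_granularity_features(docs, topic_counts=[2, 3, 5])
```

Each document ends up with one vector of 2 + 3 + 5 = 10 values, one block per topic space. In the setting the abstract describes, such a multi-view representation would then be reduced to a compact vector rather than used directly.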

Bibliographic record

Source
Conference on Empirical Methods in Natural Language Processing | 2014 | 12 pages
Authors
Mohamed Morchid; Mohamed Bouallegue; Richard Dufour; Georges Linares; Driss Matrouf; Renato de Mori
Original format: PDF
CLC classification: Programming, software engineering

Similar documents

1. Extracting topic-sensitive content from textual documents-A hybrid topic model approach [J] . Yan Liang, Ying Liu, Chong Chen, Engineering Applications of Artificial Intelligence . 2018, Issue APRa
2. Approximate Representation of Textual Documents in the Concept Space [J] . J. Dobša, B.D. Bašić, Informatica: An International Journal of Computing and Informatics . 2007, Issue 1
3. Compact Multiview Representation of Documents Based on the Total Variability Space [J] . Morchid Mohamed, Bouallegue Mohamed, Dufour Richard, IEEE/ACM Transactions on Audio, Speech, and Language Processing . 2015, Issue 8
4. An I-vector Based Approach to Compact Multi-Granularity Topic Spaces Representation of Textual Documents [C] . Mohamed Morchid, Mohamed Bouallegue, Richard Dufour, Conference on Empirical Methods in Natural Language Processing . 2014
5. Phrase-based vector space model in document retrieval. [D] . Mao, Wenlei. 2003
6. An Innovative Graph-Based Approach to Advance Feature Selection from Multiple Textual Documents [O] . Nikolaos Giarelis, Nikos Kanakaris, Nikos Karacapilidis
7. An I-vector Based Approach to Compact Multi-Granularity Topic Spaces Representation of Textual Documents [O] . Mohamed Morchid, Mohamed Bouallegue, Richard Dufour, 2015
8. Galerkin Approach to Define Measured Terrain Surfaces with Analytic Basis Vectors to Produce a Compact Representation [R] . Chemistruck, H. M., Ferris, J. B., Gorsich, D. J., 2010
