SuMACC Project's Corpus: A Topic-Based Query Extension Approach to Retrieve Multimedia Documents

Abstract

The SuMACC project aims to automatically track new multimodal entities on the Internet. Its goal is to propose robust multimedia methods that define relevant patterns for retrieving these entities automatically. This paper describes the SuMACC corpus, collected on video-sharing platforms using word queries. Since a concept is limited to a single word or a few words, querying video-sharing platforms with the concept alone can easily return irrelevant videos. In this paper, we propose to use an extended query obtained by mapping the initial concept into a topic space built with a Latent Dirichlet Allocation (LDA) algorithm. This topic-based query extension approach makes it possible to better retrieve videos related to the targeted concept. As a result, a corpus of 7,517 videos, extracted using the simple (i.e. concept-only) and the extended queries for 47 concepts, was obtained. Results show the effectiveness of the proposed thematic querying approach compared to the simple concept query in terms of relevance (+21%) and ambiguity (-4%). The annotation process and the corpus statistics are also detailed in this paper.
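
The idea of extending a one-word concept through a topic space can be illustrated with a minimal sketch. The abstract does not say how the concept is mapped to topics or how many words are added, so everything below is an assumption made for illustration: the toy distributions `TOPICS` and `TOPIC_PRIOR` stand in for a trained LDA model, the helper names `topic_posterior` and `extend_query` are hypothetical, and picking the single most probable topic and its top words is just one plausible strategy.

```python
# Toy topic-word distributions P(word | topic), standing in for a trained
# LDA model. All words and probabilities are invented for illustration.
TOPICS = {
    "tech": {"apple": 0.30, "iphone": 0.25, "ipad": 0.20, "keynote": 0.15, "store": 0.10},
    "food": {"apple": 0.20, "pie": 0.30, "recipe": 0.25, "cinnamon": 0.15, "oven": 0.10},
    "music": {"concert": 0.40, "guitar": 0.30, "album": 0.20, "store": 0.10},
}
TOPIC_PRIOR = {"tech": 0.5, "food": 0.3, "music": 0.2}  # assumed P(topic)


def topic_posterior(concept):
    """Return P(topic | concept) via Bayes' rule over the toy topics."""
    joint = {t: TOPIC_PRIOR[t] * words.get(concept, 0.0) for t, words in TOPICS.items()}
    total = sum(joint.values())
    if total == 0.0:
        return {}  # concept unknown to the topic model: nothing to map to
    return {t: p / total for t, p in joint.items()}


def extend_query(concept, n_extra=3):
    """Build an extended query: the concept plus top words of its best topic."""
    posterior = topic_posterior(concept)
    if not posterior:
        return [concept]  # fall back to the simple (concept-only) query
    best_topic = max(posterior, key=posterior.get)
    ranked = sorted(TOPICS[best_topic].items(), key=lambda kv: kv[1], reverse=True)
    extra = [w for w, _ in ranked if w != concept][:n_extra]
    return [concept] + extra


simple_query = ["apple"]
extended_query = extend_query("apple")
print(" ".join(simple_query))
print(" ".join(extended_query))
```

In this toy setting the ambiguous concept "apple" is pulled toward the topic where it is most probable, and the added words narrow the query to that sense. A real pipeline would replace the hand-written dictionaries with distributions estimated by an LDA implementation.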

Bibliographic details

Source
International Conference on Text, Speech and Dialogue, 2014, 8 pages
Authors
Mohamed Morchid; Richard Dufour; Usman Niaz; Francis Bouvier; Clement de Groc; Claude de Loupy; Georges Linares; Bernard Merialdo; Bertrand Peralta;
Original format: PDF
Chinese Library Classification: TP391.1-53
Keywords
Multimedia corpus; Annotation; Latent Dirichlet Allocation; Topic modeling; Extended queries;

Similar literature

1. A Corpus Analysis Approach for Automatic Query Expansion and Its Extension to Multiple Databases [J]. Susan Gauch, Jianying Wang, Satya Mahesh Rachakonda. ACM Transactions on Information Systems. 1999, No. 3
2. Managing Word Mismatch Problems in Information Retrieval: A Topic-Based Query Expansion Approach [J]. Chih-Ping Wei, Paul Jen-Hwa Hu, Chia-Hung Tai, Journal of Management Information Systems. 2008, No. 3
3. An effective approach for semantic-based clustering and topic-based ranking of web documents [J]. Rajendra Kumar Roul. International Journal of Data Science and Analytics. 2018, No. 4
4. SuMACC Project's Corpus: A Topic-Based Query Extension Approach to Retrieve Multimedia Documents [C]. Mohamed Morchid, Richard Dufour, Usman Niaz, International Conference on Text, Speech and Dialogue. 2014
5. An ontology-driven concept-based information retrieval approach for Web documents [D]. Li, Zhan. 2010
6. Synonym Topic Model and Predicate-Based Query Expansion for Retrieving Clinical Documents [O]. Qing T. Zeng, Doug Redd, Thomas Rindflesch, 2012
7. A Corpus Analysis Approach for Automatic Query Expansion and its Extension to Multiple Databases [O]. Susan Gauch, Jianying Wang, Satya Mahesh Rachakonda. 1998
8. Innovative approach to multimedia waste reduction: Measuring performance for environmental cleanup projects [R]. Phifer, B. E., George, S. M. 1993
