Journal: Expert Systems with Applications

Generating web-based corpora for video transcripts categorization


Abstract

This paper proposes using the Internet as a rich source of information for generating learning corpora for video transcript categorization systems. Our main goal has been to study the behavior of different learning corpora generated from the Internet and to analyze some of their features. Specifically, Wikipedia, Google and the blogosphere were used to generate these learning corpora, with the VideoCLEF 2008 track serving as the evaluation framework for the experiments. Based on this evaluation framework, we conclude that the proposed approach is a promising strategy for classifying videos from their transcripts. The differing sizes of the generated corpora might suggest that a larger corpus yields better results, but we demonstrate that corpus size is not always a reliable indicator of how a learning corpus behaves. The results show that integrating knowledge from the blogosphere or Google produces more reliable corpora for this task than those based on Wikipedia.
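The general idea of training a transcript categorizer on web-derived text can be illustrated with a minimal sketch. It is not the method used in the paper: the abstract does not name a classifier, so a multinomial Naive Bayes model with add-one smoothing is assumed here purely for illustration. The category labels, the sample documents in `toy_corpus`, and the function names `tokenize`, `train`, and `classify` are all hypothetical stand-ins for text that would be collected from Wikipedia, Google, or blogs.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(corpus):
    """Build word and document counts from a learning corpus.

    `corpus` maps a category label to the list of documents gathered
    for that label (hypothetical web-derived text in this sketch).
    """
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for label, documents in corpus.items():
        for document in documents:
            word_counts[label].update(tokenize(document))
            doc_counts[label] += 1
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocabulary


def classify(transcript, model):
    """Return the label with the highest Naive Bayes log-score."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    # Words never seen in the learning corpus carry no evidence; skip them.
    tokens = [t for t in tokenize(transcript) if t in vocabulary]
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for token in tokens:
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log(
                (word_counts[label][token] + 1)
                / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical learning corpus; real corpora would be far larger.
toy_corpus = {
    "music": [
        "The orchestra played a symphony and the choir sang.",
        "A guitar concert with a famous singer and band.",
    ],
    "architecture": [
        "The cathedral has a gothic facade and a tall tower.",
        "The architect designed the building with a glass roof.",
    ],
}

model = train(toy_corpus)
print(classify("The singer joined the orchestra for a concert", model))
```

Swapping in a different `toy_corpus` for each web source and scoring the same set of transcripts is one way to compare learning corpora, which is the kind of comparison the abstract describes.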
