Multimedia Tools and Applications

A framework for automatic semantic video annotation utilizing similarity and commonsense knowledge bases

Abstract

The rapidly increasing quantity of publicly available videos has driven research into developing automatic tools for indexing, rating, searching and retrieval. Textual semantic representations, such as tags, labels and annotations, are often important factors in indexing a video, because they represent its semantics in a user-friendly way that suits search and retrieval. Ideally, this annotation should be inspired by the way humans perceive and describe videos. The difference between the low-level visual content and the corresponding human perception is referred to as the 'semantic gap'. Bridging this gap is even harder in the case of unconstrained videos, mainly because of the lack of any prior information about the analyzed video on the one hand, and the huge amount of generic knowledge required on the other. This paper introduces a framework for the Automatic Semantic Annotation of unconstrained videos. The proposed framework utilizes two non-domain-specific layers: low-level visual similarity matching, and an annotation analysis that employs commonsense knowledge bases. The commonsense ontology is created by incorporating multiple structured semantic relationships. Experiments and black-box tests are carried out on standard video databases for action recognition and video information retrieval. White-box tests examine the performance of the framework's individual intermediate layers, and the evaluation of the results together with the statistical analysis shows that integrating visual similarity matching with commonsense semantic relationships provides an effective approach to automated video annotation.
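The abstract gives no implementation details, but the two-layer idea it describes can be illustrated with a small, self-contained sketch. Everything below is hypothetical and not taken from the paper: the toy corpus, feature vectors, relation graph and function names are assumptions used only to show how layer one (visual similarity matching against annotated reference videos) could feed candidate labels into layer two (re-ranking those candidates by their coherence under commonsense relations).

```python
# Hypothetical sketch of the two-layer framework described in the abstract:
# (1) low-level visual similarity matching against an annotated video corpus,
# (2) annotation analysis that re-ranks candidates with a toy commonsense relation set.
# All data, names and relations here are illustrative only.

import math
from collections import defaultdict

# Toy corpus: each reference video has a low-level feature vector and annotations.
CORPUS = {
    "vid_001": {"features": [0.9, 0.1, 0.3], "labels": {"running", "outdoor", "person"}},
    "vid_002": {"features": [0.8, 0.2, 0.4], "labels": {"jogging", "park", "person"}},
    "vid_003": {"features": [0.1, 0.9, 0.7], "labels": {"cooking", "kitchen", "person"}},
}

# Toy commonsense relations (the paper builds these from large knowledge bases).
RELATIONS = {
    ("running", "jogging"): "SimilarTo",
    ("running", "outdoor"): "HasContext",
    ("jogging", "park"): "AtLocation",
    ("cooking", "kitchen"): "AtLocation",
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def related(x, y):
    return (x, y) in RELATIONS or (y, x) in RELATIONS

def annotate(query_features, top_k=2):
    # Layer 1: visual similarity matching -> candidate labels from the most similar videos.
    ranked = sorted(CORPUS.items(),
                    key=lambda kv: cosine(query_features, kv[1]["features"]),
                    reverse=True)[:top_k]
    candidates = set().union(*(v["labels"] for _, v in ranked))

    # Layer 2: annotation analysis -> score each candidate by how many commonsense
    # relations link it to the other candidates (a crude proxy for semantic coherence).
    scores = defaultdict(int)
    for label in candidates:
        scores[label] = sum(related(label, other) for other in candidates if other != label)
    return sorted(candidates, key=lambda label: scores[label], reverse=True)

if __name__ == "__main__":
    # A query visually close to vid_001/vid_002 yields motion-related labels first.
    print(annotate([0.85, 0.15, 0.35]))
```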
