IEEE Conference on Computer Communications

Towards Low Latency Multi-viewpoint 360° Interactive Video: A Multimodal Deep Reinforcement Learning Approach



Abstract

Recently, the fusion of 360° video and multi-viewpoint video, called multi-viewpoint (MVP) 360° interactive video, has emerged and creates a much more immersive and interactive user experience, but it calls for a low-latency solution for requesting high-definition content. Viewing-related features such as head movement have recently been studied, but several key issues still need to be addressed. On the viewer side, it is not clear how to effectively integrate different types of viewing-related features. At the session level, questions such as how to optimize video quality under dynamic network conditions and how to build an end-to-end mapping between these features and the quality selection remain to be answered. The solutions to these questions are further complicated by many practical challenges, e.g., incomplete feature extraction and inaccurate prediction. This paper presents an architecture, called iView, to address the aforementioned issues in an MVP 360° interactive video scenario. To fully exploit the viewing-related features and provide a one-step solution, we advocate multimodal learning and deep reinforcement learning in the design. iView intelligently determines video quality and reduces latency without pre-programmed models or assumptions. We have evaluated iView with multiple real-world video and network datasets. The results show that our solution effectively utilizes the features of video frames, network throughput, head movements, and viewpoint selections, achieving improvements of at least 27.2%, 15.4%, and 2.8% on the three video datasets, respectively, compared with several state-of-the-art methods.
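The abstract describes combining multimodal features (throughput, head movement, viewpoint selection) with deep reinforcement learning to pick a video quality level. The iView architecture itself is not detailed here, so the following is only an illustrative sketch of the general idea, assuming a flat fused feature vector, a linear Q-function over discrete quality levels, and a standard epsilon-greedy Q-learning update; all names, dimensions, and the reward are hypothetical.

```python
import numpy as np

# Hypothetical setup: 4 quality tiers; state = 3 throughput samples
# + 2 head-movement deltas + a one-hot over 4 viewpoints (9 dims total).
QUALITY_LEVELS = 4
FEATURE_DIM = 3 + 2 + 4

rng = np.random.default_rng(0)
W = np.zeros((QUALITY_LEVELS, FEATURE_DIM))  # linear Q-function weights

def features(throughput, head_delta, viewpoint):
    """Fuse the three modalities into a single state vector."""
    vp = np.zeros(4)
    vp[viewpoint] = 1.0
    return np.concatenate([throughput, head_delta, vp])

def select_quality(state, epsilon=0.1):
    """Epsilon-greedy selection over the discrete quality levels."""
    if rng.random() < epsilon:
        return int(rng.integers(QUALITY_LEVELS))
    return int(np.argmax(W @ state))

def q_update(state, action, reward, next_state, alpha=0.01, gamma=0.9):
    """One Q-learning step on the linear approximator."""
    td_target = reward + gamma * np.max(W @ next_state)
    td_error = td_target - W[action] @ state
    W[action] += alpha * td_error * state

# Toy interaction step (throughput in Mbps, arbitrary reward).
state = features(np.array([2.0, 2.5, 3.0]), np.array([0.1, -0.2]), viewpoint=1)
action = select_quality(state, epsilon=0.0)
q_update(state, action, reward=1.0, next_state=state)
```

In a realistic pipeline the linear Q-function would be replaced by a deep network, and the reward would trade off perceived quality against rebuffering and latency; this sketch only fixes the shape of the state-action loop.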

