Sparse Spatio-Temporal Representation With Adaptive Regularized Dictionary Learning for Low Bit-Rate Video Coding

IEEE Transactions on Circuits and Systems for Video Technology

Abstract

Toward vision-based video coding of low-quality data, this paper proposes a sparse spatio-temporal representation with adaptive regularized dictionary learning and develops a low bit-rate video coding scheme based on it. Following a reversed-complexity Wyner–Ziv coding approach, the scheme codes a subset of key frames at their original resolution, while the remaining frames are downsampled and later reconstructed by a sparse spatio-temporal approximation that uses the key frames as the training dataset. Because primitive patches (geometry) are low-dimensional and can be learned well from primitive patches across frames in a scale space, each video frame is divided into three layers: a primitive layer, a nonprimitive coarse layer, and a nonprimitive smooth layer. The multiscale differential feature representations are invertible, which facilitates reconstruction with dictionary learning. The reconstruction target is formulated as an optimization problem by constructing sparse representations of 2-D patches and 3-D volumes over adaptive regularized dictionaries, namely a set of 2-D subdictionary pairs trained from primitive patches and a 3-D dictionary trained from nonprimitive volumes. Specifically, the nonprimitive layer is organized into volumes so that it stays consistent along the motion trajectory, which enables sparse representation over a learned 3-D spatio-temporal dictionary. Using hierarchical bidirectional motion estimation and adaptive overlapped block motion compensation, the 3-D low-frequency and high-frequency dictionary pair is trained with the K-SVD algorithm, which updates the atoms to achieve an optimal sparse representation and convergence. During reconstruction, the high-frequency information lost in the downsampled frames is synthesized from the sparse spatio-temporal representation over the adaptive regularized dictionaries. Extensive experiments, covering both objective and subjective comparisons, validate the compression efficiency of the proposed scheme relative to H.264/AVC.
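The abstract describes training a low-/high-frequency dictionary pair with K-SVD and then synthesizing the lost high-frequency detail from a sparse representation. The NumPy sketch below illustrates that general idea on plain vectors; it is not the authors' method. Several assumptions are loud here: the abstract does not say how the pair is coupled, so the sketch uses the common trick of training one dictionary on stacked [low; high] vectors and splitting it; sparse coding uses orthogonal matching pursuit (OMP), which the abstract does not name; and the 2-D/3-D layering, motion-compensated volumes, and adaptive regularization are all omitted. The names `omp`, `ksvd`, `split_coupled`, and `synthesize_high` are hypothetical.

```python
import numpy as np


def omp(D, y, k):
    """Orthogonal matching pursuit: greedily select at most k columns (atoms)
    of D, assumed unit-norm, and return a sparse code alpha with y ~ D @ alpha."""
    alpha = np.zeros(D.shape[1])
    residual = y.astype(float)
    support, coef = [], np.zeros(0)
    for _ in range(k):
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:  # no new atom improves the fit
            break
        support.append(idx)
        # Refit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    if support:
        alpha[support] = coef
    return alpha


def ksvd(Y, n_atoms, k, n_iter=10, seed=0):
    """K-SVD: alternate OMP sparse coding with per-atom rank-1 SVD updates.
    Y holds one training vector per column; requires Y.shape[1] >= n_atoms."""
    rng = np.random.default_rng(seed)
    # Initialize atoms from randomly chosen training vectors, normalized.
    D = Y[:, rng.choice(Y.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0) + 1e-12
    X = np.zeros((n_atoms, Y.shape[1]))
    for _ in range(n_iter):
        # Sparse coding stage: code every training vector over the current D.
        X = np.column_stack([omp(D, Y[:, i], k) for i in range(Y.shape[1])])
        # Dictionary update stage: refit atom j and its nonzero coefficients.
        for j in range(n_atoms):
            users = np.nonzero(X[j])[0]
            if users.size == 0:
                continue  # unused atom: leave it unchanged
            # Residual of the vectors that use atom j, with atom j's part added back.
            E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]
            X[j, users] = s[0] * Vt[0]
    return D, X


def split_coupled(D_joint, n_low):
    """Assumed coupling: split a dictionary trained on stacked [low; high]
    vectors into a pair, scaling both halves so low-frequency atoms are unit-norm."""
    norms = np.linalg.norm(D_joint[:n_low], axis=0) + 1e-12
    return D_joint[:n_low] / norms, D_joint[n_low:] / norms


def synthesize_high(D_low, D_high, y_low, k):
    """Code the observed low-frequency vector over D_low, then reuse the same
    sparse code with D_high to estimate the missing high-frequency part."""
    return D_high @ omp(D_low, y_low, k)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_low, n_high, n_atoms, k = 16, 16, 24, 3
    # Hypothetical synthetic data standing in for stacked low/high-frequency
    # vectors that would be extracted from key frames.
    D_true = rng.standard_normal((n_low + n_high, n_atoms))
    D_true /= np.linalg.norm(D_true, axis=0)
    codes = np.zeros((n_atoms, 200))
    for i in range(200):
        codes[rng.choice(n_atoms, k, replace=False), i] = rng.standard_normal(k)
    D, _ = ksvd(D_true @ codes, n_atoms, k, n_iter=10)
    D_low, D_high = split_coupled(D, n_low)
    # A fresh vector: observe only its low-frequency half, estimate the rest.
    alpha = np.zeros(n_atoms)
    alpha[rng.choice(n_atoms, k, replace=False)] = rng.standard_normal(k)
    x = D_true @ alpha
    est = synthesize_high(D_low, D_high, x[:n_low], k)
    err = np.linalg.norm(est - x[n_low:]) / np.linalg.norm(x[n_low:])
    print("relative high-frequency error:", err)
```

Training on stacked vectors forces each atom's low- and high-frequency halves to share one sparse code, so a code computed from the low-frequency observation alone selects matching high-frequency content. In a real codec, the training vectors would come from the decoded key frames rather than the synthetic data used in the demo.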