This thesis explores motion anchoring strategies that represent a fundamental change to the way motion is employed in a video compression system: from a “prediction-centric” point of view to a “physical” representation of the underlying motion of the scene. The proposed “reference-based” motion anchorings can support computationally efficient, high-quality temporal motion inference, which requires half as many coded motion fields as conventional codecs. This raises the prospect of achieving lower motion bitrates than the most advanced conventional techniques, while providing more temporally consistent and meaningful motion. The availability of temporally consistent motion can facilitate the efficient deployment of highly scalable video compression systems based on temporal lifting, where the feedback loop used in traditional codecs is replaced by a feedforward transform. The novel motion anchoring paradigm proposed in this thesis is well adapted to seamlessly supporting “features” beyond compressibility, including high scalability, accessibility, and “intrinsic” frame upsampling. These features are becoming ever more relevant as the way video is consumed continues to shift from the traditional broadcast scenario with predefined network and decoder constraints to interactive browsing of video content via heterogeneous networks.
A key element of any modern video codec is the efficient exploitation of temporal redundancy via motion-compensated prediction. This book describes a novel paradigm for representing and employing motion information in a video compression system, one that has several advantages over existing approaches. Traditionally, motion is estimated, modelled, and coded as a vector field at the target frame it predicts. While this “prediction-centric” approach is convenient, the fact that the motion is “attached” to a specific target frame implies that it cannot easily be re-purposed to predict or synthesize other frames, which severely hampers temporal scalability.
In light of this, the present book explores the possibility of anchoring motion at reference frames instead. Key to the success of the proposed “reference-based” anchoring schemes is high-quality motion inference, which is enabled by the use of a more “physical” motion representation than the traditionally employed “block” motion fields. The resulting compression system can support computationally efficient, high-quality temporal motion inference, which requires half as many coded motion fields as conventional codecs. Furthermore, “features” beyond compressibility, including high scalability, accessibility, and “intrinsic” framerate upsampling, can be seamlessly supported. These features are becoming ever more relevant as the way video is consumed continues to shift from the traditional broadcast scenario to interactive browsing of video content over heterogeneous networks.
This book is of interest to researchers and professionals working in multimedia signal processing, in particular those who are interested in next-generation video compression. Two comprehensive background chapters on scalable video compression and temporal frame interpolation make the book accessible for students and newcomers to the field.
Dominic Rüfenacht
Wavelet-based Highly Scalable Video Compression (WSVC); Temporal Frame Interpolation (TFI); Bidirectional Hierarchical Anchoring (BIHA); Forward-Only Hierarchical Anchoring (FOHA); Base-Anchored Motion (BAM); Selective Wavelet Coefficient Attenuation (SWCA); Optical Blur Synthesis; Texture Optimizations; Disocclusion and Folding Likelihood Map (DFLM); Scalable Image