Academia.edu

3D Video Compression

31 papers
401 followers
About this topic
3D video compression is the process of reducing the data size of three-dimensional video content while maintaining visual quality. This technique employs algorithms to efficiently encode spatial and temporal information, enabling effective storage and transmission of 3D videos across various platforms and devices.

Key research themes

1. How can video coding standards be extended to efficiently compress 3D and multiview video formats?

This research area focuses on the development and standardization of video coding extensions to support multiview and 3D video, particularly integrating depth information alongside texture to enable high-quality stereoscopic and autostereoscopic displays. Efficient exploitation of inter-view redundancy and depth-texture correlations is vital to achieve substantial bitrate savings for immersive video applications.

Key finding: Introduces MV-HEVC and 3D-HEVC as extensions of the HEVC standard, where MV-HEVC reuses single-layer decoders and exploits inter-view references for 20-30% bitrate savings over simulcast. 3D-HEVC further improves coding... Read more
Key finding: Presents HEVC-FISMVC, a single-layer HEVC-based interleaved frame coding approach that maximizes temporal, inter-view, and combined correlations by treating stereo/multiview video as an interleaved monoscopic stream,... Read more
Key finding: Analyzes how varying inter-camera angles impact coding efficiency of an H.264/AVC multi-view codec using different three-camera combinations from the Breakdancers dataset. Results indicate that multi-view coding outperforms... Read more
Key finding: Demonstrates exploiting human visual system characteristics to enable asymmetric compression of stereo views using H.264, showing that unequal quality compression guided by eye dominance can maintain perceptual 3D quality... Read more

2. What methods are effective for compressing stereoscopic and multi-picture object (MPO) images leveraging inter-view redundancy?

This theme explores approaches for stereoscopic image compression that exploit the high redundancy between left and right views, focusing on MPO formats and adaptive compression strategies that balance storage savings and visual quality. Evaluations consider traditional disparity map-based methods as well as adaptive independent coding enhanced by inter-view information sharing.

Key finding: Evaluates adaptive stereoscopic image compression where the two views are independently compressed at different quality factors, followed by an enhancement step exploiting inter-view redundancy. Experimental results highlight... Read more
Key finding: Through subjective testing, shows that resolution-asymmetric stereoscopic video where one view is downsampled by 1/2 horizontally and vertically yields perceptual quality comparable to symmetric full-resolution stereo at... Read more
Key finding: Develops a 3D/stereoscopic image database, degradation software, and conducts psychophysical experiments for quality assessment specifically tailored for stereo images. The study reveals that conventional 2D image quality... Read more

3. How can compression of 3D point clouds and 3D meshes be optimized for immersive and real-time applications?

This theme covers the burgeoning research into compressing 3D spatial data representations such as point clouds and meshes, critical for augmented reality, tele-immersion, and interactive 3D applications. It emphasizes techniques addressing the high data volume while maintaining visual fidelity and the perceptual quality of reconstructed content, and the challenges posed by real-time processing and noisy reconstructed data.

Key finding: Provides a structured survey categorizing 3D point cloud compression methods into 1D traversal, 2D projection-based, and direct 3D approaches. Highlights the challenge of compressing unstructured point clouds carrying... Read more
Key finding: Presents subjective studies evaluating compression artifacts on 3D meshes generated from real-time RGB-D sensors versus high-quality scans. Results indicate that irregular and noisy meshes from real-time reconstruction are... Read more

4. Can emerging deep learning and diffusion-based methods improve video compression beyond traditional codecs?

This research area investigates the integration of deep learning techniques, including neural networks and diffusion models, in video compression frameworks aiming at end-to-end optimized, perceptually enhanced coding. These methods promise to overcome limitations of classical codecs by leveraging learned priors and generative modeling to better exploit spatio-temporal redundancies and enable novel functionalities like view synthesis with reduced bitrates.

Key finding: Introduces OpenDMC, the first open-source cross-platform deep learning video compression library, encompassing classical and recent end-to-end neural methods such as DVC. Benchmarks report promising rate-distortion efficiency... Read more
Key finding: Proposes a novel diffusion-based video compression scheme leveraging denoising diffusion generative models as powerful priors, combined with low-quality encoded guidance data via finetuned low-rank adaptations (e.g., LoRA).... Read more
Key finding: Develops a transform-domain distributed residual video coding architecture implementing a Quantized Transform Decision Mode (QUAM) to skip zero transform blocks and thus reduce the overall channel coding and decoding... Read more

All papers in 3D Video Compression

This document presents a novel diffusion-based video compression technique. We leverage the inherent expressiveness, photorealism and 3D awareness of denoising diffusion generative AI models as a powerful general-purpose prior that only... more
Depth map compression is important for efficient network transmission of 3D visual data in texture-plus-depth format, where the observer can synthesize an image of a freely chosen viewpoint via depth-image-based rendering (DIBR) using... more
Demand for multiview video transmission over band-limited channels has increased in recent years, and various techniques have been proposed to meet it. In this paper, a High Efficiency Video Coding (HEVC) based spatial... more
In this paper, we propose a method to extract depth from motion, texture and intensity. We first analyze the depth map to extract a set of depth cues. Then, based on these depth cues, we process the colored reference video, using texture,... more
Light field imaging is characterized by capturing brightness, color, and directional information of light rays in a scene. This leads to image representations with a huge amount of data that require efficient coding schemes. In this paper,... more
Light field data records the amount of light at multiple points in space, captured e.g. by an array of cameras or by a light-field camera that uses microlenses. Since the storage and transmission requirements for such data are tremendous,... more
3D video coding includes the use of multiple color views and depth maps associated with each view. Depth map coding should be adapted to the characteristics of depth maps: smooth regions and sharp edges. In this paper a... more
Public transportation is one of the things that is often used by everyone. One type of public transportation that is often used in Indonesia is the Bus Rapid Transit (BRT) or commonly known as the busway. The city of Padang, West Sumatra... more
The state-of-the-art High Efficiency Video Coding (HEVC) standard adopts a hierarchical coding structure to improve its coding efficiency. This allows for the Quantization Parameter Cascading (QPC) scheme that assigns Quantization... more
The acquisition of the spatial and angular information of a scene using light field (LF) technologies supplements a wide range of post-processing applications, such as scene reconstruction, refocusing, virtual view synthesis, and so forth.... more
In a traditional multi-view image generation algorithm, partial image information might be lost at the pixel mapping step during the 3D image acquisition. A lower hardware cost and shorter operation time can be realized if an effective... more
View synthesis prediction (VSP) is a coding mode that predicts video blocks from synthesised frames. It is particularly useful in a multi-camera setup with large inter-camera distances. Adding a VSP-based SKIP mode to a standard Multiview... more
The standard HEVC codec and its extension for coding multiview videos, known as MV-HEVC, have proven to deliver improved visual quality compared to their predecessor, H.264/MPEG-4 AVC’s multiview extension, H.264-MVC, for the same frame... more
Light Field (LF) offers unique advantages such as post-capture refocusing and depth estimation, but low-light conditions severely limit these capabilities. To restore low-light LFs we should harness the geometric cues present in different... more
Multi-view video coding techniques have been widely applied to 3D stereoscopic displays. This paper proposes a rate control algorithm for a multi-view video codec which is a tradeoff method between the data compression ratio and picture... more
Error resilient encoding in video communication is becoming increasingly important due to data transmission over unreliable channels. In this paper, we propose a new power-aware error resilient coding scheme based on network error... more
Holoscopic imaging is a prospective acquisition and display solution for providing true 3D content and fatigue-free 3D visualization. However, efficient coding schemes for this particular type of content are needed to enable proper... more
This paper proposes a two-stage high order intra block prediction method for light field image coding. This method exploits the spatial redundancy in lenslet light field images by predicting each image block, through a geometric... more
Light field imaging is a promising new technology that allows the user not only to change the focus and perspective after taking a picture, but also to generate 3D content, among other applications. However, light field images are... more
During traditional road surveys, inspectors capture images of pavement surface using cameras that produce 2D images, which can then be automatically processed to get a road surface condition assessment. This paper proposes a novel crack... more
Conventional depth video compression uses video codecs designed for color images. Given the performance of current encoding standards, this solution seems efficient. However, such an approach suffers from many issues stemming from... more
This paper presents the optimal mode selection of disparity-compensated (DC) wavelet lifting in a multi-view image coding framework. The optimal mode is a combination of macroblock (MB) coding and block partitioning mode of DC wavelet... more
Today, stereoscopic 3D (S3D) cinema is already mainstream, and almost all new display devices for the home support S3D content. S3D distribution infrastructure to the home is partly already established in the form of 3D Blu-ray discs, video... more
In this paper we present two methods of depth extraction for 2D-to-3D video conversion: one for a scene captured with a static camera and the other for the case of a moving camera, both using information from the motion present in the scene.... more
Motion estimation (ME) is one of the key elements of video compression and takes up to 60% of the processing time. The block matching algorithm (BMA) is a technique that is used to reduce the computational complexity of the ME algorithm due to its... more
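The block matching idea named in the abstract above can be sketched as an exhaustive (full) search that minimises the sum of absolute differences (SAD). This is only a minimal illustration under stated assumptions, not the algorithm of the listed paper: the function names `sad` and `full_search`, the block size, and the search radius are hypothetical choices made for the example, and frames are assumed to be plain lists of rows of integer luma values.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx])
    return total


def full_search(cur, ref, by, bx, size=4, radius=2):
    """Return ((dy, dx), cost) of the best match for the block at (by, bx).

    Every displacement within +/- radius is tested, which is the costly
    baseline that faster block matching strategies try to avoid.
    """
    height, width = len(ref), len(ref[0])
    # Start from the zero displacement so ties favour "no motion".
    best_mv, best_cost = (0, 0), sad(cur, ref, by, bx, by, bx, size)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = by + dy, bx + dx
            # Skip candidates that fall outside the reference frame.
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue
            cost = sad(cur, ref, by, bx, ry, rx, size)
            if cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost
```

The number of SAD evaluations grows with the square of the search radius, which is one reason motion estimation can dominate encoding time and why reduced-search variants are studied.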
Several real-time visual monitoring applications such as surveillance, mental state monitoring, driver drowsiness and patient care, require equipping high-quality cameras with wireless sensors to form visual sensors and this creates an... more
JPEG Pleno is an upcoming standard from the ISO/IEC JTC 1/SC 29/WG 1 (JPEG) Committee. It aims to provide a standard framework for coding new imaging modalities derived from representations inspired by the plenoptic function. The image... more
This paper describes a light field compression scheme based on a novel homography-based low rank approximation method called HLRA. The HLRA method jointly searches for the set of homographies best aligning the light field views and for... more
This paper presents a predictive-coding algorithm for the compression of multiple depth-sequences obtained from a multi-camera acquisition setup. The proposed depth-prediction algorithm works by synthesizing a virtual depth-image that... more
We investigate the coding of multiview images obtained from a set of multiple cameras. To exploit the inter-view correlation, two view-prediction tools have been implemented and used in parallel: a block-based motion compensation scheme and... more
In recent years, light field imaging has attracted the attention of the academic and industrial communities thanks to its enhanced rendering capabilities that allow content to be visualised in a more immersive and interactive way. However,... more
Pattern-based video coding (PVC) has already established its superiority over H.264 in low bit rate areas because of an extra pattern-mode to segment out the arbitrary shape of the moving region in macroblock. To determine the... more
Pattern-based video coding representing moving regions in macroblock has very good potential for improved coding efficiency over existing standard H.264/AVC in the very low bit-rate range. However, the coding efficiency diminishes... more
The selection of an optimal regular-shaped pattern set for very low bit-rate video coding focusing on moving regions has been the objective of much recent research aimed at improving bit-rate efficiency. Selecting the optimal... more
A new real-time pattern selection algorithm for very low bit-rate video coding focusing on moving regions.
Very low bit-rate video coding algorithms using predefined regular-shaped patterns to segment out moving objects at macroblock level have exhibited good potential for improved coding efficiency when embedded in the H.264 standard as an... more
Algorithms using content-based patterns to segment moving regions at the macroblock (MB) level have exhibited good potential for improved coding efficiency when embedded into the H.264 standard as an extra mode. The content-based pattern... more
In the context of very low bit-rate video coding, pre-defined fixed pattern representations of moving regions in block-based motion estimation and compensation have become increasingly attractive over H.264 as the former represents an MB... more
Depth image based rendering techniques for multiview applications have been recently introduced for efficient view generation at arbitrary camera positions. Encoding rate control has thus to consider both texture and depth data. Due to... more
In this paper, we present a modified inter-view prediction multiview video coding (MVC) scheme from the perspective of viewer interactivity. The latter requires a high transmission bit rate when a viewer requests some view(s).... more