A Review of Selected Papers Published by Tencent Media Lab in 2021

Tencent Media Lab
Published 2022-02-15

Tencent Media Lab invests actively in cutting-edge technology research and has achieved notable results. In 2021, the lab published dozens of papers in renowned international journals (IEEE Trans. on CSVT, Proceedings of the IEEE, etc.) and flagship conferences in the field (ICIP, ICME, PCS, VCIP, etc.), mainly covering video coding, immersive media, multimedia AI, and related research directions. This post reviews a selection of these papers.

A Real-Time H.266/VVC Software Decoder https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9428470

Abstract

The new Versatile Video Coding standard (VVC) was finalized in July 2020. This new international standard is able to provide up to 50% higher compression efficiency for the same subjective quality compared to its predecessor HEVC, but at the cost of an increased computational load. This paper investigates the complexity of VVC decoder processing blocks and presents a highly optimized decoder implementation that can achieve 4K 60fps VVC real-time decoding on an x86-based CPU using SIMD instruction extensions of the processor and additional parallel processing, including data- and task-level parallelism.
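
To make the role of SIMD concrete, here is a minimal sketch (not code from the paper's decoder) of the kind of per-sample reconstruction loop that SIMD extensions accelerate, using NumPy vectorization as a stand-in for x86 intrinsics; the function names and the 10-bit clipping range are illustrative assumptions.

```python
# Illustrative sketch only: the kind of per-sample loop that SIMD
# instructions accelerate, shown scalar vs. vectorized, with NumPy
# standing in for x86 SIMD (e.g. SSE/AVX2) intrinsics.
import numpy as np

def reconstruct_scalar(pred, resid, bit_depth=10):
    # One sample at a time: what a compiler emits without vectorization.
    h, w = pred.shape
    out = np.empty_like(pred)
    lo, hi = 0, (1 << bit_depth) - 1
    for y in range(h):
        for x in range(w):
            v = int(pred[y, x]) + int(resid[y, x])
            out[y, x] = min(max(v, lo), hi)
    return out

def reconstruct_simd_like(pred, resid, bit_depth=10):
    # Whole rows at once: mirrors a SIMD add + clip over 16/32 samples.
    return np.clip(pred.astype(np.int32) + resid, 0, (1 << bit_depth) - 1)

pred = np.random.randint(0, 1024, (64, 64), dtype=np.int32)
resid = np.random.randint(-512, 512, (64, 64), dtype=np.int32)
assert np.array_equal(reconstruct_scalar(pred, resid),
                      reconstruct_simd_like(pred, resid))
```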

A Video Dataset for Learning-based Visual Data Compression and Analysis https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9675343

Abstract

Learning-based visual data compression and analysis have attracted great interest from both academia and industry recently. More training as well as testing datasets, especially good-quality video datasets, are highly desirable for related research and standardization activities. A UHD video dataset, referred to as the Tencent Video Dataset (TVD), is established to serve various purposes, such as training neural network-based coding tools and testing machine vision tasks, including object detection and segmentation. This dataset contains 86 video sequences with a variety of content coverage. Each video sequence consists of 65 frames at 4K (3840x2160) spatial resolution. In this paper, the details of this dataset, as well as its performance when compressed by VVC and HEVC video codecs, are introduced.

Adaptive Geometry Partition for Point Cloud Compression https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9503374

Abstract

Octree (OT) geometry partitioning has been acknowledged as an efficient representation in state-of-the-art point cloud compression (PCC) schemes. In this work, an adaptive geometry partition and coding scheme is proposed to improve the OT based coding framework. First, quad-tree (QT) and binary-tree (BT) partitions are introduced as alternative geometry partition modes for the first time under the context of OT-based point cloud compression. The adaptive geometry partition scheme enables flexible three-dimensional (3D) space representations and higher coding efficiency. However, exhaustive searching for the optimal partition from all possible combinations of OT, QT and BT is impractical because the entire search space could be huge. Therefore, two hyper-parameters are introduced to specify the conditions under which QT and BT partitions will be applied. Once the two parameters are determined, the partition mode can be derived according to the geometry shape of the current coding node. To investigate the impact of different partition combinations on the coding gains, we conduct thorough mathematical and experimental analyses. Based on the analyses, an adaptive parameter selection scheme is presented to optimize the coding efficiency adaptively, where multi-resolution features are extracted from the partition pyramid and a decision tree model is trained for the optimal hyper-parameters. The proposed adaptive geometry partition scheme has shown significant coding gains, and it has been adopted in the state-of-the-art MPEG Geometry-based PCC (G-PCC) standard. For sparser point clouds, the bit savings are up to 10.8% and 3.5% for lossy and lossless geometry coding, respectively, without significant complexity increase.
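
As an illustration of how two hyper-parameters can gate QT/BT against the default octree split, the sketch below derives a partition mode from a node's log2 dimensions. The parameter names and the exact rule are assumptions for illustration, not the normative G-PCC derivation.

```python
# Hedged sketch: derive a partition mode (OT/QT/BT) from the geometry
# shape of the current node, gated by two illustrative hyper-parameters.

def derive_partition(log2_dims, depth, k_qtbt_before_ot=2, min_qtbt_log2=1):
    dmax = max(log2_dims)
    # QT/BT are only considered for the first K depths and for nodes that
    # are still large enough (the two hyper-parameters of the scheme).
    if depth < k_qtbt_before_ot and min(log2_dims) >= min_qtbt_log2:
        n_short = sum(1 for d in log2_dims if d < dmax)
        if n_short == 1:
            return "QT"   # one short axis: split only the two long axes
        if n_short == 2:
            return "BT"   # two short axes: split only the longest axis
    return "OT"           # cubic node (or fallback): regular octree split

print(derive_partition((5, 5, 3), depth=0))  # -> QT
print(derive_partition((5, 3, 3), depth=0))  # -> BT
print(derive_partition((4, 4, 4), depth=0))  # -> OT
```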

An Optimized H.266/VVC Software Decoder On Mobile Platform https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9477484

Abstract

As the successor of H.265/HEVC, the new versatile video coding standard (H.266/VVC) can provide up to 50% bitrate saving with the same subjective quality, at the cost of increased decoding complexity. To accelerate the application of the new coding standard, a real-time H.266/VVC software decoder that can support various platforms is implemented, where SIMD technologies, parallelism optimization, and acceleration strategies based on the characteristics of each coding tool are applied. As mobile devices have become an essential carrier for video services nowadays, the mentioned optimization efforts are not only implemented for the x86 platform, but more importantly are utilized to highly optimize the decoding performance on the ARM platform in this work. The experimental results show that when running on the Apple A14 SoC (iPhone 12 Pro), the average single-thread decoding speed of the present implementation can achieve 53fps (RA and LB) for full HD (1080p) bitstreams generated by the VTM-11.0 reference software using 8-bit Common Test Conditions (CTC). When multi-threading is enabled, an average of 32 fps (RA) can be achieved when decoding the 4K bitstreams.
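
The task-level parallelism mentioned above can be pictured as dispatching independent picture regions to a thread pool. The sketch below is a hypothetical simplification (decode_tile is a placeholder, and cross-tile loop filtering is ignored), not the decoder's actual architecture.

```python
# Illustrative sketch of task-level parallelism: independent tiles of a
# picture are decoded concurrently on a thread pool.
from concurrent.futures import ThreadPoolExecutor

def decode_tile(tile_id, bitstream_chunk):
    # Placeholder for entropy decoding + reconstruction of one tile.
    return (tile_id, len(bitstream_chunk))  # pretend "decoded" result

def decode_picture(tile_chunks, num_threads=4):
    # Tiles are self-contained, so they can be dispatched as parallel
    # tasks; in-loop filtering across tile edges would run afterwards.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(decode_tile, i, c)
                   for i, c in enumerate(tile_chunks)]
        return [f.result() for f in futures]

print(decode_picture([b"\x00" * (100 * (i + 1)) for i in range(8)]))
```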

Context-Adaptive Secondary Transform For Video Coding https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9506126

Abstract

It is well known that non-separable transforms can efficiently decorrelate the arbitrarily directed textures that are often present in image and video content. Due to the computational complexity involved, such a transform is usually applied as a secondary transform operating on low-frequency primary transform coefficients. In order to represent a variety of arbitrary directional textures in natural images/videos, it is ideal to have sufficient coverage of secondary transform kernels for the codec to choose from. However, this may lead to increased signaling cost and encoder complexity. This paper proposes a context-adaptive secondary transform (CAST) kernel selection approach to enable the usage of more secondary transform kernels with no signaling cost increase and minimal encoder and decoder complexity increase. The proposed approach uses the variance of the top row and left column of reconstructed pixels adjacent to the transform block, if available, as a context for selecting the set of transform kernels. Experimental results show that, compared to libaom, the proposed algorithm achieves a luma BD-rate reduction of 2.17% and 3.11% for All Intra coding using the PSNR and SSIM quality metrics, respectively.
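
A minimal sketch of the selection rule described above: compute the variance of the available neighboring reconstructed pixels and map it to a kernel-set index. The two thresholds and the three-set layout are illustrative assumptions, not the values used in the paper.

```python
# Minimal sketch of the CAST idea: use the variance of reconstructed
# pixels bordering the transform block (top row and left column) as a
# context to pick a secondary-transform kernel set.
import numpy as np

def cast_kernel_set(recon, x, y, w, h, thresholds=(25.0, 400.0)):
    neighbors = []
    if y > 0:
        neighbors.extend(recon[y - 1, x:x + w])   # top row, if available
    if x > 0:
        neighbors.extend(recon[y:y + h, x - 1])   # left column, if available
    if not neighbors:
        return 0                                   # default set at borders
    var = float(np.var(np.asarray(neighbors, dtype=np.float64)))
    # Map the context (neighborhood activity) to one of three kernel sets.
    return sum(var > t for t in thresholds)

recon = np.random.randint(0, 256, (64, 64))
print(cast_kernel_set(recon, 16, 16, 8, 8))
```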

Cross-Component Sample Offset for Image and Video Coding https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9675355

Abstract

Existing cross-component video coding technologies have shown great potential in improving coding efficiency. The fundamental insight of cross-component coding technology is to exploit the statistical correlations among different color components. In this paper, a Cross-Component Sample Offset (CCSO) approach for image and video coding is proposed, inspired by the observation that the luma component tends to contain more texture, while the chroma component is relatively smoother. The key component of CCSO is a non-linear offset mapping mechanism implemented as a look-up table (LUT). The input of the mapping is the co-located reconstructed samples of the luma component, and the output is offset values applied on the chroma component. The proposed method has been implemented on top of a recent version of libaom. Experimental results show that the proposed approach brings 1.16% Random Access (RA) BD-rate saving on top of AV1 with marginal encoding/decoding time increase.
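
The following sketch shows the shape of such a LUT-based mapping: co-located luma samples are quantized into a small index that selects an offset for the chroma sample. The tap positions, quantization step, and LUT contents are illustrative assumptions (the sketch also ignores chroma subsampling), not the libaom implementation.

```python
# Hedged sketch of the CCSO mechanism: reconstructed luma samples around
# the co-located position are quantized into an index, the index looks
# up an offset in a LUT, and the offset is added to the chroma sample.
import numpy as np

def quantize_delta(d, step=16):
    return 0 if d < -step else (2 if d > step else 1)  # {-,0,+} -> {0,1,2}

def ccso_filter_sample(luma, chroma_val, cy, cx, lut):
    c = int(luma[cy, cx])
    d0 = quantize_delta(int(luma[cy, cx - 1]) - c)  # left tap vs. center
    d1 = quantize_delta(int(luma[cy, cx + 1]) - c)  # right tap vs. center
    offset = lut[d0 * 3 + d1]                       # 9-entry look-up table
    return int(np.clip(chroma_val + offset, 0, 255))

luma = np.random.randint(0, 256, (16, 16))
chroma = np.random.randint(0, 256, (16, 16))
lut = [3, 1, 0, 1, 0, -1, 0, -1, -3]                # offsets signaled per frame
print(ccso_filter_sample(luma, chroma[8, 8], 8, 8, lut))
```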

Enhanced Implicit Selection of Transform Skip in AVS3 https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9428251

Abstract

As the demand for remote desktop sharing grows, Screen Content Coding (SCC) is rapidly drawing attention. Several efficient coding tools have been adopted into AVS3 for improving the performance of SCC. In this paper, enhanced implicit selection of transform skip (EISTS) is proposed to improve the performance of the transform module. When transform skip mode is introduced, its usage is indicated in an implicit manner, instead of coding one explicit flag in the bitstream. More specifically, the transform type (whether it is transform skip or regular transform) can be derived by checking the parity of the number of even quantized coefficients at the decoder side. Correspondingly, one coefficient may need to be adjusted at the encoder side to match the assumed transform selection. Verified by experiments, EISTS improves the performance of SCC efficiently. Accordingly, this technology has been adopted into the AVS3 standard.
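
A minimal sketch of the parity mechanism follows, assuming one particular parity convention (which side of the parity means transform skip is a design choice, and the encoder-side adjustment would be RD-optimized in practice).

```python
# Sketch of the implicit signaling idea in EISTS: the decoder derives
# transform skip from the parity of the count of even-valued quantized
# coefficients; the encoder nudges one coefficient when needed so the
# parity matches its choice. The parity convention here is an assumption.

def derive_is_transform_skip(coeffs):
    num_even = sum(1 for c in coeffs if c % 2 == 0)
    return num_even % 2 == 1          # odd count of evens -> transform skip

def encode_with_implicit_flag(coeffs, want_transform_skip):
    coeffs = list(coeffs)
    if derive_is_transform_skip(coeffs) != want_transform_skip:
        # Adjust one coefficient by +/-1 to flip the parity; a real encoder
        # would pick the adjustment with the lowest rate-distortion cost.
        coeffs[-1] += 1 if coeffs[-1] <= 0 else -1
    return coeffs

coeffs = encode_with_implicit_flag([7, -2, 4, 0, 1], want_transform_skip=True)
assert derive_is_transform_skip(coeffs) is True
```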

Fast DST-VII/DCT-VIII With Dual Implementation Support for Versatile Video Coding https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9018012

Abstract

The Joint Video Exploration Team (JVET) recently launched the standardization of the next-generation video coding standard named Versatile Video Coding (VVC), which inherits the technical framework of its predecessor High-Efficiency Video Coding (HEVC). The simplified Enhanced Multiple Transform (EMT) has been adopted as the primary residual coding transform solution, termed Multiple Transform Selection (MTS). In MTS, only the transform set consisting of DST-VII and DCT-VIII remains, excluding the other transform sets and the dependency on intra prediction modes. Significant coding gains are achieved by introducing the new DST/DCT transforms, but the full matrix implementation is relatively costly compared to partial butterfly in terms of both software run-time and operation counts. In this work, we exploit the inherent features of DST-VII and DCT-VIII. Instead of repeating the element-wise additions and multiplications of the full matrix operation, these features can be leveraged to achieve more efficient implementations that use only partial elements to derive identical results. Existing transform matrices are further tuned to utilize these (anti-)symmetric features. A partial butterfly-type fast algorithm with dual-implementation support is proposed for the DST-VII/DCT-VIII transforms in VVC. Complexity analyses, including operation counts and software run-time, are conducted to validate the effectiveness, and we prove that these features hold exactly in theory. The proposed fast methods achieve noticeable software run-time savings without compromising coding performance compared with the VVC Test Model VTM-3.0. It is shown that under the Common Test Condition (CTC) with inter MTS enabled, an average of 9%, 0%, and 3% decoding time savings are achieved for All Intra (AI), Random Access (RA) and Low Delay B (LDB), respectively. Under the low QP test condition with inter MTS enabled, the proposed fast methods achieve 1%, 2% and 4% decoding time savings on average for AI, RA, and LDB, respectively.
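
One of the (anti-)symmetry features in question can be checked numerically: the DCT-VIII basis equals the DST-VII basis with reversed sample order and sign-flipped odd-frequency rows, which is why a single fast DST-VII routine can serve both transforms. The sketch below verifies this with floating-point matrices (VVC itself uses scaled integer kernels).

```python
# Numerical check of a symmetry relating DST-VII and DCT-VIII:
#   DCT8[k, n] == (-1)^k * DST7[k, N-1-n]
import numpy as np

def dst7(N):
    k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.sqrt(4 / (2 * N + 1)) * np.sin(
        np.pi * (2 * k + 1) * (n + 1) / (2 * N + 1))

def dct8(N):
    k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.sqrt(4 / (2 * N + 1)) * np.cos(
        np.pi * (2 * k + 1) * (2 * n + 1) / (4 * N + 2))

N = 8
S7, C8 = dst7(N), dct8(N)
signs = np.where(np.arange(N) % 2 == 0, 1.0, -1.0)[:, None]
assert np.allclose(C8, signs * S7[:, ::-1])
# Hence a forward DCT-VIII equals a DST-VII of the reversed input with
# alternating output signs, so one fast routine serves both transforms:
x = np.random.randn(N)
assert np.allclose(C8 @ x, signs[:, 0] * (S7 @ x[::-1]))
```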

Improved Intra Mode Coding Beyond AV1 https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9413420

Abstract

In AOMedia Video 1 (AV1), directional intra prediction modes are applied to model local texture patterns that present certain directionality. Each intra prediction direction is represented with a nominal mode index and a delta angle. The delta angle is entropy coded using a context shared between luma and chroma, and the context is derived using the associated nominal mode. In this paper, two methods are proposed to further reduce the signaling cost of delta angles: cross-component delta angle coding and context-adaptive delta angle coding, whereby the cross-component and spatial correlations of the delta angles are exploited, respectively. The proposed methods were implemented on top of a recent version of libaom. Experimental results show that the proposed cross-component delta angle coding achieves an average 0.4% BD-rate reduction with a 4% encoding time saving in the all intra configuration. By combining both methods, an average 1.2% BD-rate reduction is achieved.

Low Complexity Implementation of Intra String Copy in AVS3 https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9456015

Abstract

Screen content coding has attracted extensive interest in both industry and academia with the popularity of screen content applications. In this paper, a low-complexity implementation of the intra string copy mode is proposed for screen content coding. The proposed intra string copy mode has recently been included in the latest AVS3 video coding standard. This paper presents the design of the proposed intra string copy mode, where the main features designed for hardware-friendly implementation are introduced and discussed. Simulation results show that the proposed intra string copy significantly improves the compression performance for screen content materials, while the implementation complexity is maintained at an acceptable level.

Multicomponent Secondary Transform https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9675447

Abstract

The Alliance for Open Media has recently initiated coding tool exploration activities towards the next-generation video coding beyond AV1. In this regard, a frequency-domain coding tool, which is designed to leverage the cross-component correlation existing between collocated chroma blocks, is explored in this paper. The tool, henceforth known as multi-component secondary transform (MCST), is implemented as a low-complexity secondary transform with the primary transform coefficients of multiple color components as input. The proposed tool is implemented and tested on top of libaom. Experimental results show that, compared to libaom, the proposed method achieves an average 0.34% to 0.44% overall coding gain for the All Intra (AI) coding configuration on a wide range of video content.

Overview of Screen Content Coding in Recently Developed Video Coding Standards https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9371731

Abstract

In recent years, computer-generated texts, graphics, and animations have drawn more attention than ever. These types of media, also known as screen content, have become increasingly popular due to their widespread applications. To address the need for efficient coding of such content, several coding tools have been developed and have made great advances in terms of coding efficiency. The inclusion of screen content coding features in some recently developed video coding standards (namely, HEVC SCC, VVC, AVS3, AV1 and EVC) demonstrates the importance of supporting such features. This paper provides an overview and comparative study of screen content coding technologies, as well as discussions on the performance and complexity of the tools developed in these standards.

PnG: Micro-structured Prune-and-Grow Networks for Flexible Image Restoration https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9523064

Abstract

This paper addresses one major issue of DNN-based image restoration: the difficulty of using one model to fit multiple reconstruction requirements, such as supporting different compression ratios in neural image compression (NIC) or different zooming scales in single image super-resolution (SISR). Instead of training an independent model instance for each requirement as an individual task, we develop a practical solution that uses one model instance to support multiple requirements. We propose a general multi-task learning framework based on a novel prune-and-grow (PnG) process, where each task corresponds to one of the requirements. Different from traditional multi-task networks that use fully shared or task-specific layers, we enable in-layer partial parameter sharing to obtain both common and task-specific features at various abstraction levels. This encourages adequate sharing to improve the overall multi-task performance. The parameters are shared at a micro-structured level to both maintain the task performance and reduce inference computation. The sharing structure is automatically learned, where a model instance trained for previous tasks is progressively pruned and regrown to perform more tasks. The framework is task-generic and model-structure-agnostic. Using NIC and SISR as two example applications, extensive experiments show that the multi-task PnG network can largely reduce the overall model size and inference computation, with almost no degradation of the reconstruction performance.
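
A toy sketch of the micro-structured prune-and-grow idea: weights are scored in small blocks, each task keeps a binary mask over the blocks, and a later task grows back some pruned blocks. The block size, magnitude criterion, and keep ratios are illustrative assumptions; the paper learns the sharing structure during training.

```python
# Toy sketch: block-wise (micro-structured) masks for prune-and-grow.
import numpy as np

def block_scores(W, bs=4):
    h, w = W.shape
    return np.abs(W).reshape(h // bs, bs, w // bs, bs).mean(axis=(1, 3))

def prune_mask(W, keep_ratio=0.5, bs=4):
    s = block_scores(W, bs)
    thr = np.quantile(s, 1.0 - keep_ratio)
    return (s >= thr).astype(np.float32)            # 1 = block kept for task

def apply_mask(W, mask, bs=4):
    return W * np.kron(mask, np.ones((bs, bs), dtype=W.dtype))

W = np.random.randn(16, 16).astype(np.float32)
m_task1 = prune_mask(W, keep_ratio=0.5)             # prune for task 1
m_task2 = np.maximum(m_task1, prune_mask(W, 0.75))  # grow extra blocks for task 2
print(apply_mask(W, m_task1).nonzero()[0].size,
      apply_mask(W, m_task2).nonzero()[0].size)
```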

Semi-Decoupled Partitioning for Video Coding Beyond AV1 https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9506784

Abstract

Recently, the Alliance for Open Media (AOMedia) has initiated activities on exploring new coding tools with capabilities beyond AOMedia Video 1 (AV1). Among the various coding modules within the conventional hybrid video coding structure, the block partitioning scheme builds the foundation of the codec, so the related design needs to be sought out from the beginning. In this paper, a Semi-Decoupled Partitioning (SDP) method is proposed for coding block partitioning. With SDP, luma and chroma share the same coding block partitioning down to a specified partitioning depth. Beyond this specified depth, the partitioning patterns of the luma and chroma components can be optimized and signaled independently. The benefit of SDP is the additional flexibility of switching between dependent and independent partitioning patterns for luma and chroma, since the characteristics of these color components can differ largely. The proposed method has been integrated on top of the libaom research branch, and experimental results show that significant coding gain can be achieved compared to the libaom research anchor.
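
The switch between shared and independent trees can be sketched as follows, with random stand-ins for the encoder's RD decisions and only quaternary splits shown; everything above depth T is forced to be identical for luma and chroma, and below T each component decides on its own.

```python
# Minimal sketch of semi-decoupled partitioning (QT splits only).
import random

def decide_split(rng, block):
    return block[2] > 8 and rng.random() < 0.6   # stand-in for an RD decision

def sdp_tree(block, depth, T, shared_rng, comp_rng):
    # Above depth T the shared decisions drive both components identically;
    # from depth T on, each component's own rng (its RD search) takes over.
    rng = shared_rng if depth < T else comp_rng
    if not decide_split(rng, block):
        return block
    x, y, w, h = block
    hw, hh = w // 2, h // 2
    kids = [(x, y, hw, hh), (x + hw, y, hw, hh),
            (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    return [sdp_tree(k, depth + 1, T, shared_rng, comp_rng) for k in kids]

luma = sdp_tree((0, 0, 64, 64), 0, T=2,
                shared_rng=random.Random(1), comp_rng=random.Random(2))
chroma = sdp_tree((0, 0, 64, 64), 0, T=2,
                  shared_rng=random.Random(1), comp_rng=random.Random(3))
# The two trees agree above depth T and may differ below it.
```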

Video Coding Tool Analysis and Dataset for Gaming Content https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9477454

Abstract

The gaming market has kept growing significantly in recent years. Driven by multiple technology advances, such as cloud computing and video technologies, new gaming applications, e.g. AR, VR and cloud gaming, are becoming more and more practical and popular. Among different types of gaming, the emergence of cloud gaming is driving the market with enhanced gamer experience as well as new challenges to the services. One of the key technological challenges of gaming applications is video coding, which is the foundation of several popular gaming applications, including cloud gaming and game live streaming. Compared to typical camera-captured content and screen content, gaming content presents unique features that directly lead to different preferences in the selection of coding tool sets. To better understand the behavior of known video coding tools and to provide test materials for research and development on future coding tools that benefit gaming content, a dataset consisting of a set of gaming videos is proposed in this paper, together with an analysis of the performance of existing coding tools on these materials. It is observed that several known coding tools are exceptionally beneficial for gaming content, and the rationale is analyzed in this paper.

Study On Coding Tools Beyond AV1 https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9428244

Abstract

The Alliance for Open Media has recently initiated coding tool exploration activities towards the next-generation video coding beyond AV1. In this regard, this paper presents a package of coding tools that have been investigated, implemented and tested on top of the codebase known as libaom, which is used for the exploration of next-generation video compression tools. The proposed tools cover several technical areas based on a traditional hybrid video coding structure, including block partitioning, prediction, transform and loop filtering. The proposed coding tools are integrated as a package, and a combined coding gain over AV1 is demonstrated in this paper. Furthermore, to better understand the behavior of each tool, besides the combined coding gain, tool-on and tool-off tests are also simulated and reported for each individual coding tool. Experimental results show that, compared to libaom, the proposed methods achieve an average 8.0% (up to 22.0%) overall BD-rate reduction for the All Intra coding configuration over a wide range of image and video content.

Tencent led the joint work with Google

Block Partitioning Structure in the VVC Standard https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9452121

Abstract

Versatile Video Coding (VVC) is the latest video coding standard jointly developed by ITU-T VCEG and ISO/IEC MPEG. In this paper, technical details and experimental results for the VVC block partitioning structure are provided. Among all the new technical aspects of VVC, the block partitioning structure is identified as one of the most substantial changes relative to the previous video coding standards and provides the most significant coding gains. The new partitioning structure is designed using a more flexible scheme. Each coding tree unit (CTU) is either treated as one coding unit or split into multiple coding units by one or more recursive quaternary tree partitions followed by one or more recursive multi-type tree splits. The latter can be horizontal binary tree split, vertical binary tree split, horizontal ternary tree split, or vertical ternary tree split. A CTU dual tree for intra-coded slices is described on top of the new block partitioning structure, allowing separate coding trees for luma and chroma. Also, a new way of handling picture boundaries is presented. Additionally, to reduce hardware decoder complexity, virtual pipeline data unit constraints are introduced, which forbid certain multi-type tree splits. Finally, a local dual tree is described, which reduces the number of small chroma intra blocks.
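
The split geometry itself is easy to state in code. The sketch below enumerates the five VVC split types and the child rectangles each produces (ternary splits use the 1:2:1 ratio); the recursive ordering constraint, QT splits first and then MTT splits, is enforced by the caller and omitted here.

```python
# Sketch of the VVC split geometry: quaternary split plus the four
# multi-type tree splits. Returns child rectangles (x, y, w, h).
def split(block, mode):
    x, y, w, h = block
    if mode == "QT":
        return [(x, y, w // 2, h // 2), (x + w // 2, y, w // 2, h // 2),
                (x, y + h // 2, w // 2, h // 2),
                (x + w // 2, y + h // 2, w // 2, h // 2)]
    if mode == "BT_H":
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "BT_V":
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "TT_H":  # ternary: 1/4, 1/2, 1/4 of the height
        return [(x, y, w, h // 4), (x, y + h // 4, w, h // 2),
                (x, y + 3 * h // 4, w, h // 4)]
    if mode == "TT_V":  # ternary: 1/4, 1/2, 1/4 of the width
        return [(x, y, w // 4, h), (x + w // 4, y, w // 2, h),
                (x + 3 * w // 4, y, w // 4, h)]
    raise ValueError(mode)

for m in ("QT", "BT_H", "BT_V", "TT_H", "TT_V"):
    print(m, split((0, 0, 64, 64), m))
```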

Joint work with Mediatek, Qualcomm, HHI, etc.

Immersive Video Coding: Should Geometry Information be Transmitted as Depth Maps? https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9496638

Abstract

Immersive video often refers to multiple views with texture and scene geometry information, from which different viewports can be synthesized on the client side. To design efficient immersive video coding solutions, it is desirable to minimize bitrate, pixel rate and complexity. We investigate whether the classical approach of sending the geometry of a scene as depth maps is appropriate to serve this purpose. Previous work shows that bypassing depth transmission entirely and estimating depth at the client side improves the synthesis performance while saving bitrate and pixel rate. In order to understand if the encoder side depth maps contain information that is beneficial to be transmitted, we first explore a hybrid approach which enables partial depth map transmission using a block-based RD-based decision in the depth coding process. This approach reveals that partial depth map transmission may improve the rendering performance but does not present a good compromise in terms of compression efficiency. This led us to address the remaining drawbacks of decoder side depth estimation: complexity and depth map inaccuracy. We propose a novel system that takes advantage of high quality depth maps at the server side by encoding them into lightweight features that support the depth estimator at the client side. These features allow reducing the amount of data that has to be handled during decoder side depth estimation by 88%, which significantly speeds up the cost computation and the energy minimization of the depth estimator. Furthermore, -46.0% and -37.9% average synthesis BD-Rate gains are achieved compared to the classical approach with depth maps estimated at the encoder.

Joint work with Orange Labs and INRIA

Intra Prediction and Mode Coding in VVC https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9400392

Abstract

This paper presents the intra prediction and mode coding of the Versatile Video Coding (VVC) standard. This standard was collaboratively developed by the Joint Video Experts Team (JVET). It follows the traditional architecture of a hybrid block-based codec that was also the basis of previous standards. Almost all intra prediction features of VVC either contain substantial modifications in comparison with its predecessor H.265/HEVC or were newly added. The key aspects of these tools are the following: 65 angular intra prediction modes with block shape-adaptive directions and 4-tap interpolation filters are supported, as well as the DC and Planar modes; Position Dependent Prediction Combination is applied for most of these modes; Multiple Reference Line Prediction can be used; an intra block can be further subdivided by the Intra Subpartition mode; Matrix-based Intra Prediction is supported; and the chroma prediction signal can be generated by the Cross Component Linear Model method. Finally, the intra prediction mode in VVC is coded separately for luma and chroma. Here, a Most Probable Mode list containing six modes is applied for luma. The individual compression performance of tools is reported in this paper. For the full VVC intra codec, a bitrate saving of 25% on average is reported over H.265/HEVC using an objective metric. Significant subjective benefits are illustrated with specific examples.

Joint work with Huawei, Qualcomm, HHI, etc.

Motion Vector Coding and Block Merging in the Versatile Video Coding Standard https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9502124

Abstract

This paper overviews the motion vector coding and block merging techniques in the Versatile Video Coding (VVC) standard developed by the Joint Video Experts Team (JVET). In general, inter-prediction techniques in VVC can be classified into two major groups: “whole block-based inter prediction” and “subblock-based inter prediction”. In this paper, we focus on techniques for whole block-based inter prediction. As in its predecessor, High Efficiency Video Coding (HEVC), whole block-based inter prediction in VVC is represented by adaptive motion vector prediction (AMVP) mode or merge mode. Newly introduced features purely for AMVP mode include symmetric motion vector difference and adaptive motion vector resolution. The features purely for merge mode include pairwise average merge, merge with motion vector difference, combined inter-intra prediction and geometric partitioning mode. Coding tools such as history-based motion vector prediction and bidirectional prediction with coding unit weights can be applied to both AMVP mode and merge mode. This paper discusses the design principles and the implementation of the new inter-prediction methods. Using objective metrics, simulation results show that the methods overviewed in the paper can jointly achieve 6.2% and 4.7% BD-rate savings on average with the random access and low-delay configurations, respectively. Significant subjective picture quality improvements of some tools are also reported when comparing the resulting pictures at the same bitrates.
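
As one concrete example from the merge-mode additions, the pairwise average candidate can be sketched as below; the MV representation and rounding are simplified assumptions, and VVC handles per-reference-list availability cases that are omitted here.

```python
# Small sketch of the pairwise average merge candidate: average the
# motion vectors of the first two candidates in the merge list.
def pairwise_average(merge_list):
    if len(merge_list) < 2:
        return None
    (mv0x, mv0y), (mv1x, mv1y) = merge_list[0], merge_list[1]
    # Integer averaging kept simple here; VVC specifies the exact rounding.
    return ((mv0x + mv1x) // 2, (mv0y + mv1y) // 2)

merge_list = [(16, -4), (8, 12)]          # MVs in fractional-pel units
print(pairwise_average(merge_list))        # -> (12, 4)
```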

Joint work with Qualcomm, HHI, Mediatek, etc.

MPEG Immersive Video Coding Standard https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9374648

Abstract

This article introduces the ISO/IEC MPEG Immersive Video (MIV) standard, MPEG-I Part 12, which is undergoing standardization. The draft MIV standard provides support for viewing immersive volumetric content captured by multiple cameras with six degrees of freedom (6DoF) within a viewing space that is determined by the camera arrangement in the capture rig. The bitstream format and decoding processes of the draft specification along with aspects of the Test Model for Immersive Video (TMIV) reference software encoder, decoder, and renderer are described. The use cases, test conditions, quality assessment methods, and experimental results are provided. In the TMIV, multiple texture and geometry views are coded as atlases of patches using a legacy 2-D video codec, while optimizing for bitrate, pixel rate, and quality. The design of the bitstream format and decoder is based on the visual volumetric video-based coding (V3C) and video-based point cloud compression (V-PCC) standard, MPEG-I Part 5.

Joint work with Orange Labs and Polytechnic Institute of Paris

Overview of the Neural Network Compression and Representation (NNR) Standard https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9478787

Abstract

Neural Network Coding and Representation (NNR) is the first international standard for efficient compression of neural networks (NNs). The standard is designed as a toolbox of compression methods, which can be used to create coding pipelines. It can either be used as an independent coding framework (with its own bitstream format) or together with external neural network formats and frameworks. To provide the highest degree of flexibility, the network compression methods operate per parameter tensor in order to always ensure proper decoding, even if no structure information is provided. The NNR standard contains compression-efficient quantization and deep context-adaptive binary arithmetic coding (DeepCABAC) as core encoding and decoding technologies, as well as neural network parameter pre-processing methods like sparsification, pruning, low-rank decomposition, unification, local scaling and batch norm folding. NNR achieves a compression efficiency of more than 97% for transparent coding cases, i.e. without degrading classification quality, such as top-1 or top-5 accuracies. This paper provides an overview of the technical features and characteristics of NNR.
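
The pre-processing plus quantization pipeline can be pictured with the toy sketch below, which applies magnitude-based sparsification and uniform quantization; the thresholds are arbitrary assumptions, and the DeepCABAC entropy stage that produces the actual NNR bitstream is not reproduced here.

```python
# Toy sketch of two NNR-style stages: sparsification, then uniform
# quantization of a parameter tensor (entropy coding omitted).
import numpy as np

def sparsify(w, ratio=0.9):
    thr = np.quantile(np.abs(w), ratio)    # zero the smallest 90% of weights
    return np.where(np.abs(w) >= thr, w, 0.0)

def quantize(w, step=0.02):
    return np.round(w / step).astype(np.int32), step

w = np.random.randn(1000).astype(np.float32) * 0.1
q, step = quantize(sparsify(w))
print(f"{np.count_nonzero(q)} / {w.size} weights survive; "
      f"max decode error {np.max(np.abs(q * step - sparsify(w))):.4f}")
```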

Joint work with Nokia, InterDigital, HHI, etc.

Overview of the Screen Content Support in VVC: Applications, Coding Tools, and Performance https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9408666

Abstract

In an increasingly connected world, consumer video experiences have diversified away from traditional broadcast video into new applications with increased use of non-camera-captured content such as computer screen desktop recordings or animations created by computer rendering, collectively referred to as screen content. There has also been increased use of graphics and character content that is rendered and mixed or overlaid together with camera-generated content. The emerging Versatile Video Coding (VVC) standard, in its first version, addresses this market change by the specification of low-level coding tools suitable for screen content. This is in contrast to its predecessor, the High Efficiency Video Coding (HEVC) standard, where highly efficient screen content support is only available in extension profiles of its version 4. This paper describes the screen content support and the five main low-level screen content coding tools in VVC: transform skip residual coding (TSRC), block-based differential pulse-code modulation (BDPCM), intra block copy (IBC), adaptive color transform (ACT), and the palette mode. The specification of these coding tools in the first version of VVC enables the VVC reference software implementation (VTM) to achieve average bit-rate savings of about 41% to 61% relative to the HEVC test model (HM) reference software implementation using the Main 10 profile for 4:2:0 screen content test sequences. Compared to the HM using the Screen-Extended Main 10 profile and the same 4:2:0 test sequences, the VTM provides about 19% to 25% bit-rate savings. The same comparison with 4:4:4 test sequences revealed bit-rate savings of about 13% to 27% for Y′CBCR and of about 6% to 14% for R′G′B′ screen content. Relative to the HM without the HEVC version 4 screen content coding extensions, the bit-rate savings for 4:4:4 test sequences are about 33% to 64% for Y′CBCR and 43% to 66% for R′G′B′ screen content.
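
Among these tools, BDPCM has a particularly compact description: in transform-skip blocks the quantized residual is coded as differences along a chosen direction. The sketch below shows the vertical case on an already-quantized block (the quantizer itself and the mode decision are omitted).

```python
# Illustrative sketch of BDPCM on a quantized residual block: code the
# difference to the neighbor in the chosen direction (vertical here);
# the decoder accumulates the differences back.
import numpy as np

def bdpcm_encode_vertical(q):
    d = q.copy()
    d[1:, :] -= q[:-1, :]        # code row-to-row differences
    return d

def bdpcm_decode_vertical(d):
    return np.cumsum(d, axis=0)  # re-accumulate down each column

q = np.random.randint(-8, 9, (4, 4))
assert np.array_equal(bdpcm_decode_vertical(bdpcm_encode_vertical(q)), q)
```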

Joint work with HHI, Qualcomm, Orange Labs, etc.

Patch Decoder-Side Depth Estimation In MPEG Immersive Video https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9414056

Abstract

This paper presents a new approach for achieving bitrate and pixel rate reductions in the MPEG immersive video coding setting. We demonstrate that it is possible to avoid the transmission of some depth information in the Test Model for Immersive Video (TMIV) by estimating it at the receiver's side. Although the transmitted information in TMIV is considered non-redundant, we show that it is possible to improve this algorithm. This method provides 3.4%, 9.0%, and 12.1% average BD-rate gains for natural content at high, medium, and low bitrates, respectively, with peak reductions of up to 12.3%, 16.0%, and 18.4%, respectively. Moreover, it preserves the perceptual quality as measured with the MS-SSIM and VMAF metrics. Additionally, it decreases the pixel rate by 8.3% for each test sequence.

Joint work with Orange Labs and INRIA

The High-Level Syntax of the Versatile Video Coding (VVC) Standard https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9395142

Abstract

Versatile Video Coding (VVC), a.k.a. ITU-T H.266 | ISO/IEC 23090-3, is the new generation video coding standard that has just been finalized by the Joint Video Experts Team (JVET) of ITU-T VCEG and ISO/IEC MPEG at its 19th meeting ending on July 1, 2020. This paper gives an overview of the VVC high-level syntax (HLS), which forms its system and transport interface. Comparisons to the HLS designs in High Efficiency Video Coding (HEVC) and Advanced Video Coding (AVC), the previous major video coding standards, are included. When discussing new HLS features introduced into VVC or differences relative to HEVC and AVC, the reasoning behind the design differences and the benefits they bring are described. The HLS of VVC enables newer and more versatile use cases such as video region extraction, composition and merging of content from multiple coded video bitstreams, and viewport-adaptive 360° immersive media.

Joint work with Nokia, LG, Panasonic, etc.

Transform Coding in the VVC Standard https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9449858

Abstract

In the past decade, the development of transform coding techniques has achieved significant progress, and several advanced transform tools have been adopted in the new-generation Versatile Video Coding (VVC) standard. In this paper, a brief history of transform coding development during VVC standardization is presented, and the transform coding tools in the VVC standard are described in detail together with their initial design, incremental improvements and implementation aspects. To improve coding efficiency, four new transform coding techniques are introduced in VVC, namely Multiple Transform Selection (MTS), Low-Frequency Non-separable Secondary Transform (LFNST), Sub-Block Transform (SBT), and a large (64-point) type-2 DCT. Experimental results on the VVC reference software (VTM-9.0) show that an average of 4.5% and 3.6% overall coding gain can be achieved by the VVC transform coding tools for the All Intra and Random Access configurations, respectively.
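
The LFNST dataflow can be sketched compactly: a non-separable kernel is applied to the vectorized low-frequency corner of the primary transform coefficients. The random orthonormal kernel and the 4x4 region size below are illustrative stand-ins; VVC uses trained integer kernels whose set is selected by the intra mode.

```python
# Minimal sketch of the LFNST dataflow: a non-separable secondary
# transform on the low-frequency corner of the primary coefficients.
import numpy as np

def lfnst_forward(primary_coeffs, kernel, n=4):
    block = primary_coeffs.copy()
    low = block[:n, :n].reshape(-1)          # 16 low-frequency coefficients
    block[:n, :n] = (kernel @ low).reshape(n, n)
    return block

def lfnst_inverse(coeffs, kernel, n=4):
    block = coeffs.copy()
    low = block[:n, :n].reshape(-1)
    block[:n, :n] = (kernel.T @ low).reshape(n, n)  # orthonormal: T = inverse
    return block

rng = np.random.default_rng(0)
kernel, _ = np.linalg.qr(rng.standard_normal((16, 16)))  # orthonormal 16x16
c = rng.standard_normal((8, 8))
assert np.allclose(lfnst_inverse(lfnst_forward(c, kernel), kernel), c)
```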

Joint work with LG, Huawei, Qualcomm, etc.

Feel free to contact us and share your needs:

Tencent Media Lab

medialab@tencent.com
