Android Audio and Video Series: An Introduction to H.264 Video Encoding

天天P图 engineers

Published 2018-06-28 15:10:29


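H.264 video encoding is a technique for compressing a sequence of image frames. Compression is possible because the data is redundant. The main kinds of redundancy in a sequence of video frames are: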

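Spatial redundancy: the colors of sample points on the surface of one object are spatially continuous, so neighboring samples are identical or similar.
Temporal redundancy: consecutive pictures are correlated. For example, when two people chat in a room, the background does not change and only the position and motion of the people do.
Structural redundancy: some structures are repetitions of a simple image pattern, such as a honeycomb or a checkered floor.
Knowledge redundancy: understanding some images depends on prior knowledge. A face, for example, has a fixed structure: eyes, nose and mouth arranged in known positions. Image elements with a fixed structure can be modeled and, combined with an image library, described by just a few parameters.
Visual redundancy: the sensitivity of the human eye to an image is non-uniform and non-linear. The eye is less sensitive to chroma than to luma, less sensitive to luminance changes in bright regions, and more sensitive to object edges than to interior regions. These properties can be used to decide which image information to keep and which to discard.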
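Temporal redundancy can be made concrete with a small sketch. The helper below, whose name `count_changed_samples` is made up for this illustration, simply counts how many samples differ between two frames of the same size. It is not how H.264 itself predicts frames (the standard uses motion-compensated prediction), but it shows why a mostly static scene leaves very little new information to encode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the samples that differ between two frames of the same size.
 * A small result means the current frame is almost entirely predictable
 * from the previous one, i.e. the temporal redundancy is high. */
size_t count_changed_samples(const uint8_t *prev, const uint8_t *cur, size_t n)
{
    assert(prev != NULL && cur != NULL);

    size_t changed = 0;
    for (size_t i = 0; i < n; i++) {
        if (prev[i] != cur[i])
            changed++;
    }
    return changed;
}
```

For two 8x8 luma blocks that differ in only two samples, the function returns 2, so 62 of the 64 samples would not need to be sent again.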

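H.264 is a digital video coding standard developed by the Joint Video Team (JVT), formed by ITU-T's VCEG and ISO/IEC's MPEG, and officially released in March 2003. Its network-friendly structure and syntax make bit errors and packet loss easier to handle. On the coding side, unified VLC symbol coding, high-precision multi-mode motion estimation, an integer transform on 4x4 blocks, and a layered coding syntax give H.264 very high compression efficiency. The price is complexity: these algorithms slow the encoder down, which makes real-time encoding a challenge. Encoding time is mainly reduced by optimizing the implementation of the algorithms and by using hardware acceleration.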
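To give a feel for the 4x4 integer transform mentioned above, here is a minimal sketch of the forward core transform Y = Cf · X · Cf^T using the well-known H.264 core matrix. The function name and the flat row-major array layout are choices made for this example, and the scaling that the standard folds into quantization is deliberately left out.

```c
#include <assert.h>

/* H.264 forward core transform matrix (integer approximation of the DCT). */
static const int CF[4][4] = {
    { 1,  1,  1,  1 },
    { 2,  1, -1, -2 },
    { 1, -1, -1,  1 },
    { 1, -2,  2, -1 },
};

/* Compute out = CF * in * CF^T for one 4x4 block.
 * Both blocks are stored row-major in 16-element arrays. */
void forward_core_transform_4x4(const int in[16], int out[16])
{
    int tmp[16];

    assert(in != out); /* in-place operation is not supported */

    /* tmp = CF * in */
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            int acc = 0;
            for (int k = 0; k < 4; k++)
                acc += CF[i][k] * in[k * 4 + j];
            tmp[i * 4 + j] = acc;
        }
    }

    /* out = tmp * CF^T */
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            int acc = 0;
            for (int k = 0; k < 4; k++)
                acc += tmp[i * 4 + k] * CF[j][k];
            out[i * 4 + j] = acc;
        }
    }
}
```

A flat block in which every sample is 10 transforms into a single DC coefficient of 160 with all other coefficients zero, which is exactly the kind of energy compaction that makes the block cheap to encode. Because only integer additions and multiplications by small constants are involved, encoder and decoder cannot drift apart through rounding.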

x264 is an open-source library from the VideoLAN organization that implements H.264 encoding. The source can be fetched with git clone http://git.videolan.org/git/x264.git.

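Encoding with the open-source x264 library

Configuring the encoding parameters

Encoding parameters are set through the struct x264_param_t. The function int x264_param_default_preset( x264_param_t *param, const char *preset, const char *tune ) fills it with the values of a preset.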

x264_param_t param;
x264_param_default_preset(&param, "medium", NULL);

Individual parameters can then be changed on top of the preset as needed.

param.i_csp = X264_CSP_I420;
param.i_width = width;
param.i_height = height;
param.b_vfr_input = 0;
param.b_repeat_headers = 1;
param.b_annexb = 1;

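i_csp is the color format of the input frames, and i_width and i_height are the frame dimensions. b_vfr_input selects what rate control is based on: 0 uses the configured frame rate, 1 uses the timestamp of each frame. Setting b_repeat_headers to 1 puts the SPS and PPS in front of every keyframe. Setting b_annexb to 1 puts a start code (00 00 00 01, or the 3-byte form 00 00 01) at the beginning of each NAL unit.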

Apply the settings, choosing a profile at the same time, and open the encoder:

x264_param_apply_profile(&param, "high");
x264_t *h = x264_encoder_open(&param);

Supplying the frames to encode

An input frame is represented by the struct x264_picture_t.

typedef struct x264_picture_t
{
    /* In: force picture type (if not auto)
     *     If x264 encoding parameters are violated in the forcing of picture types,
     *     x264 will correct the input picture type and log a warning.
     * Out: type of the picture encoded */
    int     i_type;
    /* In: force quantizer for != X264_QP_AUTO */
    int     i_qpplus1;
    /* In: pic_struct, for pulldown/doubling/etc...used only if b_pic_struct=1.
     *     use pic_struct_e for pic_struct inputs
     * Out: pic_struct element associated with frame */
    int     i_pic_struct;
    /* Out: whether this frame is a keyframe.  Important when using modes that result in
     * SEI recovery points being used instead of IDR frames. */
    int     b_keyframe;
    /* In: user pts, Out: pts of encoded picture (user)*/
    int64_t i_pts;
    /* Out: frame dts. When the pts of the first frame is close to zero,
     *      initial frames may have a negative dts which must be dealt with by any muxer */
    int64_t i_dts;
    /* In: custom encoding parameters to be set from this frame forwards
           (in coded order, not display order). If NULL, continue using
           parameters from the previous frame.  Some parameters, such as
           aspect ratio, can only be changed per-GOP due to the limitations
           of H.264 itself; in this case, the caller must force an IDR frame
           if it needs the changed parameter to apply immediately. */
    x264_param_t *param;
    /* In: raw image data */
    /* Out: reconstructed image data.  x264 may skip part of the reconstruction process,
            e.g. deblocking, in frames where it isn't necessary.  To force complete
            reconstruction, at a small speed cost, set b_full_recon. */
    x264_image_t img;
    /* In: optional information to modify encoder decisions for this frame
     * Out: information about the encoded frame */
    x264_image_properties_t prop;
    /* Out: HRD timing information. Output only when i_nal_hrd is set. */
    x264_hrd_t hrd_timing;
    /* In: arbitrary user SEI (e.g subtitles, AFDs) */
    x264_sei_t extra_sei;
    /* private user data. copied from input to output frames. */
    void *opaque;
} x264_picture_t;

The image data of an input frame is ultimately held in the struct x264_image_t.

typedef struct x264_image_t
{
    int     i_csp;       /* Colorspace */
    int     i_plane;     /* Number of image planes */
    int     i_stride[4]; /* Strides for each plane */
    uint8_t *plane[4];   /* Pointers to each plane */
} x264_image_t;

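i_csp is the color space type and i_plane is the number of planes. The most common format is yuv420p, which has three planes: y, u and v. The data is stored in the memory block that plane[0] points to, and plane[0], plane[1] and plane[2] point to the start of the y, u and v data respectively. i_stride[0], i_stride[1] and i_stride[2] give the length in bytes of one row of the y, u and v planes.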

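In yuv420p the planes are stored back to back: the full-resolution y plane comes first, followed by the u plane and then the v plane, each of which has half the width and half the height of the y plane.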
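The plane layout can be sketched in a few lines of C. The struct `i420_view_t` and both functions below are hypothetical helpers written for this article; they only mirror the i_plane, i_stride and plane fields of x264_image_t and do not call x264. The sketch assumes a tightly packed buffer with even dimensions and no row padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring the stride/plane fields of x264_image_t. */
typedef struct {
    int      i_plane;     /* number of planes in use */
    int      i_stride[4]; /* bytes per row of each plane */
    uint8_t *plane[4];    /* start of each plane */
} i420_view_t;

/* Size in bytes of one tightly packed yuv420p frame: Y + U + V. */
size_t i420_frame_size(int width, int height)
{
    assert(width > 0 && height > 0);
    assert(width % 2 == 0 && height % 2 == 0);

    size_t luma = (size_t)width * (size_t)height;
    return luma + luma / 2; /* U and V are a quarter of Y each */
}

/* Point the three planes into one contiguous buffer of i420_frame_size bytes. */
void i420_map_planes(uint8_t *buf, int width, int height, i420_view_t *view)
{
    assert(buf != NULL && view != NULL);
    assert(width > 0 && height > 0);
    assert(width % 2 == 0 && height % 2 == 0);

    size_t luma   = (size_t)width * (size_t)height;
    size_t chroma = luma / 4;

    view->i_plane     = 3;
    view->i_stride[0] = width;
    view->i_stride[1] = width / 2;
    view->i_stride[2] = width / 2;
    view->i_stride[3] = 0;

    view->plane[0] = buf;                 /* Y */
    view->plane[1] = buf + luma;          /* U */
    view->plane[2] = buf + luma + chroma; /* V */
    view->plane[3] = NULL;
}
```

For a 640x480 frame this gives 307200 bytes of y followed by 76800 bytes each of u and v, 460800 bytes in total, which is half of what the same frame would need at 24 bits per pixel.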

Generating the encoded data

Encoding is done with the function int x264_encoder_encode( x264_t *h, x264_nal_t **pp_nal, int *pi_nal, x264_picture_t *pic_in, x264_picture_t *pic_out ).

    i_frame_size = x264_encoder_encode( h, &nal, &i_nal, &pic, &pic_out );
    if( i_frame_size < 0 )
        goto fail;
    else if( i_frame_size )
    {
        int i;
        for (i = 0; i < i_nal; i++)
            printf("i_nal = %d, i_frame_size = %d, nal i_payload = %d\n", i_nal, i_frame_size, nal[i].i_payload);
    }

The encoder output is a set of NAL units.

i_nal = 4, i_frame_size = 13534, nal i_payload = 29
i_nal = 4, i_frame_size = 13534, nal i_payload = 10
i_nal = 4, i_frame_size = 13534, nal i_payload = 690
i_nal = 4, i_frame_size = 13534, nal i_payload = 12805
i_nal = 1, i_frame_size = 67, nal i_payload = 67
i_nal = 1, i_frame_size = 1159, nal i_payload = 1159

A NAL unit is represented by the struct x264_nal_t.

/* The data within the payload is already NAL-encapsulated; the ref_idc and type
 * are merely in the struct for easy access by the calling application.
 * All data returned in an x264_nal_t, including the data in p_payload, is no longer
 * valid after the next call to x264_encoder_encode.  Thus it must be used or copied
 * before calling x264_encoder_encode or x264_encoder_headers again. */
typedef struct x264_nal_t
{
    int i_ref_idc;  /* nal_priority_e */
    int i_type;     /* nal_unit_type_e */
    int b_long_startcode;
    int i_first_mb; /* If this NAL is a slice, the index of the first MB in the slice. */
    int i_last_mb;  /* If this NAL is a slice, the index of the last MB in the slice. */

    /* Size of payload (including any padding) in bytes. */
    int     i_payload;
    /* If param->b_annexb is set, Annex-B bytestream with startcode.
     * Otherwise, startcode is replaced with a 4-byte size.
     * This size is the size used in mp4/similar muxing; it is equal to i_payload-4 */
    uint8_t *p_payload;

    /* Size of padding in bytes. */
    int i_padding;
} x264_nal_t;

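The NAL data is stored in p_payload. The payloads of all NAL units returned by one call to x264_encoder_encode sit one after another in a single contiguous block of memory, and each nal[i].p_payload points to the start of its own unit.

Each NAL can therefore be accessed separately through its own data pointer and length, or all NAL data can be accessed at once through the start address of nal[0] (uint8_t *p_payload) and the total length i_frame_size. The first approach is used when writing NALs into a video container; the second is used when producing a raw H.264 byte stream.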
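The two access patterns can be compared with a small self-contained sketch. `nal_view_t` is a hypothetical struct that copies only the i_payload and p_payload fields of x264_nal_t, and both function names are invented for this example. The sketch relies on the payloads being contiguous in memory, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the two x264_nal_t fields used here. */
typedef struct {
    int      i_payload; /* size of this NAL in bytes */
    uint8_t *p_payload; /* start of this NAL */
} nal_view_t;

/* Pattern 1: visit every NAL through its own pointer and length.
 * This is the shape a container muxer needs. Returns bytes copied. */
size_t copy_nals_one_by_one(const nal_view_t *nal, int i_nal, uint8_t *dst)
{
    assert(nal != NULL && dst != NULL && i_nal > 0);

    size_t offset = 0;
    for (int i = 0; i < i_nal; i++) {
        memcpy(dst + offset, nal[i].p_payload, (size_t)nal[i].i_payload);
        offset += (size_t)nal[i].i_payload;
    }
    return offset;
}

/* Pattern 2: treat everything from nal[0].p_payload as one block of
 * i_frame_size bytes. This is enough for dumping a raw byte stream. */
size_t copy_nals_as_one_block(const nal_view_t *nal, int i_frame_size, uint8_t *dst)
{
    assert(nal != NULL && dst != NULL && i_frame_size > 0);

    memcpy(dst, nal[0].p_payload, (size_t)i_frame_size);
    return (size_t)i_frame_size;
}
```

With the first frame from the log above, the four payload sizes 29, 10, 690 and 12805 add up to the reported i_frame_size of 13534, so both functions copy exactly the same bytes.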

The NAL units produced are mainly of the following types:

NAL_SPS: sequence parameter set
NAL_PPS: picture parameter set
NAL_SEI: supplemental enhancement information
NAL_SLICE_IDR: coded slice of an IDR picture
NAL_SLICE: coded slice of a non-IDR picture
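These types can be read straight out of an Annex B byte stream. The sketch below, with invented function names, scans for the 00 00 01 start code (a 4-byte 00 00 00 01 start code ends with the same three bytes) and takes the low five bits of the byte that follows as nal_unit_type. In the H.264 numbering, 1 is a non-IDR slice, 5 an IDR slice, 6 SEI, 7 SPS and 8 PPS, which matches the x264 enum names listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first byte after the next 00 00 01 start code
 * found at or after pos, or n when no further start code exists. */
static size_t next_nal_start(const uint8_t *buf, size_t n, size_t pos)
{
    for (size_t i = pos; i + 3 <= n; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i + 3;
    }
    return n;
}

/* Store the nal_unit_type of each NAL unit in an Annex B buffer.
 * At most max_types entries are written; the return value is the
 * total number of NAL units found. */
size_t list_nal_types(const uint8_t *buf, size_t n, int *types, size_t max_types)
{
    assert(buf != NULL && types != NULL);

    size_t count = 0;
    size_t pos = next_nal_start(buf, n, 0);
    while (pos < n) {
        if (count < max_types)
            types[count] = buf[pos] & 0x1F; /* low 5 bits of the header byte */
        count++;
        pos = next_nal_start(buf, n, pos);
    }
    return count;
}
```

Scanning for the 3-byte pattern is safe because the encoder inserts emulation prevention bytes so that 00 00 01 never appears inside a NAL payload.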

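NAL data

The H.264 standard is suited to video transmission across different networks mainly because it introduces a layered structure: the compressed image data is split into a Network Abstraction Layer (NAL) and a Video Coding Layer (VCL). This separates compression from network transport, so the coding layer can be ported to different network architectures. As a result H.264 is network-friendly toward the networks that exist today and adapts well to future ones.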

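NAL defines the data encapsulation format and a uniform network interface. Data is packed into network abstraction layer units (NALUs), which makes it easier to transmit over a network. The unit header carries a storage flag and a type flag. The storage flag indicates whether the data belongs to a frame that is used as a reference, so a server can decide what to drop when the network is congested. The type flag indicates what kind of picture data the unit holds.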
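In the bitstream, these two flags live in the single header byte of each NAL unit: one forbidden_zero_bit, two bits of nal_ref_idc (the storage flag) and five bits of nal_unit_type (the type flag). The following sketch decodes that byte; the struct and function names are made up for this example, and the drop rule is a simplified reading of the congestion case described above.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of the one-byte H.264 NAL unit header. */
typedef struct {
    int forbidden_zero_bit; /* must be 0 in a valid stream */
    int nal_ref_idc;        /* 0 means the unit is not used for reference */
    int nal_unit_type;      /* e.g. 1 slice, 5 IDR slice, 6 SEI, 7 SPS, 8 PPS */
} nal_header_t;

/* Split the header byte into its three bit fields. */
nal_header_t parse_nal_header(uint8_t byte)
{
    nal_header_t h;
    h.forbidden_zero_bit = (byte >> 7) & 0x01;
    h.nal_ref_idc        = (byte >> 5) & 0x03;
    h.nal_unit_type      = byte & 0x1F;
    return h;
}

/* A unit with nal_ref_idc == 0 is not referenced by other pictures,
 * so discarding it cannot corrupt the decoding of later frames. */
int nal_is_droppable(nal_header_t h)
{
    assert(h.forbidden_zero_bit == 0);
    return h.nal_ref_idc == 0;
}
```

For instance, the header byte 0x65 decodes to nal_ref_idc 3 and nal_unit_type 5, an IDR slice that must be kept, while 0x01 decodes to nal_ref_idc 0 and nal_unit_type 1, a non-reference slice that a congested server could discard.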

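NAL is responsible for encapsulating data in the segmentation format of the underlying network, including framing, signaling of logical channels, use of timing information, and end-of-sequence signals. NAL supports transport formats for circuit-switched channels as well as formats for sending video over a network with RTP/UDP/IP. To improve the ability of the H.264 NAL to tailor the VCL data format to networks with different characteristics, the packet-based interface defined between the VCL and the NAL, the packetization, and the corresponding signaling are also part of the NAL.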

About the author: taoxiong (熊涛), AND engineer at 天天P图