Loss in converting YUV to PNG

Question

I have a YUV 4:2:0 (NV12) image. I'm using ffmpeg to convert the YUV file to PNG and then back to YUV.

ffmpeg -s 640x480 -pix_fmt nv12 -i testold.yuv test-640x480 test.png
ffmpeg -i test.png -s 640x480 -pix_fmt nv12 testnew.yuv

When I compare the two YUV files, there are a few differences. I expected PNG to be lossless. What is the reason for the differences between the two files?

Plus: is the original YUV range compatible with traditional RGB PNG inputs? See for instance http://stackoverflow.com/a/32188371/7453794 — Laurent Duval, May 18 '17 at 21:01

score 8 · Accepted Answer · edited May 18 '17 at 21:51

The PNG format is lossless for RGB24 image data. However the conversion from YUV to RGB24 is not lossless, as the two formats quantize the color space differently.

To see this, the following animated gif was made by applying your two ffmpeg operations 200 times back-and-forth and collating the resulting 200 images into the gif.

By contrast, the following gif was generated using yuv444p as the YUV format instead of nv12. This format does not subsample the chroma planes, so much less is lost on each pass.

edited May 18 '17 at 21:51 by Peter K.

answered May 18 '17 at 21:39 by hotpaw2

This may be pedantic, but I think that when converting rgb->yuv->rgb, the losses occur in the first conversion (rgb->yuv), not in the second (yuv->rgb). YUV is a rotated "superset cube" of rgb. Thus, the code density of yuv is lower within the "rgb-legal" cube, and the volume of yuv values that represent illegal rgb values are illegal anyways. – Knut Inge Nov 14 '22 at 08:18
The image getting darker and darker suggests that a biased rounding operation occurs somewhere in the pipeline. E.g. doing (a+b)/2 using the C-default division operator. – Knut Inge Nov 15 '22 at 14:08
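
To make the biased-rounding remark concrete, here is a minimal Python sketch (not taken from ffmpeg; the function names are made up for illustration). It measures the mean error of a truncating average over all pairs of 8-bit values and compares it with round-half-to-even. For non-negative integers, Python's `//` truncates like C's integer division.

```python
# Sketch: bias of a truncating average, as in C's (a + b) / 2 on unsigned ints.

def mean_error(average):
    """Mean of (average(a, b) - exact average) over all pairs of 8-bit values."""
    total = 0.0
    for a in range(256):
        for b in range(256):
            total += average(a, b) - (a + b) / 2
    return total / (256 * 256)

def truncating_average(a, b):
    # drops the fractional half whenever a + b is odd
    return (a + b) // 2

def rounding_average(a, b):
    # Python's round() sends exact halves to the nearest even integer
    return round((a + b) / 2)

trunc_bias = mean_error(truncating_average)
round_bias = mean_error(rounding_average)
print("truncating:", trunc_bias)
print("round-half-even:", round_bias)
```

The truncating version comes out a quarter of a code value too low on average, which is the kind of small, one-sided error that accumulates as darkening over 200 passes.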

Knut Inge · Answer 2 · 2022-11-15T14:47:57.553

There may be programming quirks or bugs that turn anything into anything. I won't touch those, but rather address the fundamentals of your question.

Color format

Common rgb formats ("sRGB") represent each pixel with either 24 bits (8 bits each for "red", "green" and "blue") or 32 bits (8 additional bits for an alpha channel, which is irrelevant here). They are related to the physics of light through electro-optical transfer functions and spectral sensitivities that differ from format to format, a detail that is easily forgotten when working with transmission/storage-focused software.

NV12 represents rgb after applying a 3x3 matrix turning it into luma ("y"), blue difference chroma ("Cb" or "U") and red difference chroma ("Cr" or "V").

Referring to "Digital video and HDTV algorithms and interfaces" (Poynton, 2003), p. 285: "There are about 2.75 million valid codewords in 8-bit Y'CbCr, compared to 10.6 million in 8-bit studio R'G'B'. If R'G'B' is transcoded to Y'CbCr, then transcoded back to R'G'B', the resulting R'G'B' cannot have any more than 2.75 million colors."

(The author uses a prime (') for gamma-encoded video, which is the most common case, and "studio" refers to limited-range numbers (16...235 rather than 0...255), which are the norm in television systems but uncommon in computers.)
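
As a small illustration of the limited ("studio") range, the Python sketch below squeezes full-range 8-bit luma into 16...235 and counts how many distinct codes survive. The scaling 16 + 219*v/255 is assumed here as the usual luma mapping; chroma uses a different range and is left out.

```python
# Sketch: map full-range 8-bit luma (0..255) to studio range (16..235).
# The formula is an assumption for illustration, not code from any library.

def full_to_studio(v):
    return round(16 + 219 * v / 255)

studio_codes = {full_to_studio(v) for v in range(256)}
print(min(studio_codes), max(studio_codes), len(studio_codes))
```

256 input levels land on 220 output codes, so some neighbouring levels necessarily merge when the range is compressed.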

Thus, even YCbCr 4:4:4 at 8 bits per component has only about a quarter (2.75/10.6 ≈ 26%) of the color possibilities of studio-range rgb at 8 bits per component.
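
The "valid codewords" argument can be checked roughly in Python. This sketch assumes full-range BT.601 (JFIF-style) coefficients rather than Poynton's studio-range setup, and samples the Y'CbCr cube on a coarse grid to keep pure Python fast, so the number it prints is only an estimate and will not match his counts exactly.

```python
# Sketch: what share of 8-bit full-range Y'CbCr triples decode to in-range R'G'B'?
KR, KB = 0.299, 0.114
KG = 1.0 - KR - KB

def ycbcr_to_rgb(y, cb, cr):
    """Unrounded, unclipped R'G'B' for a full-range 8-bit Y'CbCr triple."""
    cb -= 128
    cr -= 128
    r = y + 2 * (1 - KR) * cr
    b = y + 2 * (1 - KB) * cb
    g = (y - KR * r - KB * b) / KG  # from Y = KR*R + KG*G + KB*B
    return r, g, b

def in_gamut(rgb):
    # allow half a code value so that rounding alone never counts as clipping
    return all(-0.5 <= c <= 255.5 for c in rgb)

STEP = 4  # grid spacing (assumption: coarse sampling is good enough here)
grid = range(0, 256, STEP)
valid = sum(
    in_gamut(ycbcr_to_rgb(y, cb, cr))
    for y in grid for cb in grid for cr in grid
)
fraction = valid / len(grid) ** 3
print(f"in-gamut fraction: {fraction:.3f}")
```

The estimate should land near a quarter, in line with the ratio quoted above: most Y'CbCr codes simply do not correspond to any legal rgb value.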

Spatial downsampling

You start out with 4:2:0 where the chroma channels are downsampled by 2x2. This can be converted to rgb and back again without loss if simple 2x2 boxcar filtering is employed in the downsampling and upsampling. The 4:2:0->4:4:4 conversion would simply repeat one Cb (or Cr) scalar in a 2x2 pattern:

Cb(i,j) -> [Cb(2i,2j),   Cb(2i,2j+1);
            Cb(2i+1,2j), Cb(2i+1,2j+1)]

And when converting back 4:4:4->4:2:0 we would have:

Cb(i,j) = (Cb(2i,2j) + Cb(2i,2j+1) + Cb(2i+1,2j) + Cb(2i+1,2j+1)) / 4;
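
The two formulas above can be sketched in plain Python to confirm that replicate-then-average returns the original chroma plane. The function names and sample values are made up for illustration, and planes are simple lists of rows.

```python
def upsample_2x2(plane):
    """4:2:0 -> 4:4:4: repeat every chroma sample in a 2x2 block."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def downsample_2x2(plane):
    """4:4:4 -> 4:2:0: average each 2x2 block (boxcar filter)."""
    return [
        [
            (plane[i][j] + plane[i][j + 1]
             + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
            for j in range(0, len(plane[0]), 2)
        ]
        for i in range(0, len(plane), 2)
    ]

cb_420 = [[131, 90], [17, 255]]
cb_444 = upsample_2x2(cb_420)
restored = downsample_2x2(cb_444)
print(restored == cb_420)
```

This only holds when both directions use the matching boxcar filter. A smoother interpolating upscaler followed by a different downscaler would generally not return the original samples.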

Video tends to be compute heavy. Thus any given implementation might do fixed-point operations that sacrifice numerical properties for speed of execution.

MATLAB example

This snippet codifies Eq 25.12 and 25.13 of the Poynton book referred to above. Essentially, this is what you would do to convert JPEG/JFIF "full-range" BT.601 to "computer" rgb and back again. Note that there are a number of variations on this conversion, depending on whether you are using ITU BT.601, BT.709 or BT.2020, and whether you use limited-range (16-235/239) or full-range (0-255) numbers. I have written this code prioritizing clarity.

%% define a minimal YCbCr 4:2:0 "image" of 2x2 pixels
y_in = [100 101;...
        102 103];
cb_in = 131;
cr_in = 119;
%% upscale chroma to 4:4:4
cb = kron(cb_in, ones(2));
cr = kron(cr_in, ones(2));
in = cat(3, y_in, cb, cr);
%% convert color format from ITU BT.601 YCbCr (full scale)
% to rgb (full scale) and back
kb = 0.114;%0.0722
kr = 0.299;%0.2126
kg = (1-kb-kr);
fwd_mat = [kr     kg kb;...
        (0.5/(1-kb))*[-kr    -kg (1-kb)];...
        (0.5/(1-kr))*[(1-kr) -kg -kb]];   
scaling = [1 1 1];%[256 255 255]./256;
scaled_fwd_mat = scaling' .* fwd_mat;
inv_mat = [1             0              2*(1-kr);...
           (-kr-kb+1)/kg 2*kb*(kb-1)/kg 2*kr*(kr-1)/kg;...
           1             2*(1-kb)       0];
scaled_inv_mat = (1./scaling).*inv_mat;
for row = 1:size(in,1)
    for col = 1:size(in,2)
        %% convert to rgb
        norm_imdata = shiftdim(double(in(row,col,:)), 1) - [0 128 128];        
        rgb = scaled_inv_mat * norm_imdata';
        %% quantize and back again
        rgb = double(uint8(rgb));
        %% convert back to YCbCr        
        ycbcrtmp = scaled_fwd_mat * rgb;
        ycbcrtmp = ycbcrtmp + [0 128 128]';
        out(row,col,:) = double(uint8(ycbcrtmp));
    end
end
%% downscale chroma to 4:2:0
y_out = out(:,:,1);
cb_out = out(:,:,2);
cr_out = out(:,:,3);
cb_out = conv2(cb_out, ones(2)/4, 'same');
cb_out = cb_out(1:2:end-1, 1:2:end-1);
cr_out = conv2(cr_out, ones(2)/4, 'same');
cr_out = cr_out(1:2:end-1, 1:2:end-1);
y_in - y_out
cb_in - cb_out
cr_in - cr_out

I could not make sense of Poynton's 256/255 mid-step/mid-thread scaling/clipping of chroma channels, so I left that one out, producing slightly different numbers.

We see that the 3x3 matrices are inverses (to within reasonable numerical precision):

>> scaled_fwd_mat*scaled_inv_mat-eye(3)
ans =
1.0e-15 *
-0.1110         0         0
         0         0   -0.0278
    0.0833         0         0

MATLAB fun

If you dislike seeing explicit loops in MATLAB, the above code can be compacted using the new tensorprod function. It may look obscure to some, but it is compact and possibly faster if you like idiomatic MATLAB.

%% define 4:2:0 input as unsigned 8-bit ranged double-precision floats
y_in = [100 101;...
        102 103];
cb_in = 131;
cr_in = 119;
%% upscale chroma to 4:4:4
cb = kron(cb_in, ones(2));
cr = kron(cr_in, ones(2));
in = cat(3, y_in, cb, cr);
%% convert color format from ITU BT.601 YCbCr (full scale)
% to rgb (full scale) and back
kb = 0.114;%0.0722
kr = 0.299;%0.2126
kg = (1-kb-kr);
fwd_mat = [kr     kg kb;...
        (0.5/(1-kb))*[-kr    -kg (1-kb)];...
        (0.5/(1-kr))*[(1-kr) -kg -kb]];   
inv_mat = [1             0              2*(1-kr);...
           (-kr-kb+1)/kg 2*kb*(kb-1)/kg 2*kr*(kr-1)/kg;...
           1             2*(1-kb)       0];
%% convert to rgb
norm_imdata = double(in) - shiftdim([0 128 128], -1);        
rgb = tensorprod(norm_imdata, inv_mat, 3, 2);
%% quantize and back again
rgb = double(uint8(rgb));
%% convert back to YCbCr        
ycbcrtmp = tensorprod(rgb, fwd_mat, 3, 2);
ycbcrtmp = ycbcrtmp + shiftdim([0 128 128], -1);        
out = double(uint8(ycbcrtmp));
%% downscale chroma to 4:2:0
y_out = out(:,:,1);
cb_out = conv2(out(:,:,2), ones(2)/4, 'same');
cb_out = cb_out(1:2:end-1, 1:2:end-1);
cr_out = conv2(out(:,:,3), ones(2)/4, 'same');
cr_out = cr_out(1:2:end-1, 1:2:end-1);

score 0 · Answer 3 · edited Nov 15 '22 at 12:18

Converting YUV 4:2:0 to RGB requires upscaling the chroma planes. Downscaling them again to get back to YUV 4:2:0 produces differences from the original input unless the downscaler is the exact inverse of the upscaler. That inverse is never the default: finding it without further information would require extensive analysis of the source, and merely upscaled chroma planes should not occur in a usual workflow anyway.

Then there are the rounding errors that occur in any undithered YUV to RGB conversion (dithering avoids them at the cost of some added noise) and in any RGB to YUV conversion. In this scenario those errors might well cancel out when converting back to YUV, but I am not sure about that.
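
Whether the rounding errors cancel can be probed with a rough Python sketch. It assumes full-range BT.601 coefficients, 4:4:4 data with no chroma resampling, and round-to-nearest with saturation at both steps; ffmpeg's actual pipeline may differ on every one of these points. The sketch pushes a coarse grid of Y'CbCr triples through 8-bit rgb and back, and counts the triples that change, split by whether the rgb value needed clipping.

```python
KR, KB = 0.299, 0.114
KG = 1.0 - KR - KB

def clip8(x):
    """Round to nearest and saturate to 0..255."""
    return max(0, min(255, round(x)))

def ycbcr_to_rgb(y, cb, cr):
    cb -= 128
    cr -= 128
    r = y + 2 * (1 - KR) * cr
    b = y + 2 * (1 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b

def rgb_to_ycbcr(r, g, b):
    y = KR * r + KG * g + KB * b
    cb = 128 + 0.5 * (b - y) / (1 - KB)
    cr = 128 + 0.5 * (r - y) / (1 - KR)
    return y, cb, cr

STEP = 5  # coarse grid (assumption) to keep the loop short
grid = range(0, 256, STEP)
changed_in_gamut = 0
changed_clipped = 0
for y in grid:
    for cb in grid:
        for cr in grid:
            rgb_exact = ycbcr_to_rgb(y, cb, cr)
            rgb8 = tuple(clip8(c) for c in rgb_exact)
            back = tuple(clip8(c) for c in rgb_to_ycbcr(*rgb8))
            if back != (y, cb, cr):
                if all(-0.5 <= c <= 255.5 for c in rgb_exact):
                    changed_in_gamut += 1
                else:
                    changed_clipped += 1
print("changed, in gamut:", changed_in_gamut)
print("changed, clipped: ", changed_clipped)
```

Under these assumptions the per-channel error after the round trip stays below half a code value for in-gamut triples, so they are expected to come back unchanged, while triples whose rgb value had to be clipped do not. That would point to chroma resampling, range handling and clipping as the main sources of the differences, not the rounding itself.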
