I have a high-end HPE ProLiant DL325 G10 server with an AMD EPYC 7401P 24-core/48-thread CPU, 128GB DDR4 RAM, and an Intel P4800X PCIe NVMe drive. I tried to run ffmpeg to convert a video (MKV) to MP4 for online streaming, but it does not utilize all 24 cores. During encoding it uses about 10% of the CPU according to top. An example of the top output is below.

I've read similar questions on Stack Exchange sites, but they are all unanswered and 6-8 years old. I tried adding the -threads parameter, both before and after -i, with values 0, 24, 48 and various others, but ffmpeg seems to ignore it. I'm not scaling the video either.

I'm also encoding in H.264. Below are some of the commands I've used. I can't figure out what I'm doing wrong or what the bottleneck exactly is.

Any suggestions how I can go about this?

Command used:

ffmpeg -threads 24 -i input.mkv -c:v libx264 -preset medium -c:a copy -vf subtitles=input.mkv output.mp4

I also tried using -sws_flags fast_bilinear and -x264-params sliced-threads=1, but neither changes much. I did notice that using -tune zerolatency slightly increased CPU use for some cores, but the overall CPU usage stayed below 15%.

top output:

top - 16:14:13 up 57 min,  2 users,  load average: 2.65, 0.59, 0.50
Tasks: 509 total,   1 running, 280 sleeping,   1 stopped,   0 zombie
%Cpu0  :  5.7 us,  0.7 sy, 12.2 ni, 80.4 id,  0.0 wa,  0.0 hi,  1.0 si,  0.0 st
%Cpu1  :  2.6 us,  0.3 sy,  4.6 ni, 92.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu2  :  6.2 us,  0.3 sy,  6.9 ni, 86.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  :  4.0 us,  0.0 sy,  1.7 ni, 94.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu4  :  2.0 us,  0.0 sy, 13.9 ni, 84.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu5  :  2.3 us,  0.0 sy,  3.3 ni, 94.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu6  :  6.6 us,  0.3 sy,  8.2 ni, 84.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu7  :  8.1 us,  0.0 sy, 12.8 ni, 79.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu8  :  1.7 us,  0.3 sy, 31.5 ni, 66.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu9  : 21.6 us,  0.0 sy,  2.6 ni, 75.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu10 : 15.3 us,  0.7 sy,  9.6 ni, 74.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu11 : 12.3 us,  0.0 sy, 10.3 ni, 77.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu12 :  1.3 us,  0.3 sy, 26.4 ni, 71.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu13 :  3.3 us,  0.0 sy, 12.0 ni, 84.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu14 :  4.0 us,  0.0 sy,  7.7 ni, 88.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu15 :  2.0 us,  0.3 sy, 20.3 ni, 77.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu16 :  4.0 us,  0.3 sy, 29.7 ni, 66.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu17 :  2.3 us,  0.3 sy, 21.7 ni, 75.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu18 : 11.6 us,  0.0 sy, 22.8 ni, 65.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu19 :  6.0 us,  0.0 sy, 20.4 ni, 73.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu20 :  6.0 us,  0.0 sy, 26.3 ni, 67.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu21 :  0.3 us,  0.3 sy, 44.5 ni, 54.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu22 :  0.0 us,  0.3 sy, 26.1 ni, 73.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu23 : 20.1 us,  0.0 sy, 15.7 ni, 64.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu24 :  7.6 us,  0.7 sy,  5.0 ni, 86.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu25 :  3.0 us,  0.3 sy, 14.9 ni, 81.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu26 :  0.3 us,  0.0 sy, 16.6 ni, 83.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu27 :  3.0 us,  0.7 sy,  0.0 ni, 96.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu28 :  0.7 us,  0.0 sy,  0.3 ni, 99.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu29 :  8.3 us,  0.0 sy,  0.7 ni, 91.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu30 :  6.7 us,  0.3 sy,  9.7 ni, 83.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu31 :  4.0 us,  0.3 sy, 22.3 ni, 73.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu32 :  1.0 us,  0.0 sy, 24.6 ni, 74.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu33 : 10.6 us,  0.3 sy,  9.9 ni, 79.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu34 :  5.4 us,  0.0 sy,  2.7 ni, 91.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu35 :  6.6 us,  0.0 sy,  4.6 ni, 88.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu36 :  4.3 us,  0.0 sy, 17.1 ni, 78.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu37 :  5.6 us,  0.3 sy,  3.3 ni, 90.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu38 :  2.3 us,  0.0 sy,  8.6 ni, 89.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu39 :  2.3 us,  0.0 sy, 22.0 ni, 75.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu40 :  1.7 us,  0.0 sy, 17.4 ni, 80.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu41 :  6.0 us,  0.3 sy,  3.3 ni, 90.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu42 :  4.7 us,  0.3 sy, 15.7 ni, 79.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu43 :  7.9 us,  0.7 sy, 24.2 ni, 67.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu44 :  7.0 us,  0.3 sy, 28.3 ni, 64.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu45 : 29.1 us,  1.0 sy, 13.9 ni, 56.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu46 :  0.0 us,  0.0 sy, 45.3 ni, 54.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu47 :  0.0 us,  0.0 sy, 48.5 ni, 51.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
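
To reduce per-core lines like the ones above to a single figure, the following awk sketch averages the idle column and reports 100 minus that mean. It therefore counts both us and ni time as work, which matters here because much of the load shows up under ni. The function name avg_cpu_busy and the file name top.txt are made up for illustration, and the input is assumed to be per-core %CpuN lines in the format shown.

```shell
#!/bin/sh
# avg_cpu_busy: read per-core "%CpuN : ..." lines from top on stdin and print
# the mean busy percentage, computed as 100 minus the mean idle ("id") value.
avg_cpu_busy() {
  awk '
    /^%Cpu[0-9]+/ && match($0, /[0-9.]+ id,/) {
      # substr() yields e.g. "80.4 id,"; adding 0 keeps the leading number.
      idle += substr($0, RSTART, RLENGTH) + 0
      cores++
    }
    END {
      if (cores > 0) printf "%.1f\n", 100 - idle / cores
    }
  '
}

# Usage (top.txt is a hypothetical file holding the per-core lines):
#   avg_cpu_busy < top.txt
```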

ffmpeg info:

ffmpeg version 3.4.4-0ubuntu0.18.04.1 Copyright (c) 2000-2018 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.3.0-16ubuntu3)
  configuration: --prefix=/usr --extra-version=0ubuntu0.18.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
  libavutil      55. 78.100 / 55. 78.100
  libavcodec     57.107.100 / 57.107.100
  libavformat    57. 83.100 / 57. 83.100
  libavdevice    57. 10.100 / 57. 10.100
  libavfilter     6.107.100 /  6.107.100
  libavresample   3.  7.  0 /  3.  7.  0
  libswscale      4.  8.100 /  4.  8.100
  libswresample   2.  9.100 /  2.  9.100
  libpostproc    54.  7.100 / 54.  7.100

Ubuntu version:

Linux ubuntu 4.15.0-46-generic #49-Ubuntu SMP Wed Feb 6 09:33:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

And lastly, here is a snippet of the output during the encoding process:

frame=34048 fps=316 q=-1.0 Lsize=  119840kB time=00:23:40.03 bitrate= 691.3kbits/s dup=2 drop=0 speed=13.2x    
  • I think you may be trying to measure something using the wrong stick. Due to architecture, it is unlikely your computer can feed all of those cores and keep them busy all the time. That little part of the superhighway your computer is composed of is that many lanes wide, but the rest of the highway just isn't. Instead you should be asking how to optimize FFMPEG for your specific computer, to make it use the resources it has available most effectively. That is a different question, and one likely to be concretely answerable. – music2myear Mar 28 '19 at 21:33
  • See these questions and their answers: https://stackoverflow.com/questions/7379980/thread-count-option-in-ffmpeg-for-fastest-conversion-to-h264 https://superuser.com/questions/558402/ffmpeg-doesnt-use-maximum-cpu-power https://superuser.com/questions/155305/how-many-threads-does-ffmpeg-use-by-default – music2myear Mar 28 '19 at 21:34
  • Thanks for the comment @music2myear. Based on what you linked me to, it appears I can push the CPU utilization by encoding in parallel. https://superuser.com/questions/538164/how-many-instances-of-ffmpeg-commands-can-i-run-in-parallel/547340#547340 seems to split the video into 60-second pieces, but it does not go into how to merge them back together. I wish I could comment on the post but I haven't hit the rep yet. Is there something you may have to add to this? – Aco Strkalj Mar 28 '19 at 23:01
  • Not really. I have no experience pushing a computer to encode at peak efficiency. I've run Handbrake, and that's the closest I've come to ffmpeg. I'm just a really experienced tech and a good googler, so I have opinions and research strength. – music2myear Mar 28 '19 at 23:38
  • -threads 24 -i input.mkv --> this sets decoding threads, not encoding. Place -threads after the inputs. AFAIK, x264 tops out at (vertical resolution / 40) threads. That limit is set for effective motion search. – Gyan Mar 29 '19 at 05:04
  • Parallel encoding of multiple videos is the way to go. VoD providers like Netflix go as far as creating small chunks of video for every scene, which are then encoded separately. (Not saying that this is what you should do; it requires more inspection of the encoding parameters beforehand.) – slhck Mar 29 '19 at 09:29
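
Gyan's comment can be sketched as follows: the command from the question with -threads moved after -i, so that it is an output (encoder) option, plus a helper for the height / 40 ceiling the comment mentions. The function names x264_thread_cap and encode are made up for illustration, the ceiling is taken from the comment rather than verified against x264, and the thread count of 24 is only an example value.

```shell
#!/bin/sh
# Thread ceiling suggested in the comment: vertical resolution / 40
# (integer division). $1 is the video height in lines.
x264_thread_cap() {
  echo $(( $1 / 40 ))
}

# Same command as in the question, but with -threads placed after -i so that
# it applies to the output side (the encoder) instead of the decoder.
# $1 = input file, $2 = thread count, $3 = output file
encode() {
  ffmpeg -i "$1" -c:v libx264 -preset medium -threads "$2" \
         -c:a copy -vf "subtitles=$1" "$3"
}

# Usage:
#   x264_thread_cap 1080
#   encode input.mkv 24 output.mp4
```

By that rule of thumb a 1080-line source would allow about 27 threads and a 480-line source about 12, so a single encode of lower-resolution input may never occupy all 48 logical CPUs.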

0 Answers
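
The answer linked in the comments splits the input into pieces but, as noted there, does not show the merge step. One way it could look, as a rough sketch: cut the video stream into chunks with ffmpeg's segment muxer, encode the chunks concurrently, and join the results with the concat demuxer. The function names, the chunk_/enc_ file names, and list.txt are made up for illustration. The sketch handles only the video stream (the audio copy and burned-in subtitles from the original command are left out), and it assumes an xargs that supports -P, as GNU and BSD xargs do.

```shell
#!/bin/sh
# 1. Split the first video stream into pieces of about $2 seconds without
#    re-encoding; cuts land on keyframes, so chunk lengths are approximate.
split_chunks() {
  ffmpeg -i "$1" -map 0:v:0 -c copy -f segment -segment_time "$2" \
         -reset_timestamps 1 chunk_%04d.mkv
}

# 2. Encode every chunk with libx264, running up to $1 ffmpeg processes at once.
encode_chunks() {
  printf '%s\n' chunk_*.mkv |
    xargs -P "$1" -I{} sh -c \
      'ffmpeg -nostdin -i "$1" -c:v libx264 -preset medium "enc_${1%.mkv}.mp4"' _ {}
}

# 3. Print a concat-demuxer list (one "file '<name>'" line per argument).
make_concat_list() {
  for f in "$@"; do
    printf "file '%s'\n" "$f"
  done
}

# 4. Join the encoded chunks into $1 without re-encoding.
merge_chunks() {
  make_concat_list enc_chunk_*.mp4 > list.txt
  ffmpeg -f concat -safe 0 -i list.txt -c copy "$1"
}

# Usage (60-second chunks, 4 encodes at a time; both values are examples):
#   split_chunks input.mkv 60 && encode_chunks 4 && merge_chunks output.mp4
```

The zero-padded chunk names keep the shell glob in playback order, which is what the concat list relies on.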