I have a slight problem decoding XAVC 4K material (shot with F55). I’m converting it to DNxHD 36. Conversion speed on my computer is around 28-30 fps. If I convert 4K RAW to DNxHD 36 (same computer) I can get around 45 fps.
Also, when converting the XAVC footage, the CPU is only used about 55%. But with RAW it hovers around 95%. (This is in Task Manager.)
I ran a very similar test in DaVinci Resolve and ffmpeg. Both were converting around 26-30fps and also using only 50% of the CPU.
So this tells me it’s not exactly an issue with Cortex (other software isn’t fully utilizing the CPU either) but stems more from the XAVC codec, or maybe from the Sony SDK? I know XAVC is already compressed, so it needs to be decompressed before any type of color work or conversion; but is it so complex that the CPU only runs at 50%?
Xeon 6-core 3.33GHz
Source footage on dedicated RAID (SAS, 600+MB/sec)
Destination NAS over 1Gb
What’s likely happening here is that the GPU is the bottleneck.
In ‘optimized’ mode, when we decode Sony 4K RAW and your output is only HD, we decode the footage at half resolution. That means there is a lot less GPU processing to do.
With XAVC, there’s no such thing as a half resolution decode, so we’re processing the full 4K frame. Since the GPU is the bottleneck, your CPU doesn’t get fully utilized.
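To put a rough number on "a lot less GPU processing", here is a small sketch of the pixel counts involved. The 4096x2160 frame size is an assumption (the thread only says "4K"), and "half resolution" is taken to mean half the width and half the height.

```python
# Assumed "4K" frame size; the thread does not state the exact dimensions.
FULL_WIDTH, FULL_HEIGHT = 4096, 2160


def pixels(width, height):
    """Number of pixels in one frame."""
    return width * height


full = pixels(FULL_WIDTH, FULL_HEIGHT)
half = pixels(FULL_WIDTH // 2, FULL_HEIGHT // 2)  # half resolution on each axis
ratio = full / half

print(f"full: {full:,} px, half-res: {half:,} px, ratio: {ratio:.0f}x")
```

Under these assumptions a half-resolution decode hands the GPU a quarter of the pixels per frame, which would fit the gap between the RAW and XAVC numbers if the GPU really were the limit.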
So would a more powerful GPU help? Or an additional GPU card? Currently there’s a single GTX670 installed.
I also tried this on a similarly configured machine with a single GTX780 (not Ti) and got marginally better results (35-40fps).
It should help, but I haven’t done the tests to prove it.
Following up on some tests I did. We have a new custom built PC (from Maingear). Here’s the quick specs:
- i7 4960 3.6GHz 6-core CPU
- GTX 670 GPU
- GTX 780ti GPU
- 32GB RAM
- Cortex 1.5.2 b4891
- XAVC footage on SAS RAID (same type as in first post)
Cortex is configured to use the 670 GPU for playback and encoding, and the 780ti GPU for encoding only. I’m getting similar speeds as in post #3 (above): about 40fps. CPU usage was just under 50%.
So I tried an unorthodox test. I found the xml files that feed into CohogRender.exe. I opened two Command Prompt windows and started two instances of CohogRender to render two clips simultaneously. CPU usage got up to about 87%.
Clip #1: duration is 3:03, took 149 seconds to render (about 29fps)
Clip #2: duration is 3:04, took 150 seconds to render (also about 29fps)
So according to some quick math, this rig can process XAVC at nearly 60fps. For whatever reason (I’m guessing the complex structure of XAVC?) Cortex can’t read/decode faster than 35-40fps. But two clips can be run at the same time to get CPU saturation.
So here’s the question: is it possible to have Cortex render two files at the same time? Or is there any optimization that can be done to speed up XAVC reading/decoding?
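For anyone who wants to repeat the two-at-a-time test without juggling Command Prompt windows, here is a minimal sketch of a launcher that starts several commands together and times the whole batch. `run_concurrently` is a hypothetical helper, and the commands in the demo are placeholders that just sleep; the real CohogRender command lines are not shown in this thread, so you would have to substitute your own.

```python
import subprocess
import sys
import time


def run_concurrently(commands):
    """Start every command at once, wait for all of them,
    and return (list of exit codes, wall-clock seconds)."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for cmd in commands]
    exit_codes = [p.wait() for p in procs]
    return exit_codes, time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder commands only: each one sleeps for a second.
    # Replace them with the render commands you actually use.
    placeholder = [sys.executable, "-c", "import time; time.sleep(1)"]
    codes, elapsed = run_concurrently([placeholder, placeholder])
    print(f"exit codes: {codes}, wall time: {elapsed:.1f}s")
```

Timing the batch as a whole, rather than each clip, gives the combined throughput directly: total frames rendered divided by the wall time.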
It’s either the XAVC decoder or something not fully optimized when pushing large frames through the GPU… you may be benefiting from additional overlap between I/O and computation when running two at a time…
What speeds do you get if you run 3 at a time?
Did you experiment at all with the --whichgpu parameters in cohogrender?
C:\Program Files\MTI Film\Cortex_1_5_2_b4972>cohogrender
MTI Film - CohogRender
dump (Default: ) Dump specific frames including source,
unpacked, render, final, or encoded. Use all to dump all.
readonly (Default: False) Using this flag will cause the render to
not create an output file.
throttle (Default: -1) The maximum speed for this render.
gpucount (Default: 1) The number of gpus to use (1 or 2).
whichgpu (Default: 0) Which gpu to use (0 or 1).
redrocketcount (Default: 0) The number of red rockets to use (0 or 1).
whichredrocket (Default: -1) Which red rocket to use (0 or 1).
status (Default: 0) The automatic status reporting interval in
v, videofile (Default: ) The source file for the picture.
a, audiofile (Default: ) The source file for the audio.
o, outputfile (Default: ) The output file name.
tc (Default: ) The start timecode of the output file.
fps (Default: ) The framerate of the output file.
t, templatefile (Default: ) The template / configuration file for the
verify (Default: ) This file will be used to automatically check
against the hash list generated by the --dump encoded
i, inputsequence (Default: ) Instead of -a, -v, supply a file that
describes the input files and metadata. This is a reel
mastermob (Default: ) This will be used as the master mob id when
rendering a dnxhd file.
sourcemob (Default: ) This will be used as the source mob id when
rendering a dnxhd file.
help Display this help screen.
It may also be interesting to convert those files to 4K ProRes Proxy and then use those as sources to see whether a different codec performs differently.
OK, I’m back with more numbers and details. Note: to get FPS, here’s the formula I’m using:
(number of seconds in media file ÷ number of seconds to render) * 24
Clip #1: duration is 3:03, took 203 seconds to render (~ 21fps)
Clip #2: duration is 3:04, took 202 seconds to render (~ 21fps)
Clip #3: duration is 2:38, took 180 seconds to render (~ 21fps)
The fps is skewed a bit because Clip #3 is not the same duration, but it’s close enough to give some indication.
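For reference, the formula above can be sketched in a few lines. The function names are made up for illustration, and the 24 fps source rate is taken from the formula itself.

```python
def to_seconds(duration):
    """Convert an 'm:ss' duration such as '3:03' to seconds."""
    minutes, seconds = duration.split(":")
    return int(minutes) * 60 + int(seconds)


def render_fps(duration, render_seconds, source_fps=24):
    """(seconds of media / seconds to render) * source frame rate."""
    return to_seconds(duration) / render_seconds * source_fps


# (clip duration, seconds to render) for the three clips listed above
clips = [("3:03", 203), ("3:04", 202), ("2:38", 180)]
rates = [render_fps(duration, secs) for duration, secs in clips]

for (duration, secs), rate in zip(clips, rates):
    print(f"{duration} rendered in {secs}s -> {rate:.1f} fps")
```

The same function reproduces the earlier two-clip numbers as well, for example `render_fps("3:03", 149)` for the first clip of that test.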
I tried the conversion with --gpucount set to 2 and got the same numbers as going through Cortex with 670=Playback/Encode and 780ti=Encode.
Converting from XAVC to 4K ProRes Proxy converted at about 40fps (no surprise). Converting the resulting ProRes files to DNxHD 36 was quite a bit better.
- Single clip rendering in Cortex, duration 3:03, rendered at ~110fps (fps fluctuated between 108 and 112, according to Cortex). CPU usage about 70%.
- Two clips going through Command Prompt and Cohog. Clip durations 3:03, 3:04 - it took 59 and 61 seconds (respectively). This yields about 72fps for each. CPU usage pegged at 100%
So using ProRes for renders gives a huge difference. Single file render is much faster. But multiple file render is still pretty impressive. Given all this, it seems like XAVC is the speed holdout. Since the 4K ProRes files screamed by, it sounds like the issue is with the XAVC decode and not with any GPU bottleneck. For what it’s worth, DaVinci Resolve also had a huge speed hit when dealing with XAVC files so it could just be the complex nature of XAVC that’s the culprit.
Given all this, I’ll go back to my question from earlier: can Cortex render two files at the same time? Or can you sprinkle some magic MTI dust on the XAVC decode and make it go faster?
Yep, I agree that your tests prove this, unless there’s something inefficient about the bitpacking of the frames returned by the XAVC decoder that I’m unaware of.
@peter what are your thoughts on this?
I think it may be more than just the H.264 decoder at play. For XAVC files, we use the Sony MXF SDK to get the compressed data and then send it to the MainConcept H.264 decoder. @hans knows more about this area than I do, but my guess is we could try adding another H.264 decoder into the equation and get higher throughput.
Hi guys @peter, @hans
Just curious if you have any update on this. Any ideas to speed it up? Anything I can do to help/test?
No news yet but @peter will investigate a little further tomorrow or early next week and we’ll see if we get a better understanding.
I did some testing on my end and think this is related to the AVC decoder running slower than it should. We use the same decoder to decode other AVC files like Canon 5d/7d, GoPro etc. If I run the XAVC file through that code I get 100% CPU usage, but I also almost max out my memory. I have 16 GB and was using 15.8 GB.
To me this points to the decoder not being set up properly when used with the Sony SDK. We could very easily switch the code so XAVC files go through the same path as other AVC files, but I think we should use the Sony SDK to read metadata from the files. I’ll see what I can do to get the decoder to behave as expected with the Sony SDK.
Thanks @peter! This sounds promising. Keep us posted!
BTW - our workstations have either 24 or 32GB RAM so I’m hoping RAM won’t be an issue. Let me know if you want me to test anything.
I just tried running some tests on one of our test workstations. I don’t have exact specs, but it is a Xeon 3.33 GHz processor with 8 cores and a GTX 680. I was able to get the CPU up to 90-95% when transcoding XAVC 4K to ProRes Proxy. It was only running around 30 fps though.
I also tried another version of the code which has some changed AVC decoder settings and I got the same exact CPU usage and speed. Not sure what to make of it now. In my previous post I was testing on my laptop, but I was seeing the CPU usage issues you reported. After I made the changes I was getting CPU usage above 90%.
Let me see about getting you a build with those changes to see if they make any difference.
@24pdailies Here is a link to a version with what I think are better AVC decoder settings:
This is only a beta version so please don’t run it on any production projects. Let me know if this makes any difference for you.
This version made a little difference.
Xeon 3.33GHz 6-core
XAVC 4K to DNxHD 36:
Cortex 1.5.2 b5004= ~30fps
Cortex 1.5.3 b5065= ~37-38
XAVC 4K to ProRes 422HQ:
Cortex 1.5.2 b5004= ~25fps
Cortex 1.5.3 b5065= Failed to convert
Still, the DNx test shows a 25% increase in speed and every little bit counts. I’m not too worried about the ProRes - this was just a quick test. RAM usage stayed around 3GB for all.
I get 37 fps rendering to ProRes 422 HQ with the new build…
We have some theories about what the rest of the difference is. Disk I/O may be part of it. Looks like the Sony SDK reads the files in ~64K chunks, vs. when we read ProRes, which we do in ~4MB chunks.
The other difference is the bit packing of the decoded frames, which ends up being 25% more data to push through the GPU vs when we decode ProRes Proxy.
In the future, we can look at further optimizations there (speed / quality tradeoffs)… if it seems worthwhile.
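If you want to see whether read size matters on your own storage, here is a rough sketch that reads a file with unbuffered reads of a given chunk size and reports the number of read calls and the elapsed time. It does not reproduce what the Sony SDK does internally; `read_in_chunks` is a hypothetical helper, and the demo reads a temporary file that was just written, so the OS cache will flatter the timings. Point it at real footage on the RAID for a meaningful comparison.

```python
import os
import tempfile
import time


def read_in_chunks(path, chunk_size):
    """Read the whole file with unbuffered reads of chunk_size bytes.
    Returns (total bytes read, number of read calls, elapsed seconds)."""
    total = 0
    calls = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(chunk_size)
            if not data:  # empty bytes means end of file
                break
            total += len(data)
            calls += 1
    return total, calls, time.perf_counter() - start


if __name__ == "__main__":
    # Demo on a 64 MiB temporary file; substitute a real clip path.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(64 * 1024 * 1024))
    try:
        for size in (64 * 1024, 4 * 1024 * 1024):
            total, calls, secs = read_in_chunks(tmp.name, size)
            print(f"{size // 1024:>5} KiB chunks: {calls} reads, {secs:.3f}s")
    finally:
        os.remove(tmp.name)
```

The read count is the interesting part: the same file takes 64 times as many calls at 64K as at 4MB, and each call carries some fixed overhead.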
The changes @peter made are now part of v1.5.3 beta b5081
I found a thread that talks about converting 4K XAVC to 4K ProRes. Some guys have given their tricks.
There are a few major problems with going that route, though:
- Both Resolve and ffmpeg, mentioned in the thread, convert 4K XAVC at or below the speed of Cortex. My testing with Resolve (on Mac) yielded ~85% of Cortex’s speed; ffmpeg got around 60%. Both tests rendered out to ProRes.
- After you’ve rendered out your 4K ProRes, you will probably want a proxy version to edit with. I can’t imagine trying to cut smoothly with 4K video files, so you’d have to spend additional time creating another version.
I guess what it comes down to is your workflow. For us, we need to rush and get editorial done as soon as possible so we push XAVC directly to DNxHD 36. For you it might be more about a Final Cut workflow that needs ProRes or maybe you need files for a specific VFX pipeline. If it works for you, then go for it!