CPU Performance: Encoding

One of the interesting elements on modern processors is encoding performance. This covers two main areas: encryption/decryption for secure data transfer, and video transcoding from one video format to another.

In the encrypt/decrypt scenario, how data is transferred and by what mechanism is pertinent to on-the-fly encryption of sensitive data - a process by which more modern devices are leaning to for software security.

Video transcoding as a tool to adjust the quality, file size and resolution of a video file has boomed in recent years, such as providing the optimum video for devices before consumption, or for game streamers who are wanting to upload the output from their video camera in real-time. As we move into live 3D video, this task will only get more strenuous, and it turns out that the performance of certain algorithms is a function of the input/output of the content.

HandBrake 1.32: Link

Video transcoding (both encode and decode) is a hot topic in performance metrics as more and more content is being created. First consideration is the standard in which the video is encoded, which can be lossless or lossy, trade performance for file-size, trade quality for file-size, or all of the above can increase encoding rates to help accelerate decoding rates. Alongside Google's favorite codecs, VP9 and AV1, there are others that are prominent: H264, the older codec, is practically everywhere and is designed to be optimized for 1080p video, and HEVC (or H.265) that is aimed to provide the same quality as H264 but at a lower file-size (or better quality for the same size). HEVC is important as 4K is streamed over the air, meaning less bits need to be transferred for the same quality content. There are other codecs coming to market designed for specific use cases all the time.

Handbrake is a favored tool for transcoding, with the later versions using copious amounts of newer APIs to take advantage of co-processors, like GPUs. It is available on Windows via an interface or can be accessed through the command-line, with the latter making our testing easier, with a redirection operator for the console output.

Finding the right combination of tests to use in our Handbrake benchmark is often difficult. There is no one test that covers all scenarios – streamers have different demands to production houses, then there’s video call transcoding that also requires some measure of CPU performance.

This time around, we’re probing a range of quality settings that seem to fit a number of scenarios. We take the compiled version of this 16-minute YouTube video about Russian CPUs at 1080p30 h264 and convert into two different files:

  1. 1080p30 to 480p30 ‘Discord’: x264, Max Rate 2100 kbps, High Profile 4.0, Medium Preset, 30 Peak FPS
  2. 1080p30 to 720p30 ‘YouTube’: x264, Max Rate 25000 kbps, High Profile 3.2, Medium Preset, 30 Peak FPS

We expect to see most mobile CPUs can manage (1) in realtime, but (2) might be a challenge.

(5-1a) Handbrake 1.3.2, 1080p30 H264 to 480p Discord(5-1b) Handbrake 1.3.2, 1080p30 H264 to 720p YouTube

The extra cores of the AMD processor show through, with a small 12% jump from Ice to Tiger at 15W and a bigger jump moving to the 28 W mode.

WinRAR 5.90: Link

For the 2020 test suite, we move to the latest version of WinRAR in our compression test. WinRAR in some quarters is more user friendly that 7-Zip, hence its inclusion. Rather than use a benchmark mode as we did with 7-Zip, here we take a set of files representative of a generic stack

  • 33 video files , each 30 seconds, in 1.37 GB,
  • 2834 smaller website files in 370 folders in 150 MB,
  • 100 Beat Saber music tracks and input files, for 451 MB

This is a mixture of compressible and incompressible formats. The results shown are the time taken to encode the file. Due to DRAM caching, we run the test for 20 minutes times and take the average of the last five runs when the benchmark is in a steady state.

For automation, we use AHK’s internal timing tools from initiating the workload until the window closes signifying the end. This means the results are contained within AHK, with an average of the last 5 results being easy enough to calculate.

(5-4) WinRAR 5.90 Test, 3477 files, 1.96 GB

Along with single-core frequency, WinRAR benefits a lot from memory bandwidth as well as cache type. We have seen in past that the eDRAM enabled processors give a good benefit to software like WinRAR, and it is usually where we see the biggest DRAM differences. The Tiger Lake and 4800U use LPDDR4X-4266, with the Ice on LPDDR4X-3733. The extra power budget of the 28W mobile Tiger Lake seems to offer the most benefit.

CPU Tests: Rendering

Rendering tests, compared to others, are often a little more simple to digest and automate. All the tests put out some sort of score or time, usually in an obtainable way that makes it fairly easy to extract. These tests are some of the most strenuous in our list, due to the highly threaded nature of rendering and ray-tracing, and can draw a lot of power. If a system is not properly configured to deal with the thermal requirements of the processor, the rendering benchmarks is where it would show most easily as the frequency drops over a sustained period of time. Most benchmarks in this case are re-run several times, and the key to this is having an appropriate idle/wait time between benchmarks to allow for temperatures to normalize from the last test.

Blender 2.83 LTS: Link

One of the popular tools for rendering is Blender, with it being a public open source project that anyone in the animation industry can get involved in. This extends to conferences, use in films and VR, with a dedicated Blender Institute, and everything you might expect from a professional software package (except perhaps a professional grade support package). With it being open-source, studios can customize it in as many ways as they need to get the results they require. It ends up being a big optimization target for both Intel and AMD in this regard.

For benchmarking purposes, Blender offers a benchmark suite of tests: six tests varying in complexity and difficulty for any system of CPUs and GPUs to render up to several hours compute time, even on GPUs commonly associated with rendering tools. Unfortunately what was pushed to the community wasn’t friendly for automation purposes, with there being no command line, no way to isolate one of the tests, and no way to get the data out in a sufficient manner.

To that end, we fell back to one rendering a frame from a detailed project. Most reviews, as we have done in the past, focus on one of the classic Blender renders, known as BMW_27. It can take anywhere from a few minutes to almost an hour on a regular system. However now that Blender has moved onto a Long Term Support model (LTS) with the latest 2.83 release, we decided to go for something different.

We use this scene, called PartyTug at 6AM by Ian Hubert, which is the official image of Blender 2.83. It is 44.3 MB in size, and uses some of the more modern compute properties of Blender. As it is more complex than the BMW scene, but uses different aspects of the compute model, time to process is roughly similar to before. We loop the scene for 10 minutes, taking the average time of the completions taken. Blender offers a command-line tool for batch commands, and we redirect the output into a text file.

(4-1) Blender 2.83 Custom Render Test

Blender takes advantage of the more cores, and the higher power limits.

Corona 1.3: Link

Corona is billed as a popular high-performance photorealistic rendering engine for 3ds Max, with development for Cinema 4D support as well. In order to promote the software, the developers produced a downloadable benchmark on the 1.3 version of the software, with a ray-traced scene involving a military vehicle and a lot of foliage. The software does multiple passes, calculating the scene, geometry, preconditioning and rendering, with performance measured in the time to finish the benchmark (the official metric used on their website) or in rays per second (the metric we use to offer a more linear scale).

The standard benchmark provided by Corona is interface driven: the scene is calculated and displayed in front of the user, with the ability to upload the result to their online database. We got in contact with the developers, who provided us with a non-interface version that allowed for command-line entry and retrieval of the results very easily.  We loop around the benchmark five times, waiting 60 seconds between each, and taking an overall average. The time to run this benchmark can be around 10 minutes on a Core i9, up to over an hour on a quad-core 2014 AMD processor or dual-core Pentium.

One small caveat with this benchmark is that it needs online access to run, as the engine will only operate with a license from the licensing servers. For both the GUI and the command-line version, it does this automatically, but it does throw up an error if it can’t get a license. The good thing is that the license is valid for a week, so it doesn’t need further communications until that time runs out.

(4-2) Corona 1.3 Benchmark

Corona is very similar to Blender in the performance scaling.

Crysis CPU-Only Gameplay

One of the most oft used memes in computer gaming is ‘Can It Run Crysis?’. The original 2007 game, built in the Crytek engine by Crytek, was heralded as a computationally complex title for the hardware at the time and several years after, suggesting that a user needed graphics hardware from the future in order to run it. Fast forward over a decade, and the game runs fairly easily on modern GPUs.

But can we also apply the same concept to pure CPU rendering? Can a CPU, on its own, render Crysis? Since 64 core processors entered the market, one can dream. So we built a benchmark to see whether the hardware can.

For this test, we’re running Crysis’ own GPU benchmark, but in CPU render mode. This is a 2000 frame test, with low settings. Initially we planned to run the test over several resolutions, however realistically speaking only 1920x1080 matters at this point.

(4-3) Crysis CPU Render at 1080p Low

This benchmark is always amusing.

CPU Performance: Simulation and Science Xe-LP GPU Performance: Civilization VI
Comments Locked

253 Comments

View All Comments

  • tipoo - Thursday, September 17, 2020 - link

    “Baskin for the exotic”
    I see what you did there...
  • ingwe - Thursday, September 17, 2020 - link

    I didn't get it until I read your comment.
  • Luminar - Thursday, September 17, 2020 - link

    RIP AMD
  • AMDSuperFan - Thursday, September 17, 2020 - link

    "Against the x86 competition, Tiger Lake leaves AMD’s Zen2-based Renoir in the dust when it comes to single-threaded performance." - But I am hoping Big Navi can compete well against this Intel chip.
  • tipoo - Thursday, September 17, 2020 - link

    What does Big Navi have to do with a laptop CPU?
  • AMDSuperFan - Thursday, September 17, 2020 - link

    You care about games don't you? This Intel Tiger won't have an answer for Big Navi. We can look forward to that showing who is the boss.
  • blppt - Thursday, September 17, 2020 - link

    Based on preliminary data, they'll both be about 2 years behind Nvidia, what with Big Navi only matching a 2080ti, and not available for another month at the earliest.
  • hecksagon - Friday, September 18, 2020 - link

    Crazy how you can make that prediction, the only preliminary data that is out is a photograph of the card. Are you a wizard?
  • blppt - Friday, September 18, 2020 - link

    Incorrect.

    https://wccftech.com/amd-radeon-navi-gpu-specs-per...
  • HarryVoyager - Friday, September 18, 2020 - link

    I'm not really seeing where you are getting that from. We know that RDNA2 can hit 2.23Ghz from the PS5 implementation, and we have solid rumors that it the top end one will be an 80CU chip, rather than a 40 CU chip. That implies on the order of a 230% improvement over the 5700XT, if their are no other performance improvements. That alone puts it in the 30-40% improvement range over the 2080 Ti. Given we've already seen at least a few AMD benchmarks of unidentified cards showing a 30-40% improvement over 2080 To performance, that sort of lift does seem likely.

    If I had to guess, that RDNA2 that recently showed up with a near 2080 TI performance is probably a 6700 competitor to the 3070, not the top end card. Those do have to be developed and tested too, after all.

Log in

Don't have an account? Sign up now