Flame can't use more than 64 threads

Hello Flame friends,

I recently received a Lenovo P620 with the beast-mode 3995WX CPU. This processor has 64 cores/128 threads. I’ve run some CPU-intensive benchmarks (basically a Denoise Median filter with high values). Looking at CPU usage metrics, it is clear that Flame is only utilizing about 50%, or 64 threads total, leaving the rest idle. In addition, Randy ran the exact same test on his 3975WX-based machine, which has 32 cores/64 threads, and the render time was basically the same, just a few seconds shorter since the cores are slightly faster.

The conclusion is that 64 threads is the end of the line for Flame. So if you are spec’ing a new machine, you can save many thousands of dollars by not buying the 3995WX, as half the threads go unused. In fact, I’ve disabled them in the BIOS (SMT off), which lets the physical cores run about 10% faster.

Now you know. Still waiting for feedback from ADSK.

Very interesting. The 3995 is approx. $2,500 more than the 3975. Looks like in 2022 we still care about single-thread ratings.

Great info, guys, thanks. I was literally on the Lenovo configurator when I saw this.

Ships tomorrow. :) Lenovo ThinkStation P620 tower, Ryzen Threadripper PRO 3975WX 3.5 GHz (30E00097US)

And for those curious: disabling multithreading in the BIOS on a 3975 slows down renders by a lot… like, doubles the time on a CPU-only task like Denoise/Median. So it looks like 3975 + multithreading enabled is the sweet spot.

That price sucks…they are super juicing.

Yep, 3975 + SMT is the sweet spot as that gets you the 64 threads.

Totes. At least you don’t have to wait 1-4 months.

Chris… what is your current machine? I would be interested in getting some more metrics to validate the performance advantage of newer machines.

Not cnoellert here, but thank you Alan for sharing your benchmark details… this is the kind of insight that I wish we would get from ADSK.

something something NUMA nodes?

Maybe the 32-core has a single NUMA node and the 64-core has 2? Would be odd though, as Flame runs on dual-CPU setups, so what do I know, but there seems to be a general issue with loads of apps on 64c/128t CPUs not using them all.

I run a Ryzen 5950X with only 16c, but single-thread is so badass. Tbh, looking at system usage, Flame almost always seems limited by single-core for most things (maybe not the Denoise node, but most seems very single-core-y). The only limit is just having 128G of RAM.

I thought that too, and ran all the benchmarks forcing 2 or 4 NUMA nodes via BIOS.
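For anyone who wants to confirm what the OS actually sees after changing the NUMA setting in the BIOS, here is a rough Python sketch. It assumes a Linux box that exposes the usual sysfs layout (`/sys/devices/system/node/nodeN/cpulist`). The function names `parse_cpulist` and `numa_layout` are made up for this example, and it says nothing about how Flame itself schedules work across nodes.

```python
import glob
import os


def parse_cpulist(text):
    """Expand a Linux cpulist string such as '0-3,8-11' into a sorted list of CPU ids.

    Only plain ids and 'lo-hi' ranges are handled, which is what the
    per-node cpulist files normally contain.
    """
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def numa_layout(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: [cpu ids]}; an empty dict if no NUMA info is exposed."""
    layout = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*", "cpulist")):
        node_dir = os.path.basename(os.path.dirname(path))  # e.g. 'node0'
        node_id = int(node_dir[len("node"):])
        with open(path) as fh:
            layout[node_id] = parse_cpulist(fh.read())
    return layout


if __name__ == "__main__":
    for node, cpus in sorted(numa_layout().items()):
        print(f"node {node}: {len(cpus)} logical CPUs")
```

Running it before and after the BIOS change should show whether the node count really changed, which is worth ruling out before blaming NUMA either way.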

Hi Folks, Just wanted to comment here - On the technical side - there’s not a simple answer… The Flame app itself is capable of using more than 32 cores / 64 threads - but we suspect some of our 3rd party toolkits - codecs, plugins etc might be limiting / pausing activity on the extra cores on this machine. We chose the 3975 CPU spec that we tested and certified based on price / performance overall, but don’t have experience with the 3995. It sounds like supply chain issues might have been a factor here. Best regards, Will

Hi Will,

I’m using a very simple test, which is representative of part of our workload and is purely CPU bound.

Color Noise Frame (UHD 16bit AcesCG) → Denoise/Median 5/5 → Render Node 100 frames

On 3995 this is 10% faster by turning off SMT.

With SMT on, half the threads are basically idle, and the other half have widely varying utilization. With SMT off, which leaves the machine with only true physical cores, they are more uniformly fully utilized.

no SMT - 4:11
SMT - 4:38
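
To put a number on those two times, here is the arithmetic as a small Python sketch. It only uses the two render times quoted above; the helper name `to_seconds` is just for illustration.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


smt_off = to_seconds("4:11")  # 251 seconds
smt_on = to_seconds("4:38")   # 278 seconds

# Share of the SMT-on render time that is saved by switching SMT off.
time_saved_pct = (smt_on - smt_off) / smt_on * 100

# The same gap expressed as extra throughput with SMT off.
throughput_gain_pct = (smt_on / smt_off - 1) * 100

print(f"{time_saved_pct:.1f}% less time, {throughput_gain_pct:.1f}% higher throughput")
# prints "9.7% less time, 10.8% higher throughput"
```

Either way you slice it, that lines up with the roughly 10% figure.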

Sorry to revive an old thread…

As I’m about to purchase a 59**WX, I thought it prudent to do some benchmarks on my system to quantify the upgrade, and in the process I was reminded of this thread and that I may have some information to contribute.

The short of it is…

In doing a similar test to Alan’s, I saw no benefit to render times going past 16 cores (32 threads).
I tested this by provisioning a VM with varying numbers of cores on my 3975WX.

There is fairly linear scaling in render times when rendering with 2 → 16 cores. However, when I provision any more cores I gain no performance (in fact, I lose ~2%).

While monitoring the system in htop, you can see all the cores do light up regardless of core count. However, with 16 cores enabled, Flame manages to peg all of the cores consistently at 90%+ usage. Any more than that and they fluctuate between ~20% and 90%.
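
If you would rather log this than eyeball htop, a minimal Python sketch along these lines could work. It assumes Linux and the standard `/proc/stat` per-CPU line layout (user, nice, system, idle, iowait, irq, softirq, steal). The function names are hypothetical, and this is not necessarily how htop computes its bars.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: (busy_ticks, total_ticks)} for each per-core 'cpuN' line."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate 'cpu' line and non-CPU lines such as 'intr'.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user, nice, system, idle, iowait, irq, softirq, steal
        ticks = [int(value) for value in fields[1:9]]
        total = sum(ticks)
        idle = ticks[3] + ticks[4]  # idle + iowait both count as not busy
        result[fields[0]] = (total - idle, total)
    return result


def utilization(before, after):
    """Percent busy per core between two parse_proc_stat() snapshots."""
    usage = {}
    for cpu, (busy_after, total_after) in after.items():
        busy_before, total_before = before.get(cpu, (0, 0))
        elapsed = total_after - total_before
        usage[cpu] = 100.0 * (busy_after - busy_before) / elapsed if elapsed > 0 else 0.0
    return usage


def sample(interval=1.0, path="/proc/stat"):
    """Take two snapshots 'interval' seconds apart and return per-core utilization."""
    with open(path) as fh:
        first = parse_proc_stat(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        second = parse_proc_stat(fh.read())
    return utilization(first, second)
```

Calling `sample()` in a loop during a render and writing the results to a file would give a per-core trace that can be compared across core counts.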

I’m not sure what to make of it. Is it NUMA related? The way CPU operations are scheduled by Flame? Is this specifically a Threadripper problem? If anyone has a spare multi-socket Xeon system lying around…

There may be some tomfoolery caused by running Flame in a VM (i.e., I haven’t bothered manually pinning cores or messing with hugepages, etc.), but I do think there is some underlying issue here. Anecdotally, even on bare metal installs, I have felt that high core counts have scaled poorly.
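
One way to take the VM out of the picture would be to repeat the core-count sweep on bare metal by restricting CPU affinity. Below is a minimal Python sketch of the idea, assuming Linux (`os.sched_getaffinity` and `os.sched_setaffinity` are Linux-only). The helper names are hypothetical, and whether Flame sizes its thread pools from the affinity mask is an open question, so treat any results with care.

```python
import os


def first_n_cpus(available, n):
    """Pick the n lowest-numbered logical CPU ids from an iterable of ids."""
    cpus = sorted(available)
    if n < 1 or n > len(cpus):
        raise ValueError(f"n must be between 1 and {len(cpus)}, got {n}")
    return set(cpus[:n])


def limit_current_process(n):
    """Restrict this process, and any children it starts afterwards, to n logical CPUs.

    Linux only. Which ids are SMT siblings varies between systems, so check
    /sys/devices/system/cpu/cpuN/topology/thread_siblings_list before
    assuming the first n ids are n separate physical cores.
    """
    chosen = first_n_cpus(os.sched_getaffinity(0), n)
    os.sched_setaffinity(0, chosen)
    return chosen
```

A launcher script could call `limit_current_process(16)` and then start the render from the same process, so the child inherits the restricted mask.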

Anecdotally, based on the non-scientific experience I’ve had with a few different configs of Mac Pros and Threadripper Pros, I’d go with a 5965WX if I do in fact upgrade.