Where is the bottleneck?

interesting for sure,

I am going to deep-dive into this some more next week and figure out if anything in particular creates the massive cliff between Linux and Mac.


yea same…

going to do some benchmarks today also on m1max

I'll just take 4 batches from my last job to get an idea of the speed differences.

Depending on what you use to see GPU usage, be mindful that some GPU loads don’t show up the same way. The most reliable GPU resource meter is actually the temp sensor, not the usage stats.

Interesting: I tried 4 shots, and the one I struggled with is by far the slowest on the Mac Pro, which implies some bottleneck that the other shots don't have. The sample size is too small for any conclusions, though, so I will dive deeper.

But here are some numbers:

Shot01 71 vs 146
Shot02 28 vs 62
Shot03 64 vs 73
Shot04 41 vs 267

Times are in seconds. The faster one (first number) is always the Linux box (16-core Ryzen + 3090 + 8x NVMe RAID + 128 GB RAM).

The slower one is the Mac Pro: 16-core Xeon, 256 GB RAM, dual Vega Pro Duo (32 GB VRAM).

Definitely interesting results that tell me I need to go further down this rabbit hole and create a full benchmark suite that's better than these random 4 shots from my last job.
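For anyone who wants to eyeball the gap per shot, here is a quick sketch that turns the four timings above into slowdown ratios. The dictionary layout and the `slowdown` helper are just my own illustration, not part of any benchmark tool; the numbers are the ones posted above.

```python
# Render times in seconds from the four shots above: (linux_box, mac_pro).
timings = {
    "Shot01": (71, 146),
    "Shot02": (28, 62),
    "Shot03": (64, 73),
    "Shot04": (41, 267),
}


def slowdown(linux_s, mac_s):
    """How many times longer the Mac Pro took than the Linux box."""
    return mac_s / linux_s


# Hypothetical helper structure: one ratio per shot.
ratios = {shot: slowdown(*pair) for shot, pair in timings.items()}

# Print the shots from the largest gap to the smallest.
for shot, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{shot}: Mac Pro took {ratio:.2f}x as long as Linux")

# The shot with the largest gap is the one worth profiling first.
worst = max(ratios, key=ratios.get)
print(f"Largest gap: {worst}")
```

With only four shots this is not a benchmark, but it makes the spread obvious: three shots sit between roughly 1.1x and 2.2x, while one is far outside that range.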

Sure, but those findings seem pretty consistent with what the benchmarks show already… and with what most here experience anecdotally. There is a marked improvement with the Apple Silicon equipped Macs that narrows the gap, but the performance inequalities between the platforms are real and measurable.

I wonder if, once there is a Metal-native version, the gap will narrow further…


Yes, but I expected a more consistent gap tbh.

Not necessarily; the platform performance will impact some features more than others. And in a way that is more helpful. What's in the script of shot 4 that's not in the others, or what is different on shot 4 in terms of content that makes it harder to scale an algorithm?

If the gap was more even then it would be harder to compare and isolate.

Finn if you wanna really get into it you could try using the macOS profiling tools which come with Xcode - run Instruments.app, pick Time Profiler, choose Running Applications/Flame at the top and hit the red record button while you’re rendering… hit stop after 20 seconds or so then from the Call Tree menu at the bottom enable just Invert Call Tree and you’ll see a list of which functions were using the CPU the most.

I think I remember it showing useful stuff for Flame - things like a slow network share or most of the time being spent debayering R3Ds were obvious, and I definitely picked up the problem with weirdly sized IBL maps being crazy slow on Mac like this. Can feel a bit like deciphering entrails though… looks like this for Houdini, I can see my VDB surfacing is slowest by far:


love it lewis thank you!


I agree and I don’t. The “orange” dot in the nodes is great but must be used wisely. In the end it is a cache, and having several switched “on” takes a lot of space from RAM (or wherever it is stored), and it gets slower.

It always depends on the resolution and number of frames, but in our world, or at least mine, working at camera resolutions with 16-bit EXRs, plenty of 3D, and the number of shots we usually take on… well… it sometimes gets veeery slow.

Clearing the cache and rebooting gets things back to normal, but there should be a more specific solution for this issue.