Where is the bottleneck?

So I've got this pretty big batch script and it is EXTREMELY slow, as in 1 frame every 10 seconds slow on my Mac Pro.

While rendering I watch all my stats: CPU, GPU, RAM, storage… they are all idling around, and not a single core is pegged, so it doesn't look like some single-thread limitation (it's like 20% usage on all cores).
How can that be, and how do you find the bottleneck? Are there any logs or stats for a batch script, like Nuke's profiler node? Or is there a way to simply render multiple frames at once, like Nuke's frame server (or Deadline)? For Nuke it's usually a single core being maxed out, so rendering 16 frames at once is almost a 16x speed improvement. Maybe I'm just overlooking something, but this is ridiculously slow.

There are a bunch of heavy Cryptomatte nodes and whatnot, but still, it should at least max out some part of my system when rendering, no?
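
For lack of a built-in profiler, one crude thing I can do is log system counters from outside Flame while the render runs. A minimal sketch, assuming the psutil Python package is installed; it sees nothing GPU-side, but it at least shows whether any core, the RAM, or the disks are actually saturated:

```python
# Log per-core CPU, RAM, and disk throughput once a second while a
# render runs. Needs the psutil package (pip install psutil). This
# sees nothing GPU-side, but it shows whether anything is saturated.
import time
import psutil

print("time,busiest_core_pct,ram_pct,disk_read_MBs,disk_write_MBs")
prev = psutil.disk_io_counters()
psutil.cpu_percent(percpu=True)                 # prime the counters
while True:
    time.sleep(1.0)
    cores = psutil.cpu_percent(percpu=True)     # % per core since last call
    cur = psutil.disk_io_counters()
    read_mb = (cur.read_bytes - prev.read_bytes) / 1e6
    write_mb = (cur.write_bytes - prev.write_bytes) / 1e6
    prev = cur
    print(f"{time.strftime('%H:%M:%S')},{max(cores):.0f},"
          f"{psutil.virtual_memory().percent:.0f},"
          f"{read_mb:.1f},{write_mb:.1f}")
```

If every column stays low while frames crawl out, the stall is somewhere this can't see: GPU, memory bandwidth, or driver waits.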


I have been pre-rendering my cryptomattes lately, @finnjaeger.

But also, if I'm trying to find the bottleneck, I find pre-rendering can be a helpful way of working that out. You don't even need to use a Write node or render; you can just hit that orange dot that everyone loves so much and cache a few nodes to simulate a pre-render, then see how things improve ¯\_(ツ)_/¯


Yeah, cryptos are weirdly slow, but they should still max out some piece of hardware, or am I thinking about this wrong?

I understand that if a downstream node is waiting for an upstream node and that node is really slow, that's what it is, but seeing my machine basically idle while everything is slow is just kind of infuriating.

So let's say you look at just the Cryptomatte node: it doesn't use many resources but is still slow. How can that be? There has to be a bottleneck somewhere that I'm not able to see, something like memory bandwidth? GPU memory speed? I have no idea. :thinking:
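
On the memory-bandwidth guess: a quick way to get a baseline number, purely as a sanity check and not a Flame diagnostic, is a big-buffer copy in numpy (assuming numpy is available). A node streaming large float buffers can be bandwidth-bound while every core looks idle:

```python
# Rough main-memory bandwidth baseline via a big-buffer copy in numpy.
# Just a sanity check, not a Flame diagnostic. A copy reads each byte
# once and writes it once, so effective bandwidth ~= 2 * size / time.
import time
import numpy as np

N = 1 << 27                          # 128M float32 = 512 MiB per buffer
src = np.ones(N, dtype=np.float32)
dst = np.empty_like(src)

best = float("inf")
for _ in range(5):                   # best of a few runs
    t0 = time.perf_counter()
    np.copyto(dst, src)
    best = min(best, time.perf_counter() - t0)

print(f"~{2 * src.nbytes / best / 1e9:.1f} GB/s effective copy bandwidth")
```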


What about the access and unpacking of the 32-bit file coming off the server?
Do you cache the Cryptomatte import?

The source is a 16-bit EXR that's cached on import, so it should live on my framestore PIZ-compressed; that shouldn't be hard. And even if the decompression were expensive, wouldn't I see my CPU spike?
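
To test the decompression theory outside Flame, one could time a raw decode of one of the cached frames. A sketch assuming the OpenImageIO Python bindings are installed, with a placeholder path:

```python
# Time a raw EXR decode outside Flame to see whether PIZ decompression
# alone could explain the frame times. Needs the OpenImageIO Python
# bindings; the path is a placeholder for one of the cached frames.
import time
import OpenImageIO as oiio

PATH = "/path/to/framestore/frame.0001.exr"   # hypothetical

t0 = time.perf_counter()
buf = oiio.ImageBuf(PATH)
buf.read(force=True)                          # force a full decode
elapsed = time.perf_counter() - t0

if buf.has_error:
    raise RuntimeError(buf.geterror())
spec = buf.spec()
print(f"{spec.width}x{spec.height}, {spec.nchannels} channels "
      f"decoded in {elapsed * 1000:.1f} ms")
```

If a single frame decodes in a few milliseconds, the decompression isn't it.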

The thing is, it's not just this comp in particular. I feel like Flame is really fast until it hits a steep drop in performance, and I can usually never tell why; it feels like a LOT of performance is left on the table.

Need to dive deeper into this. It kind of sounds to me like something just isn't optimized and is waiting on stuff unnecessarily or so. :thinking:

What about GPU RAM? There are situations, mostly with MotionVector Tracking, where GPU RAM leaks, and then the card crashes and Flame turns into dogshit. In a terminal, type dmesg and look for NVIDIA stuff. You can also monitor it via the Resource Monitor in Flame.
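
A sketch of that dmesg check, in case it's useful (Linux only; on some distros reading dmesg requires root). "Xid" messages are the classic sign the NVIDIA driver hit trouble:

```python
# Scan the kernel log for NVIDIA driver complaints (the dmesg check
# above). Linux only; on some distros reading dmesg requires root.
import subprocess

log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout

for line in log.splitlines():
    lower = line.lower()
    if ("nvrm" in lower or "nvidia" in lower) and any(
            word in lower for word in ("xid", "error", "fail", "fault")):
        print(line)
```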


Interesting. This would be a Mac in this case; I checked VRAM usage and it all looked good, the GPU was doing mostly nothing.

If it is a memory leak, I don't think there is an easy way of monitoring/detecting that by looking at system analytics. More than happy to be corrected and enlightened on this, though.


I expected a memory leak to show up as excessive memory usage, but I might be wrong. Need to check some other comps and see what causes the drop-off in performance with almost no hardware usage.
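
One crude check, for what it's worth: poll Flame's resident memory over time and watch for monotonic growth. A sketch assuming psutil is installed and that the process name contains "flame" (a guess; adjust to whatever the process is actually called). Caveat: this only sees system RAM, so a pure VRAM leak wouldn't show up here, which is kind of the problem:

```python
# Crude leak check: sample the resident memory of the Flame process
# once a minute and watch for monotonic growth. Needs psutil, and
# assumes the process name contains "flame" (a guess; adjust it).
# Note: this only sees system RAM, not VRAM.
import time
import psutil

matches = [p for p in psutil.process_iter(["name"])
           if "flame" in (p.info["name"] or "").lower()]
if not matches:
    raise SystemExit("no matching process found")

target = matches[0]
while True:
    rss_gib = target.memory_info().rss / 2**30
    print(f"{time.strftime('%H:%M:%S')}  RSS {rss_gib:.2f} GiB")
    time.sleep(60)
```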

Did you check VRAM in macOS or in the Resource Manager in Flame?


Yeah, it's a long-done project; I'm just revisiting it to figure out why it was so damn slow.

I'm using the “stats” tool in macOS, but I can check the Resource Manager.

Currently updating my Linux Flame to see if that same project maybe runs 100x better…
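
On the macOS side, besides menu-bar tools, Apple's powermetrics can sample GPU activity from the terminal. A sketch (needs sudo, and the available samplers vary between Mac models):

```python
# Sample GPU activity on macOS with Apple's powermetrics (needs sudo;
# available samplers vary between Mac models). A terminal alternative
# to menu-bar tools for checking whether the GPU is actually busy.
import subprocess

out = subprocess.run(
    ["sudo", "powermetrics", "--samplers", "gpu_power", "-i", "1000", "-n", "5"],
    capture_output=True, text=True,
).stdout

for line in out.splitlines():
    if "gpu" in line.lower():
        print(line)
```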


It's extremely fast on my Linux Flame… so… it's just the Mac being a Mac, who could have guessed.

A 16-core Mac Pro with dual Vega II and an NVMe RAID vs. a DIY Linux Flame with a 16-core Ryzen CPU and a 3090…


It's interesting how the Mac Studio, at half the price of the Mac Pro, is faster for Flame, but a gaming PC at half that price again is faster still.


When a batch becomes unusually slow, my suspicion is always VRAM, regardless of what the Resource Manager shows. Sometimes this happens after a long time with Flame open and taking resources. I'm sure you've already done it, but restarting Flame works for me 70-80% of the time to get back to the original render time. I also try to keep the minimum possible number of batches, libraries, etc. open.

I have a custom machine with an “old” 1080 Ti (known here as a “gaming card”), a bit short on VRAM, but restarting Flame works fine for this kind of issue. On another server I use a 2080 Ti, which works really well and is solid.

But I would bet that with so many Cryptomatte nodes, it simply fills VRAM. Flame is so VRAM-demanding.

Graphics capabilities on the Mac are a disaster; the AMD cards simply suck. It's also disappointing to see that when rendering full batches, CPU consumption never exceeds 30-40%. That's why I switched to Linux when we renewed the machine. The Mac platform is a sad joke.


Yeah, even though the Mac has 32 GB of VRAM and the 3090 only has 24 GB :smiley:

I think it's just how the Mac is… everything just feels 10x as slow as on the Linux machine.

It's not a perfect benchmark, and there were only 2 live Cryptomatte nodes left, but it's:

5 fps playback on the Linux machine vs. 3-4 s per frame on the Mac (roughly 0.2 s vs. 3-4 s per frame, so a 15-20x difference), so… it's quite crazy, both machines having NVMe RAIDs, everything cached, etc.

Flame only uses one GPU, so if you are counting that 32 GB as 2x 16 GB Vegas, well, no.

It's the Vega II Duo: 64 GB total, 32 GB each.


Yeah, I mean, that's for finding what slows down the script. I understand that, and I know which nodes are heavy.

The thing that baffles me is that a “heavy node” should still use system resources; this Mac is idling, doing almost nothing, while Flame is slow as molasses.

Also, the same script on Linux uses way more resources, like 80% CPU on all 32 cores, and the GPU is doing things… So I assume there is something in the Mac system where there's a hardware bottleneck I can't see, something like moving frames from system RAM to the GPU, stuff you can't see in any kind of task manager.

Something that makes the CPU and GPU wait for data…
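
If it really is host-to-GPU transfer, that's at least measurable in isolation. A sketch of a raw upload-bandwidth test using pyopencl, assuming it's installed (OpenCL is deprecated on macOS but still present); this measures the copy path, not anything Flame does internally:

```python
# Measure raw host-to-GPU upload bandwidth with pyopencl. OpenCL is
# deprecated on macOS but still present; this tests the copy path in
# isolation, not anything Flame does internally.
import time
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

host = np.ones(1 << 26, dtype=np.float32)          # 256 MiB
dev = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=host.nbytes)

t0 = time.perf_counter()
cl.enqueue_copy(queue, dev, host)                  # host -> device
queue.finish()
elapsed = time.perf_counter() - t0

print(f"~{host.nbytes / elapsed / 1e9:.1f} GB/s host-to-GPU")
```

A healthy PCIe 3.0 x16 link should manage on the order of 10 GB/s; numbers far below that would support the transfer theory.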

I wonder if it could have something to do with the graphics API? You'd think it would either just work or not, but the Mac implementation of OpenGL is deprecated and some capabilities are limited.

That's just a wild stab in the dark, but I'm wondering if there is a node in your setup where this is the case, slowing it right down without anything showing up in the metrics.