Faster than local?

Look on the bright side. We’re all professional progress bar watchers. And these short render breaks are fantastic for sending out a networking email or pinging someone on Slack to keep the job pipeline in good health.


I don’t remember Cineon having it… but Media Illusion did.

But it was over 20 years ago now, so I could be conflating the two.

Just back to iStat for a second…

If you click on the small charts you get a flyout with more detailed history, and you can change the time frame. Here’s 10 min. So you can do a render and then revisit what happened.

In this case I rendered a Batch openclip with some moderate Action complexity and then a full timeline with some color corrections and all those openclips. The timeline render took about 1 min. This is all on an M1 Max Mac Studio, so not the latest.

You can see the spikes in disk activity, and when you hover you can read out the values. At the peak of the timeline render it transferred data at 48MB/s. Far below the limits. This is with an external TB3 NVMe Glyph drive, single stick.

[Screenshot: iStat disk activity history]
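
If you want to sanity-check the drive itself, a crude sequential read is enough to confirm the ceiling is far above the 48MB/s seen here. A rough Python sketch, assuming a large pre-existing test file at a made-up path (note that re-reading a file the OS has already cached will inflate the number):

```python
import time

# Hypothetical path to a large (multi-GB) test file on the external NVMe volume.
PATH = "/Volumes/Glyph/throughput_test.bin"
CHUNK = 16 * 1024 * 1024  # read in 16 MB chunks

read_bytes = 0
start = time.perf_counter()
with open(PATH, "rb") as f:
    while True:
        chunk = f.read(CHUNK)
        if not chunk:
            break
        read_bytes += len(chunk)
elapsed = time.perf_counter() - start

print(f"{read_bytes / elapsed / 1e6:.0f} MB/s over {read_bytes / 1e9:.1f} GB")
```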

CPU usage at the same time reached 32%, more than half of which was ‘system’, meaning handling the disk I/O.

The color coding is a bit ambiguous here: red is used for both ‘system’ and the E-cores, and blue for both ‘user’ and the P-cores, but those pairs may not be 100% correlated.

[Screenshot: iStat CPU history]

It’s a relatively simple tool, but it gives you plenty of insight into where your limits are.

So why did this timeline take 1 min to render? It wasn’t disk I/O bound, and it wasn’t CPU bound either. Most likely Flame’s render engine is not multi-threaded enough to truly max out the CPU and GPU in this case, so single-core clock speed was probably the limiting factor. EXR sequences also contribute: they’re slow to decode, and I believe the decoding is mostly done on the CPU. So the disk can deliver bytes faster than the code can process them.
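
To see the decode cost directly, you can time pulling the raw bytes off disk versus actually decompressing a frame. A rough sketch using the classic OpenEXR/Imath Python bindings; the frame path is made up and the numbers will vary a lot with the compression used (PIZ/ZIP are much heavier than uncompressed):

```python
import time
import OpenEXR
import Imath

# Hypothetical frame from an openclip render.
PATH = "shot_v001.00001.exr"

# 1) Raw I/O only: pull the compressed bytes off disk.
t0 = time.perf_counter()
with open(PATH, "rb") as f:
    raw = f.read()
t1 = time.perf_counter()

# 2) Full decode: decompress the RGB channels to half-float pixels.
#    (The file is likely in the OS cache by now, so this is mostly pure decode cost.)
exr = OpenEXR.InputFile(PATH)
half = Imath.PixelType(Imath.PixelType.HALF)
pixels = [exr.channel(c, half) for c in ("R", "G", "B")]
exr.close()
t2 = time.perf_counter()

print(f"raw read : {len(raw) / 1e6:.1f} MB in {t1 - t0:.3f}s")
print(f"decode   : RGB channels in {t2 - t1:.3f}s")
```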

Unmanaged storage is great for managing your Flame projects. That it relies almost exclusively on EXR sequences is not ideal for render times, though. Long term it would be great to have a ProRes option for OpenClips, although going from an image sequence to a stream is not simple.

Better hardware will make a minor difference. It would probably take a massive rewrite of the Flame code to make a big difference. But that’s a story for another day.

One question for the Flame devs: on this Mac Studio you see the efficiency cores being maxed out most of the time. Does the Flame code account for the core differences and move most of the heavy lifting to the performance cores? I don’t know enough about this.

This is a relatively new thing in CPU design (the last 2-3 generations): not all cores are equal. The efficiency cores were a big breakthrough for battery life and energy-friendly metrics, but they complicated load scheduling in the OS, and performance-critical apps need to take this into account or be left behind.

A bit more on that topic here. There have been various discussions online since the M1 came out, about QuickTime, for example, running only on E-cores. Apparently the end user has limited control with the taskpolicy command, but cannot force individual tasks onto the P-cores. That can only be done by the code itself, via the thread-level QoS value, which the kernel scheduler takes into account.
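
For what that looks like from the application side: a thread opts into a QoS class and the kernel scheduler uses that to decide whether it’s a candidate for the P-cores or gets parked on the E-cores. A minimal illustration via Python’s ctypes (constants are from <sys/qos.h>; Flame would of course do this in its own threads, this is just to show the mechanism):

```python
import ctypes

# pthread_set_qos_class_self_np() lives in libSystem on macOS.
libsystem = ctypes.CDLL("/usr/lib/libSystem.B.dylib", use_errno=True)

# QoS classes from <sys/qos.h>. Higher classes are eligible for the P-cores,
# while QOS_CLASS_BACKGROUND work is generally kept on the E-cores.
QOS_CLASS_USER_INTERACTIVE = 0x21
QOS_CLASS_USER_INITIATED   = 0x19
QOS_CLASS_UTILITY          = 0x11
QOS_CLASS_BACKGROUND       = 0x09

def tag_current_thread(qos_class):
    """Tag the calling thread with a QoS class; the scheduler does the rest."""
    ret = libsystem.pthread_set_qos_class_self_np(qos_class, 0)
    if ret != 0:
        raise OSError(ret, "pthread_set_qos_class_self_np failed")

# A render worker thread would tag itself before doing the heavy lifting:
tag_current_thread(QOS_CLASS_USER_INITIATED)
```

As far as I know there is no public API to pin a thread to a specific core type beyond this hint, which matches the taskpolicy limitation above.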

I’m assuming the Flame devs are aware of all this and have optimized the Flame code accordingly, but it would be interesting to hear from @fredwarren on this topic.

PS: The same holds true for some Intel CPUs, and thus for Flame on Linux.

The absolute fastest

The Iodyne storage pro thing, as it basically uses two Thunderbolt controllers in parallel. Downside: expensive.

If you want cheaper options, don’t look at Thunderbolt, as Thunderbolt is very limited for storage speed. Use USB4 instead; there are a handful of real USB 4.0 controllers that outpace Thunderbolt devices, for example the OWC 1M2.
