Mac Studio finally in my hands: a testing surprise

I finally got my hands on a Mac Studio for testing. This isn’t my machine, so it’s not fully spec’d out the way I would have preferred, but I’m still very grateful to have been given the opportunity to test it.

It’s a Mac Studio M2 Ultra with 64GB of RAM and a 60-core GPU, running Flame 2024.2.1.

I’m reaching out to all Flame artists who have access to an M series Mac to help us test the performance of Apple’s M series computers with Flame. Our recent experience with an M2 Ultra chip has yielded some surprising results, and I need your expertise to further understand and analyse these findings.

In the past we have used the benchmark test, but I wanted to throw some of our real-world comp shots at it.

Here’s a summary of our initial testing:

Impressive Rendering Speed: The render speed on the M2 Ultra chip for basic HDTV-sized shots was exceptionally fast - nearly three times faster than our newest Intel Mac and over 10 times faster than our older Linux machines. Please allow me to indulge you with a graph :grin:

[Image: HD_compTime]

Giddy with excitement after that first result, we decided to throw some 4K at it. Our last project was a UHD vertical project for a door-sized installation. We initially ran this project on our newest Intel Mac, but soon found it to be sluggish at 4K, with better results on our aging Linux systems. This seemed like a good test for Flame 2024 on the M2.

Challenges with Complex Jobs: However, when faced with a more challenging resolution, such as one of our vertical UHD setups, the M2 machine took roughly twice as long.

These results initially blindsided me because of just how much slower the M2 was. However, a closer look at the two setups highlighted a significant difference. Our 4K setup involved lots of CG particles with LS_Airglow, LS_Glint, and some defocus.

In fact, when I simplified my setup and just used ColorNoise with a defocus blur node set to a value of 20, the results were shocking. I tested 200 frames on three of our systems: M2 (Flame 2024.2.1), one of our old Linux boxes (Flame 2023.3), and our Intel Mac (Flame 2023). The render time with the blur node applied to UHD in a vertical format skyrocketed on the M2, reaching 218.66 seconds. :scream:

[Image: blurNode]

I ran my particle comp on all available machines: the M2 test machine, two aging Linux boxes, and our Intel Mac. I modified the setup from UHD vertical to UHD and finally HDTV to see the render speeds.

[Image: disneyParticles]

The comp was consistently slower on the M2 this time. I would love your help, and I’m calling upon any Flame artists with access to Apple M series computers to test my blur defocus theory. Keep it simple: Color Noise and the blur node set to defocus 20. It might well be my M2 Ultra configuration.

[Image: simpleSetup]

I also tested a couple of other nodes: y_lensblur and Autodesk denoise. I started with my favorite defocus matchbox first and then denoise, because I know it can be quite a heavy process. In both cases I used 200 frames of Color Noise followed by the effect.

[Image: matchbox]
[Image: denoise]

In none of these cases did the change in UHD orientation have a massive effect on render time. In fact, in both cases, the M2 was much faster.

Here’s how you can help:

  1. Testing: Got an Apple M series computer? Try out Flame 2024 with blur defocus at different resolutions. Let us know how it goes - share render times, stability, and any surprises you encounter (there’s a small timing sketch just after this list to help keep the numbers comparable).
  2. Comparative Analysis: If you’re working with Flame on M series Apple hardware and other setups like Intel Mac or Linux, compare the performance. I’d be curious to see how they stack up.
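
To keep everyone’s numbers comparable, here is a minimal stopwatch-and-CSV sketch in Python. To be clear, this is not my renderTime script, just a stand-in: the render trigger is hypothetical, so wire `time_it()` up to however you kick off a render on your setup, or call the logger by hand from the Flame Python console.

```python
import csv
import time
from pathlib import Path

RESULTS = Path("render_times.csv")

def log_render(label, seconds):
    """Append one timing result to a shared CSV so results are easy to compare."""
    new_file = not RESULTS.exists()
    with RESULTS.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["label", "seconds"])
        writer.writerow([label, f"{seconds:.2f}"])

def time_it(label, render_fn, *args, **kwargs):
    """Time any callable (whatever triggers your render) and log the result."""
    start = time.perf_counter()
    render_fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    log_render(label, elapsed)
    print(f"{label}: {elapsed:.2f}s")

# Usage (my_render_trigger is hypothetical - substitute your own):
# time_it("UHD vertical, Color Noise + defocus 20, 200 frames", my_render_trigger)
```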

I am trying to get a better idea of Flame’s performance on the Mac Studio to help me make the right hardware decision.

Thank you for your support and collaboration.

5 Likes

Very interesting observations.

I can’t add anything to your data points - just some basic observations. I spend more time at my M1 Mac Studio than at my Linux Flame or my big Windows Nuke/CG box, out of pure office convenience. And while there are things the Mac Studio is blazing fast at, there are also things where it’s anything but fast.

I’ve seen others write about it, but I haven’t seen any specific conclusions on where the skeletons are. My guess is it has to do with the Apple Silicon approach to GPU processing. While unified memory and some other aspects may favor their approach (e.g. no separate VRAM to worry about), I think they’re still missing some of the secret sauce that NVIDIA and AMD have built up over the decades, or may simply not be able to get there because of system architecture constraints.

All that to say: I’m not at all surprised by your findings. There are probably logical explanations for them, which will be good to get our hands on so we can make more informed decisions, but I haven’t seen anything concise that explains it with actual root causes and data.

1 Like
| CPU | RAM | GPU | OS / Model | Storage speed | Flamebench2015 (batchified) | mographAction | mographBatch | OflowAbelMilanes | CGintegration02 | CGintegration01 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M2 Ultra | 128GB | M2 Ultra | Mac Studio (full spec) | 3GB/s | 00:07:41 | 01:30:00 | 00:00:49 | 00:05:59 | 00:01:09 | 00:00:37 |
| 16-core Xeon | 256GB | Vega II Duo | Mac Pro | 10GB/s | 00:11:20 | 01:00:00 | 00:01:26 | 00:30:00 | 00:02:17 | 00:00:58 |
| Ryzen 5950X | 128GB | RTX 3090 | Linux / DIY workstation | 3.4GB/s | 00:09:00 | 00:08:20 | 00:00:42 | 00:15:42 | 00:01:19 | 00:01:21 |

This sort of matches my observations. I tried different OFLOW setups and other things to get some data on a fully specced M2 Ultra.

It’s definitely hard to compare, because it isn’t simply “the M2 is X times faster”. It depends on WHAT you are doing: it might be faster than a Linux machine, or it might not.

1 Like

Just as an aside, I remember a couple of years ago my Intel MacBook would render particles way quicker than my specced-up Linux box - 40 minutes vs many hours.

1 Like

Hi Richard,

I have a fully spec’d Mac Pro M2 Ultra, and I did a quick test of the Color Noise/Blur Defocus 20 setup. In landscape format it took 25 seconds to render 200 frames. In portrait format it took 4 minutes and 38 seconds. Pretty surprising. Specs: 192GB RAM, 9.2GB/s storage.

1 Like

@PlaceYourBetts what are the specs on the aging Linux boxes you’re comparing to the M2?

1 Like

I’ve got an M3 Max here - 14-core CPU / 30-core GPU / 36GB unified memory / 1TB SSD storage. I can contribute to these tests if you want…

2 Likes

Ok, so they are from 2016:

Dual Xeon E5-2640 v4 (10-core)
128GB RAM
8GB GeForce GTX 1080

1 Like

It’s Monday here now, and I was able to get a bit more time testing this M2.

Ran the Flame benchmark - 05:29 (329.2s)

A very respectable time, but I just can’t tear myself away from this Autodesk blur (defocus) anomaly.

I just ran colour noise at various resolutions through a blur node set to a defocus value of 20 and recorded the times in seconds using renderTime_v2.py (842 Bytes)

I get a very unusual spike in render time around 3840px.

[Image: render time of Autodesk defocus at various resolutions on M2]
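
If anyone wants to map the spike more precisely, a simple width sweep through the suspect zone around 3840 should show exactly where the cliff starts. A rough sketch below - `render_seconds()` is a hypothetical hook, since I’m timing the renders manually; replace it with whatever triggers and times the 200-frame defocus render on your machine:

```python
# Sweep widths through the suspect zone around 3840px and print the times.
# render_seconds() is a hypothetical stand-in: swap in your own trigger/timer.
def render_seconds(width, height):
    # Stub: run the 200-frame defocus render at this size, type in the seconds.
    return float(input(f"render seconds for {width}x{height}: "))

HEIGHT = 2160
for width in range(3584, 4097, 64):
    seconds = render_seconds(width, HEIGHT)
    print(f"{width}x{HEIGHT}: {seconds:.2f}s")
```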

Throwing one more variable into the mix - which version of Flame are you testing? Pre-2024.1 you had a totally different graphics engine, which might affect blur node processing - blurring is usually a GPU-intensive algorithm.

That doesn’t explain the bump at 3840, but it should be accounted for in the comparison data.

Pure informed speculation on your 3840 anomaly - maybe, given the way this blur node is coded and adapted to Apple Silicon GPU processing, it hits some API limitation and degrades into a different processing path (e.g. falling back to the CPU) that penalizes you.
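
One way to test that theory without waiting on ADSK: watch GPU activity while the problem render runs. On macOS the built-in powermetrics tool can sample GPU power and residency; if the GPU sits mostly idle during the slow 3840 vertical render while CPU cores peg, that points at a CPU path. A quick Python wrapper (needs sudo), run alongside the render:

```python
# Sample Apple GPU power/residency while a render runs (macOS built-in tool).
import subprocess

subprocess.run([
    "sudo", "powermetrics",
    "--samplers", "gpu_power",  # GPU frequency, residency, and power samples
    "-i", "1000",               # one sample per second
    "-n", "120",                # stop after 120 samples (~2 minutes)
])
```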

If that is the case, then the conclusion changes - it’s not that the M2 is overall worse than Linux, but that there are a few specific nodes/operations that are problematic and may need to be optimized by ADSK. Finding them all will be another matter.

At least in the case of the blur node, there’s lots of alternatives to pick from. Others may not be so lucky.

@PlaceYourBetts It’s worth reporting this to ADSK.

I was just playing around and came across an interesting situation.

If you change the blur nodes on each res to 30, you get the opposite effect, i.e. vertical is faster than landscape.

If you change the blur nodes to 50 then you get the same times on each.

Something weird is going on with the way blur is handling resolutions at different blur values.

I also noticed that if you duplicate the blur node and apply it to the other res, the blur values change, again affecting render times.
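
To pin that down, I’d sweep both variables at once and look at the grid. A rough sketch - `time_render()` is a hypothetical stand-in for however you time each 200-frame render:

```python
# Time every blur value x orientation combination and print a small grid.
# time_render() is a hypothetical hook: replace it with your own timing.
def time_render(width, height, blur_value):
    return float(input(f"seconds for {width}x{height} @ blur {blur_value}: "))

ORIENTATIONS = {"landscape": (3840, 2160), "portrait": (2160, 3840)}
BLUR_VALUES = [20, 30, 50]

print("blur" + "".join(f"{name:>12}" for name in ORIENTATIONS))
for blur in BLUR_VALUES:
    row = [time_render(w, h, blur) for (w, h) in ORIENTATIONS.values()]
    print(f"{blur:>4}" + "".join(f"{t:>11.2f}s" for t in row))
```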

Yeah I have been in touch with support. Very curious :face_with_monocle:

Curious to know what the render times would be for Timewarp ML.

UHD is currently about a minute a frame on a maxed-out Mac Studio, so… brutal. This will hopefully get better with the updates discussed here. The Flame ML stuff runs as well as it does on Linux, from my testing.

In my testing with Timewarp ML on M2, HD frames take four or five minutes each. Not good at all, but we know the code is meant to run on CUDA.

I’ve set up a gaming PC (3040 GPU) to run RIFE on Windows, and it’s 1/5th the time to round-trip Timewarps out to that PC and back into Flame.

1 Like